Model Copy - Githubissues

lpalbou commented 5 years ago

This ticket replace and complete the following old tickets:

Current requirements:

[ ] the copied model should have the new date but all original facts should keep their original dates
[ ] the curator copying a model should only be cited on the new model if and when the person add or change curation data (see other ticket on curation vs meta-data actions)
[ ] the curator copying a model should only be cited on the new nodes / edges / evidences he added/modified, not those coming from the original model
[ ] the new model and nodes should refer with the owl:cloneFrom to the original nodes & models. It will be useful to keep track of the usage of a template and explore all its variations
[ ] possibly create a new color scheme to differentiate nodes/edges from the original model and the nodes/edges created on top of the copy
[ ] UI integration: add an optional "copy genes" parameter. If yes, it is a full copy, it not, it will also remove any evidence associated to those genes

Don't hesitate to add any requirements / comments / thoughts

kltm commented 5 years ago

@lpalbou How were these requirements generated? I think some of them are probably not needed/required. We might want to have a minimal ticket and then break others into additions; the previous final form was a bit lighter than this.

lpalbou commented 5 years ago

@kltm discussed with Kimberly, David and Pascale yesterday. The owl:cloneFrom was discussed with @cmungall about 2 weeks ago. The last two requirements are lesser priority. Anyhow, I am working on it. Only issue that I am having is barista/minerva automatically add both contributor and date, so I have to send additional bunch of requests to overwrite those annotations after individuals/facts have been created. It's not super nice but doing otherwise would require larger changes on how barista/minerva communicate.

kltm commented 5 years ago

Understood--I'm sure this will all work out in the wash and I'm really excited about getting copying out there!

Looking at this, I think that points 1 and 3 are the ones that have perked my ears; separate, but related to the same system. The new requirements seem to be a slightly different take on how "touching" would generally be apportioned and I think that attempting certain kinds of overrides from the client may be the wrong direction. To produce the effects given above, there may have to be a fair amount of removal of annotations that amounts to essentially a shadow model of credit. Or not, the devil is in the details. I think that is may be good to separate out the absolutely-has-to-happen-or-we-don't-want-copying bits from the this-would-be-great-to-haves; more granular tickets are easier to work with for these threads as well.

cmungall commented 5 years ago

instead of owl:cloneFrom use something like prov, there is a relationship there like derivedFrom (different from the RO derived_from which is material entities)

kltm commented 5 years ago

Talking to @lpalbou , the current PR is https://github.com/geneontology/noctua/pull/609 and https://github.com/berkeleybop/bbop-manager-minerva/pull/2

add a button to the noctua landing page button area to enable copying
copy actions are:
- "work saving" copy: new instances of everything, delta dates and contributors

kltm commented 5 years ago

@kltm will pull in bbop-manager-minerva changes (new functionality only), then get the new dev up.

lpalbou commented 5 years ago

Quick update: as I feared, that's indeed not the last version of the model copy on those two PRs. I stopped working on it 4 months ago while testing alternative solutions to speed up the process, so I have to find the last relevant / usable version. I will update this ticket when it's done.

kltm commented 5 years ago

@lpalbou NP. If you can't make the restart window today for Minerva, we'll take care of it later on--this is just a low-hanging-fruit addition for now.

lpalbou commented 5 years ago

I think I have located the last version, I am creating a new instance from scratch to check.

lpalbou commented 5 years ago

@kltm sorry for the extra precaution, but as it's been some time I wanted to make sure. Those PRs are actually pointing to the last usable versions of the model copy, so you can merge them both.

Notes:

I was experimenting in this april version with a way to send more requests in a single pass. My recollection is that it was less stable than the original version. Should there be any issue, I can roll back to the former version which consistently worked on hundreds of copy
When there is time, most of the copy process should be transferred to minerva as it will speed up the process and avoid possible packet loss. I think it's still useful to keep those methods in the bbop manager for a more granular way to copy models (e.g. one of the method can be used separately to copy a single individual / node)

kltm commented 5 years ago

Pending a little thought about https://github.com/geneontology/noctua/pull/609#issuecomment-510709716, I did not merge in yet (as it may not be intended to work as such), but switched to @lpalbou 's branch and rebuilt the package around the merged and updated bbop-manager-minerva. noctua-dev now has a lovely copy function to start thinking about iterating on.

vanaukenk commented 4 years ago

For discussion of provenance tracking: https://www.w3.org/TR/prov-o/#wasDerivedFrom

cmungall commented 3 years ago

PR for adding wasDerivedFrom to the shex schema https://github.com/geneontology/go-shapes/pull/266

lpalbou commented 3 years ago

For reference, method in bbop-manager: https://github.com/berkeleybop/bbop-manager-minerva/blob/a366911d7537eefc1815b1f8dacc7b023c8268f4/lib/manager.js#L1143. It was tested 2 years ago and is probably good enough for a V1. Fairly easy to integrate in any UI (see mockup). Only addition would be:

add the prov:wasDerivedFrom, to connect related models together and help filter the list of contributors to be shown in the UI.
add a template parameter, when true: do not copy gene products and references (eg used for a same model in different species)

For a V2, recommend either:

to decrease the number of client-server communication (speed up & avoid possible network issues). discussion with @kltm regarding how to reference IRI before their creation
transfer more logic to the server side with a new operation in the minerva API

vanaukenk commented 3 years ago

From 2021-08-31 call:

We've agreed to move forward with a 'shallow copy' functionality.

Shallow copy = ontology terms and relations are copied in full, but genes (and other enablers), evidence, and other metadata are not copied to the new model. Inputs and Outputs, if not genes or gene products, should also be copied over as well as contextual information, e.g. cells and tissues types.

A DerivedFrom tag will capture, at the model level, the originating model id.

The new model will only have the copier as contributor.

A default title, 'copy of x', where x is the original model title will be added.

All copied models will initially have state = development.

@lpalbou @kltm @ukemi @thomaspd @cmungall @balhoff @pgaudet @sierra-moxon

Please comment as needed.

pgaudet commented 3 years ago

Hi @vanaukenk

You write

Inputs and Outputs, if not genes or gene products, should also be copied over as well as contextual information, e.g. cells and tissues types.

Is that what we decided? Especially if we annotate a different species, this information could be misleading; I thought we would strip these as well.

Thanks, Pascale

vanaukenk commented 3 years ago

@pgaudet I'm thinking here of chemical inputs and outputs, e.g. for an enzyme activity.

pgaudet commented 3 years ago

Ah ok - sure

but not gene products for example for transcription factors, and probably not cell types and anatomy either?

lpalbou commented 3 years ago

Shallow copy = ontology terms and relations are copied in full, but genes (and other enablers), evidence, and other metadata are not copied to the new model. Inputs and Outputs, if not genes or gene products, should also be copied over as well as contextual information, e.g. cells and tissues types.

Apologies, but for me the simpler way to look at it is: copy everything but enablers and evidences. And this will be the way it's implemented (two simple tests on some specific relationships).

Inputs and Outputs

Need to be clarified from above. Do we discard as well everything from has input and has output ? @thomaspd

DerivedFrom tag

It was implemented both at the model level and at the individual level but it would be easy to modulate the default behavior. @kltm

code responsible for derivedFrom at model level
code responsible for derivedFrom at individual level

Note: creating those links only have pros IMO & will enable later the creation of functionalities (eg if the original activity changes, notify curators who copied the models or tag copied models; or link / group all models of a pathway in different species, etc)

The new model will only have the copier as contributor.

Tag @kltm @tmushayahama as this will be solved currently at the UI level, by subtracting the list of contributors from the original model and the copied model

All copied models will initially have state = development.

Not the current behavior, but hopefully will have time to fix that.

lpalbou commented 3 years ago

but not gene products for example for transcription factors, and probably not cell types and anatomy either?

Need to clarify exactly if and what we discard from has input / has output.

Note: I will take a look but while it's easy to discard completely has input and/or has output, I believe filtering them conditionally based on their value may be tricky with the current code and unrealistic before I leave.

cmungall commented 3 years ago

Perhaps we just exclude everything that is a macromolecular machine, regardless of whether it is an enabler, input, or output.

But we may be worrying too much about details. If in the first pass, curators need to explicitly delete some inputs or outputs I think it is OK, overall it is still a huge timesaver.

vanaukenk commented 3 years ago

Perhaps we just exclude everything that is a macromolecular machine, regardless of whether it is an enabler, input, or output.

That makes sense to me.

tmushayahama commented 3 years ago

What if we use step-by-step UI wizard type of copy with bunch of checkboxes, radio buttons and let the user choose. Power to UI :). But I agree with Chris, I think first pass should be a proof of concept as @lpalbou copy code strategy works fantastic on landing page.

@vanaukenk I think another approach is I can unhide the copy functionality on landing page for dev and actually discuss while looking at a current "prototype" (what LP had) than tweaking before checking whats on the table.

lpalbou commented 3 years ago

Perhaps we just exclude everything that is a macromolecular machine, regardless of whether it is an enabler, input, or output.

Ideally yes, but I believe it’s too much to ask for the bbop-manager-minerva lib, and if it was possible it would probably be too slow. I would put a conditional filtering on a value root type (eg filtering all macromolecules) as something to be done on a V2 made in minerva directly. Right now, what we could do (please advise), is to either keep or remove completely the entities of has input and/or has output.

vanaukenk commented 3 years ago

If one of our main uses cases for shallow copy is to copy models from one organism to another, then if we have to choose between keep or remove completely entities for v1, then I think we have to choose remove completely.

cmungall commented 3 years ago

sounds good!

On Tue, Aug 31, 2021 at 1:36 PM vanaukenk @.***> wrote:

If one of our main uses cases for shallow copy is to copy models from one organism to another, then if we have to choose between keep or remove completely entities for v1, then I think we have to choose remove completely.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/geneontology/noctua/issues/606#issuecomment-909615759, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMMOI6PDCMP55FUZPMBJDT7U4OVANCNFSM4G5WHD2Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

lpalbou commented 3 years ago

@cmungall just thinking out loud of a bad workaround to filter out macromolecules even in a V1.

I believe we could filter out all macromolecules from enablers/has input/has output.. by doing a dumb check on the CURIE prefixes. Anything starting with HGNC, UniProtKB, FB, WB & MODs etc would be macromolecules (do we use MOD ids for small molecules ? I think / hope not.. or possibly they would have different ID format ?), and small molecules probably only comes from 1 or 2 DBs ? Not ideal but could be done in that timeframe and would take 0 processing time.

balhoff commented 3 years ago

Why is it hard to know the root types? That seems most straightforward to me.

lpalbou commented 3 years ago

You are on the client side, Jim, not minerva. Not all data is available and if you have to request it from the client, it means more delays to do the copy. As discussed, it’s been 2 years so I forgot the payloads returned by barista but at the time, the root types done by Ben were not there yet.

In any case, let’s try to schedule a call next week Jim, so that we can discuss a v2 of the model copy on minerva side (create operation + duplicate the Java object + remap the IDs)

thomaspd commented 3 years ago

I think we could keep the enablers/inputs/outputs, but still not copy the evidence or metadata. This would have the advantage of keeping the likely orthologs in the model to remind the curator what the original enabler/input/output was, as well as copying over small molecule inputs/outputs. The curator could use any of the editing tools to change the gene product entities.

cmungall commented 3 years ago

Agreed. I don't think the additional deletion will incur a large burden on curators, and we can return to more sophisticated functionality in future releases

On Wed, Sep 1, 2021 at 9:07 AM thomaspd @.***> wrote:

I think we could keep the enablers/inputs/outputs, but still not copy the evidence or metadata. This would have the advantage of keeping the likely orthologs in the model to remind the curator what the original enabler/input/output was, as well as copying over small molecule inputs/outputs. The curator could use any of the editing tools to change the gene product entities.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/geneontology/noctua/issues/606#issuecomment-910431960, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMMOM2LX7FGBZBGJIQ2TDT7ZFSJANCNFSM4G5WHD2Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

lpalbou commented 3 years ago

Well just to summarize the options short terms for “shallow” copy: 1) remove evidences 2) remove evidences and all enablers 3) remove evidences and all enablers, has input, has output

I thought we were going for 2) but it seems I am reading we are going for 1) . Please confirm.

vanaukenk commented 3 years ago

@lpalbou We have indeed shifted from what was proposed on Tuesday's call.

The reality here is that we have many different options for model copy features, so for now, let's go with the @thomaspd proposal of only removing evidence so the remaining enablers/inputs/outputs can be used as a guide for curators in updating the copied model.

Once we have model copy functionality out and available for curators, we will use their input to further guide us on desired functionalities.

Thanks.

kltm commented 2 years ago

Okay, to dust this off, one background consideration would be to weigh switching the implementation over to the server. This would likely ease changes, future development, and speed/reliability, at the cost of not using what is currently on the table.

A way forward here to restart would be to re-evaluate the two paths forward: an examination of the changes needed to the client copier (and dusting of the interface to test it) and balancing that against the amount of work if the same was to be done in minerva.

pgaudet commented 2 years ago

@kltm Who can/will do this?

kltm commented 2 years ago

@pgaudet That will depend on the strategy followed. We'll have a little more information after some future discussion when people are all around, but likely Tremanye and/or Jim.

vanaukenk commented 2 years ago

From the 2022-03-02 managers call, a summary of the user specifications wrt the actions behind the scenes:

Shallow copy = Everything (ontology terms, relations, enablers, extensions) is copied.
We will NOT copy evidence and other metadata.
A DerivedFrom tag will capture, at the model level, the originating model id.
The new model (all levels--nodes, edges, model itself) will only have the copier as contributor.
A default title, 'copy of x', where x is the original model title will be added.
All copied models will initially have state = development.

ukemi commented 2 years ago

I have tested model copy over the last few days and with the exception of the tickets still open in the project, it appears to be working correctly.

pgaudet commented 2 years ago

Are any of these tickets blocking? We looked at this at Swiss-Prot and we believe it's ready to go to prod. None of the features requested seem blocking to us.

Thanks, Pascale

vanaukenk commented 2 years ago

Moving this to Done. Any future model copy iterations will have new specifications.

geneontology / noctua

Model Copy #606

Current requirements: