Closed lpalbou closed 2 years ago
@lpalbou How were these requirements generated? I think some of them are probably not needed/required. We might want to have a minimal ticket and then break others into additions; the previous final form was a bit lighter than this.
@kltm discussed with Kimberly, David and Pascale yesterday. The owl:cloneFrom was discussed with @cmungall about 2 weeks ago. The last two requirements are lesser priority. Anyhow, I am working on it. Only issue that I am having is barista/minerva automatically add both contributor and date, so I have to send additional bunch of requests to overwrite those annotations after individuals/facts have been created. It's not super nice but doing otherwise would require larger changes on how barista/minerva communicate.
Understood--I'm sure this will all work out in the wash and I'm really excited about getting copying out there!
Looking at this, I think that points 1 and 3 are the ones that have perked my ears; separate, but related to the same system. The new requirements seem to be a slightly different take on how "touching" would generally be apportioned and I think that attempting certain kinds of overrides from the client may be the wrong direction. To produce the effects given above, there may have to be a fair amount of removal of annotations that amounts to essentially a shadow model of credit. Or not, the devil is in the details. I think that is may be good to separate out the absolutely-has-to-happen-or-we-don't-want-copying bits from the this-would-be-great-to-haves; more granular tickets are easier to work with for these threads as well.
instead of owl:cloneFrom use something like prov, there is a relationship there like derivedFrom (different from the RO derived_from which is material entities)
Talking to @lpalbou , the current PR is https://github.com/geneontology/noctua/pull/609 and https://github.com/berkeleybop/bbop-manager-minerva/pull/2
@kltm will pull in bbop-manager-minerva changes (new functionality only), then get the new dev up.
Quick update: as I feared, that's indeed not the last version of the model copy on those two PRs. I stopped working on it 4 months ago while testing alternative solutions to speed up the process, so I have to find the last relevant / usable version. I will update this ticket when it's done.
@lpalbou NP. If you can't make the restart window today for Minerva, we'll take care of it later on--this is just a low-hanging-fruit addition for now.
I think I have located the last version, I am creating a new instance from scratch to check.
@kltm sorry for the extra precaution, but as it's been some time I wanted to make sure. Those PRs are actually pointing to the last usable versions of the model copy, so you can merge them both.
Notes:
Pending a little thought about https://github.com/geneontology/noctua/pull/609#issuecomment-510709716, I did not merge in yet (as it may not be intended to work as such), but switched to @lpalbou 's branch and rebuilt the package around the merged and updated bbop-manager-minerva.
noctua-dev
now has a lovely copy function to start thinking about iterating on.
For discussion of provenance tracking: https://www.w3.org/TR/prov-o/#wasDerivedFrom
PR for adding wasDerivedFrom to the shex schema https://github.com/geneontology/go-shapes/pull/266
For reference, method in bbop-manager: https://github.com/berkeleybop/bbop-manager-minerva/blob/a366911d7537eefc1815b1f8dacc7b023c8268f4/lib/manager.js#L1143. It was tested 2 years ago and is probably good enough for a V1. Fairly easy to integrate in any UI (see mockup). Only addition would be:
For a V2, recommend either:
From 2021-08-31 call:
We've agreed to move forward with a 'shallow copy' functionality.
Shallow copy = ontology terms and relations are copied in full, but genes (and other enablers), evidence, and other metadata are not copied to the new model. Inputs and Outputs, if not genes or gene products, should also be copied over as well as contextual information, e.g. cells and tissues types.
A DerivedFrom tag will capture, at the model level, the originating model id.
The new model will only have the copier as contributor.
A default title, 'copy of x', where x is the original model title will be added.
All copied models will initially have state = development.
@lpalbou @kltm @ukemi @thomaspd @cmungall @balhoff @pgaudet @sierra-moxon
Please comment as needed.
Hi @vanaukenk
You write
Inputs and Outputs, if not genes or gene products, should also be copied over as well as contextual information, e.g. cells and tissues types.
Is that what we decided? Especially if we annotate a different species, this information could be misleading; I thought we would strip these as well.
Thanks, Pascale
@pgaudet I'm thinking here of chemical inputs and outputs, e.g. for an enzyme activity.
Ah ok - sure
but not gene products for example for transcription factors, and probably not cell types and anatomy either?
Shallow copy = ontology terms and relations are copied in full, but genes (and other enablers), evidence, and other metadata are not copied to the new model. Inputs and Outputs, if not genes or gene products, should also be copied over as well as contextual information, e.g. cells and tissues types.
Apologies, but for me the simpler way to look at it is: copy everything but enablers and evidences. And this will be the way it's implemented (two simple tests on some specific relationships).
Inputs and Outputs
Need to be clarified from above. Do we discard as well everything from has input and has output ? @thomaspd
DerivedFrom tag
It was implemented both at the model level and at the individual level but it would be easy to modulate the default behavior. @kltm
Note: creating those links only have pros IMO & will enable later the creation of functionalities (eg if the original activity changes, notify curators who copied the models or tag copied models; or link / group all models of a pathway in different species, etc)
The new model will only have the copier as contributor.
Tag @kltm @tmushayahama as this will be solved currently at the UI level, by subtracting the list of contributors from the original model and the copied model
All copied models will initially have state = development.
Not the current behavior, but hopefully will have time to fix that.
but not gene products for example for transcription factors, and probably not cell types and anatomy either?
Need to clarify exactly if and what we discard from has input / has output.
Note: I will take a look but while it's easy to discard completely has input and/or has output, I believe filtering them conditionally based on their value may be tricky with the current code and unrealistic before I leave.
Perhaps we just exclude everything that is a macromolecular machine, regardless of whether it is an enabler, input, or output.
But we may be worrying too much about details. If in the first pass, curators need to explicitly delete some inputs or outputs I think it is OK, overall it is still a huge timesaver.
Perhaps we just exclude everything that is a macromolecular machine, regardless of whether it is an enabler, input, or output.
That makes sense to me.
What if we use step-by-step UI wizard type of copy with bunch of checkboxes, radio buttons and let the user choose. Power to UI :). But I agree with Chris, I think first pass should be a proof of concept as @lpalbou copy code strategy works fantastic on landing page.
@vanaukenk I think another approach is I can unhide the copy functionality on landing page for dev and actually discuss while looking at a current "prototype" (what LP had) than tweaking before checking whats on the table.
Perhaps we just exclude everything that is a macromolecular machine, regardless of whether it is an enabler, input, or output.
Ideally yes, but I believe it’s too much to ask for the bbop-manager-minerva lib, and if it was possible it would probably be too slow. I would put a conditional filtering on a value root type (eg filtering all macromolecules) as something to be done on a V2 made in minerva directly. Right now, what we could do (please advise), is to either keep or remove completely the entities of has input and/or has output.
If one of our main uses cases for shallow copy is to copy models from one organism to another, then if we have to choose between keep or remove completely entities for v1, then I think we have to choose remove completely.
sounds good!
On Tue, Aug 31, 2021 at 1:36 PM vanaukenk @.***> wrote:
If one of our main uses cases for shallow copy is to copy models from one organism to another, then if we have to choose between keep or remove completely entities for v1, then I think we have to choose remove completely.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/geneontology/noctua/issues/606#issuecomment-909615759, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMMOI6PDCMP55FUZPMBJDT7U4OVANCNFSM4G5WHD2Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
@cmungall just thinking out loud of a bad workaround to filter out macromolecules even in a V1.
I believe we could filter out all macromolecules from enablers/has input/has output.. by doing a dumb check on the CURIE prefixes. Anything starting with HGNC, UniProtKB, FB, WB & MODs etc would be macromolecules (do we use MOD ids for small molecules ? I think / hope not.. or possibly they would have different ID format ?), and small molecules probably only comes from 1 or 2 DBs ? Not ideal but could be done in that timeframe and would take 0 processing time.
Why is it hard to know the root types? That seems most straightforward to me.
You are on the client side, Jim, not minerva. Not all data is available and if you have to request it from the client, it means more delays to do the copy. As discussed, it’s been 2 years so I forgot the payloads returned by barista but at the time, the root types done by Ben were not there yet.
In any case, let’s try to schedule a call next week Jim, so that we can discuss a v2 of the model copy on minerva side (create operation + duplicate the Java object + remap the IDs)
I think we could keep the enablers/inputs/outputs, but still not copy the evidence or metadata. This would have the advantage of keeping the likely orthologs in the model to remind the curator what the original enabler/input/output was, as well as copying over small molecule inputs/outputs. The curator could use any of the editing tools to change the gene product entities.
Agreed. I don't think the additional deletion will incur a large burden on curators, and we can return to more sophisticated functionality in future releases
On Wed, Sep 1, 2021 at 9:07 AM thomaspd @.***> wrote:
I think we could keep the enablers/inputs/outputs, but still not copy the evidence or metadata. This would have the advantage of keeping the likely orthologs in the model to remind the curator what the original enabler/input/output was, as well as copying over small molecule inputs/outputs. The curator could use any of the editing tools to change the gene product entities.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/geneontology/noctua/issues/606#issuecomment-910431960, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMMOM2LX7FGBZBGJIQ2TDT7ZFSJANCNFSM4G5WHD2Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
Well just to summarize the options short terms for “shallow” copy: 1) remove evidences 2) remove evidences and all enablers 3) remove evidences and all enablers, has input, has output
I thought we were going for 2) but it seems I am reading we are going for 1) . Please confirm.
@lpalbou We have indeed shifted from what was proposed on Tuesday's call.
The reality here is that we have many different options for model copy features, so for now, let's go with the @thomaspd proposal of only removing evidence so the remaining enablers/inputs/outputs can be used as a guide for curators in updating the copied model.
Once we have model copy functionality out and available for curators, we will use their input to further guide us on desired functionalities.
Thanks.
Okay, to dust this off, one background consideration would be to weigh switching the implementation over to the server. This would likely ease changes, future development, and speed/reliability, at the cost of not using what is currently on the table.
A way forward here to restart would be to re-evaluate the two paths forward: an examination of the changes needed to the client copier (and dusting of the interface to test it) and balancing that against the amount of work if the same was to be done in minerva.
@kltm Who can/will do this?
@pgaudet That will depend on the strategy followed. We'll have a little more information after some future discussion when people are all around, but likely Tremanye and/or Jim.
From the 2022-03-02 managers call, a summary of the user specifications wrt the actions behind the scenes:
I have tested model copy over the last few days and with the exception of the tickets still open in the project, it appears to be working correctly.
Are any of these tickets blocking? We looked at this at Swiss-Prot and we believe it's ready to go to prod. None of the features requested seem blocking to us.
Thanks, Pascale
Moving this to Done. Any future model copy iterations will have new specifications.
This ticket replace and complete the following old tickets:
Current requirements:
Don't hesitate to add any requirements / comments / thoughts