geneontology / paint

This curation tool allows curators to make precise assertions as to when functions were gained and lost during evolution and record the evidence (e.g. experimentally supported GO annotations and phylogenetic information including orthology) for those assertions.

How to get new Paint annotations into the MGI database #53

Closed: hdrabkin closed this issue 5 years ago

hdrabkin commented 6 years ago

What I'd like, in order to load PAINT annotations into MGI, is the URL for the MOD-specific PAINT GAF that is 'injected' into the MOD GAF for MGI. Otherwise, do we have to download the entire final processed MOD GAF (i.e., http://snapshot.geneontology.org/annotations/mgi.gaf.gz) to get them back out? Is that the only option now?

I need a reply soon, because I have to write a new work order for this to replace our current method.

Thanks in advance.

cmungall commented 6 years ago

You have two options:

You can get the processed PAINT gaf pre-injection here: http://snapshot.geneontology.org/products/annotations/paint_mgi.gaf.gz

But planning ahead, you may want to consider making http://snapshot.geneontology.org/annotations/mgi.gaf.gz your one-stop shop, because we will later start injecting Noctua annotations in there too.

pgaudet commented 6 years ago

@cmungall What's the difference between getting those from GO (http://snapshot.geneontology.org/products/annotations/paint_mgi.gaf.gz) versus directly from PAINT (ftp://ftp.pantherdb.org/downloads/paint/presubmission/gene_association.paint_mgi.gaf.gz)?

Thanks, Pascale

hdrabkin commented 6 years ago

Thanks @cmungall. I guess at the moment, since we only have pre-5/2017 PAINT, switching the URL to http://snapshot.geneontology.org/products/annotations/paint_mgi.gaf.gz would not require any other changes to the existing workflow (and would thus make it a support issue). On the other hand, extracting the PAINT-specific annotations from mgi.gaf will require a redesign of the pipeline (i.e., fetch file, extract PAINT, load PAINT vs. fetch paint_mgi.gaf.gz and load). But as you say, we will eventually need to do this to get our Noctua annotations out (although we load the Noctua GPAD, not the GAF, so that we get all the data: model IDs, etc.). The first option would get PAINT loaded almost immediately. Well, at least now I see the options.
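The "fetch file, extract PAINT, load PAINT" variant can be sketched roughly as follows. This is a minimal Python sketch, not MGI's loader: it assumes the standard GAF 2.x column layout (evidence code in column 7) and assumes that PAINT annotations can be recognized by the IBA evidence code; a real load might also check the reference column (e.g. GO_REF:0000033) or the assigned-by column. The function names are hypothetical.

```python
import gzip
import urllib.request
from typing import Iterable, Iterator, List

# Full MGI GAF from the GO snapshot pipeline (URL taken from this thread).
MGI_GAF_URL = "http://snapshot.geneontology.org/annotations/mgi.gaf.gz"

EVIDENCE_COLUMN = 6  # 0-based index of GAF column 7, "Evidence Code"


def fetch_gaf_lines(url: str = MGI_GAF_URL) -> Iterator[str]:
    """Stream decoded lines from a gzipped GAF without holding it all in memory."""
    with urllib.request.urlopen(url) as response:
        with gzip.open(response, mode="rt", encoding="utf-8") as gaf:
            for line in gaf:
                yield line.rstrip("\n")


def extract_paint(lines: Iterable[str]) -> List[List[str]]:
    """Keep rows whose evidence code is IBA (assumed here to mark PAINT annotations)."""
    paint_rows = []
    for line in lines:
        if not line or line.startswith("!"):  # skip blank lines and GAF header comments
            continue
        columns = line.split("\t")
        if len(columns) > EVIDENCE_COLUMN and columns[EVIDENCE_COLUMN] == "IBA":
            paint_rows.append(columns)
    return paint_rows
```

Usage would be `rows = extract_paint(fetch_gaf_lines())`. Pointing `fetch_gaf_lines` at the pre-injection paint_mgi.gaf.gz URL instead would make the filter step unnecessary, which is the simpler "fetch and load" pipeline described above.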

pgaudet commented 6 years ago

@cmungall I don't understand why you sent the snapshot link. I thought we wanted people to download a specific version, so that everyone has the same data.

Thanks, Pascale

cmungall commented 6 years ago

Providers already get the snapshot ontology, etc., and many want the latest version.

kltm commented 6 years ago

@pgaudet As well, the ones you get from snapshot have been run through the QC/QA pipeline, giving you filters, upgrades, and associated reports.

pgaudet commented 6 years ago

Right, but when we had the call about documentation a few weeks back I thought we had agreed that we wanted everyone to display 'citable' versions, and the snapshots are not.

@vanaukenk @ukemi

kltm commented 6 years ago

For "public" use, correct. For "internal" use, where people in the GOC want to review or get things at a higher frequency, the snapshot is the correct route.

pgaudet commented 6 years ago

OK, but PAINT annotations are consumed by MODs to be displayed publicly; it's not as if MODs can edit them (that is done via PAINT). So why recommend loading them from the snapshot? (I think this can only lead to confusion about data versions.)

Pascale

kltm commented 6 years ago

@pgaudet I'm not quite understanding what you're saying here--those are the PAINT annotations from the source filtered through our QC pipeline. Where else should people be getting them from?

pgaudet commented 6 years ago

I thought we had recommended that people use a citable release date, and the snapshots are not.

kltm commented 6 years ago

@pgaudet From https://github.com/geneontology/paint/issues/53#issuecomment-410399847: 'For "public" use, correct. For "internal" use, where people in the GOC want to review or get things at a higher frequency, the snapshot is the correct route.'

If this is unclear, we can have a quick call to clarify.

pgaudet commented 6 years ago

I think the envisioned usage here is display in the MOD (MODs assume other people do the PAINT QC).

@hdrabkin please correct if this is wrong.

hdrabkin commented 6 years ago

We blame PAINT annotations on PAINT curators, whoever they may be. 8-) However, if the snapshot is more current, although still filtered and QC/QA'd (it IS the same thing that will get merged with the contributed files), we'd rather not wait a month to display them here at MGI. We usually load PAINT weekly; it's a drop/reload, so obsoleted terms, etc. will be purged. However, if the snapshot is only updated monthly, it won't really matter.
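A drop/reload purges obsoleted terms because every existing PAINT row is deleted before the fresh file is inserted, so nothing from the previous load survives. The sketch below illustrates that pattern with Python's built-in sqlite3 module, assuming a hypothetical `annotation` table with a `source` column; MGI's actual schema and load software are not described in this thread.

```python
import sqlite3
from typing import Iterable, Sequence


def drop_and_reload(conn: sqlite3.Connection, rows: Iterable[Sequence[str]]) -> int:
    """Replace all PAINT rows in one transaction (hypothetical schema); return rows loaded."""
    # GAF column 2 (DB Object ID) and column 5 (GO ID), 0-based indices 1 and 4.
    params = [(row[1], row[4]) for row in rows]
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("DELETE FROM annotation WHERE source = 'PAINT'")
        conn.executemany(
            "INSERT INTO annotation (source, gene_id, go_id) VALUES ('PAINT', ?, ?)",
            params,
        )
    return len(params)
```

Running the delete and insert in a single transaction means readers never see an empty PAINT set, and a failed load leaves the previous week's data in place.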

kltm commented 6 years ago

@hdrabkin The "snapshot" pipeline run is attempted nightly and is the exact same thing that is merged with the final GO pipeline GAF. If you are interested in having the GO pipeline QCed version of the upstream PAINT annotations, that would be the place to get them. The "release" pipeline is the same in every respect except: 1) it has a different URL set (that is versioned), 2) it has a related DOI (from this month), 3) it happens once a month, and 4) it is pushed to AmiGO (and other public data endpoints) when ready. Given what I've seen, you would almost certainly want the "snapshot" version.

pgaudet commented 6 years ago

I'm adding this to the next managers call. I thought we wanted people to use 'citable' releases for display on their sites. If that's not the case we'll need to update the documentation (http://wiki.geneontology.org/index.php/Release_Pipeline#Consuming_and_Displaying_GO_Data). @vanaukenk @ukemi

kltm commented 6 years ago

@pgaudet We do, but not necessarily for Consortium members. For example, we want to get to a point where tools like DAVID cite our data DOI so that we can immediately know how fresh the data is. Without these DOIs, it's very hard to recreate an experiment or results from an external tool. However, people who are internal to the GOC, and are relying on GO tools or pipeline for working with their functional annotations, are a little outside of that use case. If ontology developers at a MOD want a term for annotation, want to create an annotation for re-ingestion, or want to see reports on their current work, the month-long release cycle that we maintain for outside tools and citation is obviously too long. This is what the "snapshot" pipeline is for; it can be thought of as basically that the Consortium members have outsourced a bit of what they normally do to a central location--our pipeline.

pgaudet commented 6 years ago

Great. How about we add these guidelines:

"To get the most up-to-date data, groups can download data from the 'daily snapshots'; however, for display in public databases and integration in analysis tools, data from Monthly Official Releases should be used".

@kltm Are we on the same page?

kltm commented 6 years ago

@pgaudet I'd probably be a little more cautious with hard-and-fast rules here, as all providers are going to have different uses and needs for the data we process. If the context of the above text is the internal wiki Release_Pipeline page, maybe I'd write something like: "The most up-to-date (~daily) data and reports are available from http://snapshot.geneontology.org; in cases where citability or reproducibility is a factor, the release data (and associated DOIs) from http://release.geneontology.org should be used." There should probably be no mention of the "snapshot" pipeline in the external docs.

pgaudet commented 6 years ago

Sure, I was planning to add this text to the wiki page (internal documentation). I'll use your phrasing for the GO website.

Thanks, Pascale

pgaudet commented 5 years ago

@ukemi says that MGI will take the entire MGI GAF and extract the PAINT annotations from it to load into their database.

@hdrabkin Is this accurate ?

hdrabkin commented 5 years ago

Yes indeedy. In fact, I just approved the change to our load, so we will fetch the new PAINT annotations this weekend.

pgaudet commented 5 years ago

OK good, so please close the ticket when you consider it resolved.

Pascale