Closed by hdrabkin 5 years ago
You have two options:
You can get the processed PAINT GAF pre-injection here: http://snapshot.geneontology.org/products/annotations/paint_mgi.gaf.gz
But planning ahead, you may want to consider making http://snapshot.geneontology.org/annotations/mgi.gaf.gz your one-stop shop, because we will later start injecting Noctua annotations in there too.
@cmungall What's the difference between getting those from GO (http://snapshot.geneontology.org/products/annotations/paint_mgi.gaf.gz) versus directly from PAINT (ftp://ftp.pantherdb.org/downloads/paint/presubmission/gene_association.paint_mgi.gaf.gz)?
Thanks, Pascale
Thanks @cmungall. I guess at the moment, since we only have pre-5/2017 PAINT, switching the URL to http://snapshot.geneontology.org/products/annotations/paint_mgi.gaf.gz would not require any other changes to the existing workflow (and would thus make it a support issue). On the other hand, extracting the PAINT-specific annotations from mgi.gaf will require a redesign of the pipeline (i.e., fetch file, extract PAINT, load PAINT vs. fetch paint_mgi.gaf.gz and load...), but as you say, we will eventually need to do this to get our Noctua annotations out (though we load the Noctua GPAD, not GAF, so we get all the data: model ids, etc.). The first way gets PAINT loaded almost immediately. Well, at least now I see the options.
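For illustration, the "extract PAINT" step could look roughly like the sketch below. It assumes that PAINT annotations can be recognised by the IBA evidence code in GAF column 7; that criterion, and the function name `extract_paint_lines`, are assumptions for this example and not something the GO pipeline documents here, so the filter is left configurable.

```python
from typing import Iterable, Iterator

# Assumption: PAINT rows are recognised by evidence code IBA (GAF column 7).
PAINT_EVIDENCE_CODES = frozenset({"IBA"})


def extract_paint_lines(lines: Iterable[str],
                        evidence_codes=PAINT_EVIDENCE_CODES) -> Iterator[str]:
    """Yield GAF annotation lines whose evidence code is in evidence_codes."""
    for line in lines:
        # Header and comment lines in a GAF start with "!"; skip blanks too.
        if line.startswith("!") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        # Column 7 (index 6) holds the evidence code.
        if len(columns) > 6 and columns[6] in evidence_codes:
            yield line
```

Because the function only consumes an iterable of lines, it can be fed from an open file handle on the downloaded mgi.gaf, and the rest of the load can stay as it is today.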
@cmungall I don't understand why you sent the snapshot link. I thought we wanted people to download a version, so as to all have the same data.
Thanks, Pascale
Providers already get the snapshot ontology, etc.; many want the latest version.
@pgaudet As well, the ones you get from snapshot have been run through the QC/QA pipeline, giving you filters, upgrades, and associated reports.
Right, but when we had the call about documentation a few weeks back I thought we had agreed that we wanted everyone to display 'citable' versions, and the snapshots are not.
@vanaukenk @ukemi
For "public" use, correct. For "internal" use, where people in the GOC want to review or get things at a higher frequency, the snapshot is the correct route.
OK, but PAINT annotations are consumed by MODs to be displayed publicly; it's not like they can edit them (since this is done via PAINT). So why recommend loading them? (I think this can only lead to confusion about data versions.)
Pascale
@pgaudet I'm not quite understanding what you're saying here--those are the PAINT annotations from the source filtered through our QC pipeline. Where else should people be getting them from?
I thought we had recommended that people use a citable release date, and the snapshots are not.
@pgaudet From https://github.com/geneontology/paint/issues/53#issuecomment-410399847: 'For "public" use, correct. For "internal" use, where people in the GOC want to review or get things at a higher frequency, the snapshot is the correct route.'
If this is unclear, we can have a quick call to clarify.
I think the envisioned usage here is to display in the MOD. (MODs assume other people do the PAINT QC)
@hdrabkin please correct if this is wrong.
We blame PAINT annotations on PAINT curators, whoever they may be. 8-) However, if the snapshot is more current, although still filtered and QC/QA'd (it IS the same thing that will get merged with the contributed files), we'd like to not wait a month to display them here at MGI. We usually load PAINT weekly; it's a drop/reload, so obsoleted terms, etc. will be purged. However, if the snapshot is only updated monthly, it won't really matter.
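To make "drop/reload" concrete, here is a minimal sketch of the pattern using SQLite. The table name `paint_annotation`, its three columns, and both function names are hypothetical and are not the actual MGI schema or load code; the point is only that deleting and re-inserting inside one transaction means annotations absent from the new file (for example, ones on obsoleted terms) disappear without any per-row comparison.

```python
import sqlite3


def ensure_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical paint_annotation table if it is missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS paint_annotation ("
        "gene_id TEXT, go_id TEXT, evidence TEXT)"
    )


def reload_paint(conn: sqlite3.Connection, rows) -> None:
    """Drop every existing PAINT row and load the new set atomically."""
    # "with conn" commits if both statements succeed and rolls back on error,
    # so a failed load never leaves the table empty or half-filled.
    with conn:
        conn.execute("DELETE FROM paint_annotation")
        conn.executemany(
            "INSERT INTO paint_annotation (gene_id, go_id, evidence) "
            "VALUES (?, ?, ?)",
            rows,
        )
```

Each weekly run would call `reload_paint` with the rows parsed from the freshly fetched file.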
@hdrabkin The "snapshot" pipeline run is attempted nightly and is the exact same thing that is merged with the final GO pipeline GAF. If you are interested in having the GO pipeline QCed version of the upstream PAINT annotations, that would be the place to get them. The "release" pipeline is the same in every respect except: 1) it has a different URL set (that is versioned), 2) it has a related DOI (from this month), 3) it happens once a month, and 4) it is pushed to AmiGO (and other public data endpoints) when ready. Given what I've seen, you would almost certainly want the "snapshot" version.
I'm adding this to the next managers call. I thought we wanted people to use 'citable' releases for display on their sites. If that's not the case we'll need to update the documentation (http://wiki.geneontology.org/index.php/Release_Pipeline#Consuming_and_Displaying_GO_Data). @vanaukenk @ukemi
@pgaudet We do, but not necessarily for Consortium members. For example, we want to get to a point where tools like DAVID cite our data DOI so that we can immediately know how fresh the data is. Without these DOIs, it's very hard to recreate an experiment or results from an external tool. However, people who are internal to the GOC, and are relying on GO tools or pipeline for working with their functional annotations, are a little outside of that use case. If ontology developers at a MOD want a term for annotation, want to create an annotation for re-ingestion, or want to see reports on their current work, the month-long release cycle that we maintain for outside tools and citation is obviously too long. This is what the "snapshot" pipeline is for; it can be thought of as the Consortium members having outsourced a bit of what they normally do to a central location--our pipeline.
Great. How about we add these guidelines:
"To get the most up-to-date data, groups can download data from the 'daily snapshots'; however, for display in public databases and integration in analysis tools, data from Monthly Official Releases should be used".
@kltm Are we on the same page?
@pgaudet I'd probably be a little more ginger with hard-and-fast rules here, as all providers are going to have different uses and needs for the data we process. If the context of the above text is the internal wiki Release_Pipeline page, maybe I'd write something like: "The most up-to-date (~daily) data and reports are available from http://snapshot.geneontology.org; in cases where citability or reproducibility is a factor, the release data (and associated DOIs) from http://release.geneontology.org should be used." There should probably be no mention of the "snapshot" pipeline for the external docs.
Sure - I was thinking to add this text in the wiki page (internal documentation). I'll use your phrasing for the GO website.
Thanks, Pascale
@ukemi says that MGI will take the entire MGI GAF and extract the PAINT annotations from it to load into their database.
@hdrabkin Is this accurate?
Yes indeedy; in fact, I just approved the change to our load so that we will fetch the new PAINT annotations this weekend.
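As a quick sanity check on a downloaded copy of the merged file, one could tally annotations by the Assigned_By field (GAF column 15) to see which groups' annotations it actually contains. This is only a sketch: the function name `count_by_assigned_by` and the local file path are illustrative assumptions, not part of our load.

```python
import gzip
from collections import Counter


def count_by_assigned_by(path: str) -> Counter:
    """Count annotation lines per Assigned_By value in a gzipped GAF."""
    counts = Counter()
    # "rt" decompresses and decodes on the fly, one line at a time.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # Header and comment lines in a GAF start with "!".
            if line.startswith("!"):
                continue
            columns = line.rstrip("\n").split("\t")
            # Column 15 (index 14) is Assigned_By.
            if len(columns) >= 15:
                counts[columns[14]] += 1
    return counts
```

For example, `count_by_assigned_by("mgi.gaf.gz")` on a locally saved copy returns a `Counter` keyed by contributing group.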
OK good, so please close the ticket when you consider it resolved.
Pascale
What I'd like to have, in order to load PAINT annotations into MGI, is the URL for the MOD-specific PAINT GAF that is 'injected' into the MOD GAF for MGI. Otherwise we have to download the entire final processed MOD GAF to get them back out? Is that the only option now? From this, yes: http://snapshot.geneontology.org/annotations/mgi.gaf.gz
Need a reply soon because I need to write a new work order for this to replace our current method.
Thanks in advance.