geneontology / geneontology.github.io

Repository for storing GO documentation, directly available through the general GO site
http://geneontology.org
MIT License

New downloads table for ~20 fully supported species with PAN-GO second download option #525

Open: suzialeksander opened this issue 2 months ago

suzialeksander commented 2 months ago

As part of the document drafted at https://docs.google.com/document/d/1j1zO-JHMaXi4ESrjTm_X1amLKdm3vO4szUaeB5LvxvA/edit#heading=h.25jngq5vebjo and the draft PR at https://github.com/geneontology/geneontology.github.io/pull/523, we need a new table for the downloads page.

Comments below are from @thomaspd on the draft doc:

The list of organisms is in the following table [or this can link to a separate page]:

suzialeksander commented 2 months ago

@kltm @pgaudet it looks like the plan is to link the GAFs and the PAINT GAFs. I can pull what we have out of Current and put it in a table, but that's only about 15 files. I don't know where to find the other IBA files.

If these are all in paint_other.gaf, we'll need a way to split that file. Or, @thomaspd, was the plan to link the same giant file and have users split it themselves?
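For illustration only, splitting a combined GAF such as paint_other.gaf into one file per organism could look like the minimal sketch below. It assumes the GAF 2.x layout: header lines start with `!` and precede the data, and column 13 holds the taxon (`taxon:9606`, or `taxon:9606|taxon:562` for interactions, where the first entry is the annotated gene product's own taxon). The function name `split_gaf_by_taxon` and the `paint_<taxon>.gaf` output naming are hypothetical and not part of any existing GO pipeline.

```python
import gzip
import os


def split_gaf_by_taxon(gaf_path, out_dir, prefix="paint"):
    """Hypothetical helper: split one GAF into per-taxon GAFs.

    Returns {taxon_id: number_of_annotation_rows}. Assumes GAF 2.x:
    '!' header lines come before the data, and column 13 is the taxon.
    """
    opener = gzip.open if gaf_path.endswith(".gz") else open
    os.makedirs(out_dir, exist_ok=True)
    header = []    # header lines, copied into every output file
    handles = {}   # taxon id -> open output file
    counts = {}
    try:
        with opener(gaf_path, "rt") as src:
            for line in src:
                if line.startswith("!"):
                    header.append(line)
                    continue
                cols = line.rstrip("\n").split("\t")
                if len(cols) < 13:
                    continue  # skip blank or malformed rows
                # First taxon in column 13 is the gene product's organism
                taxon = cols[12].split("|")[0].replace("taxon:", "")
                if taxon not in handles:
                    path = os.path.join(out_dir, f"{prefix}_{taxon}.gaf")
                    handles[taxon] = open(path, "w")
                    handles[taxon].writelines(header)
                handles[taxon].write("\t".join(cols) + "\n")
                counts[taxon] = counts.get(taxon, 0) + 1
    finally:
        for fh in handles.values():
            fh.close()
    return counts
```

Usage would be something like `split_gaf_by_taxon("paint_other.gaf", "paint_by_taxon")`. The sketch streams the input and keeps one output handle open per taxon; with very many taxa this could hit the operating system's open-file limit, in which case buffering rows per taxon or processing in batches would be the safer choice.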

thomaspd commented 2 months ago

We should get a draft of the table ASAP, so let's fill in the table entries with the best files we currently have. It's fine to link to the same giant file for now. When we have the file broken up by organism, we can update the table accordingly.

suzialeksander commented 2 months ago

To clarify, the links for the table:

@pgaudet or @thomaspd (and @kltm might be able to clarify what *_valid.gaf means)

kltm commented 2 months ago

Linked items should be treated as URLs we want to keep permanently; we do not want to start playing routing games. We do not want public pages linking to "upstream_and_raw_data".

I'm a little out of the loop on scope and timeline here. I'm assuming this is part of a larger downloads refactor? For the purposes of drafting, *_valid represents a file that has had the bulk of the gorules applied to it, but has not yet been merged with other files.

suzialeksander commented 2 months ago

@pgaudet can this be added to the managers' call next week, to let everyone know what's going on? We'll need a lot more time from @kltm or others on the software side, and we will probably need to create new files, including a split of uniprot_all.gaf (and possibly new locations for them).

kltm commented 2 months ago

Cross-referencing https://github.com/geneontology/project-management/issues/82#issuecomment-2032996365. TBD, but it seems like this might be better tracked in that project (or otherwise merged).

suzialeksander commented 2 months ago

I'd like a little more clarification. I agree these are nearly the same thing, but not quite. Are we to have two pages? The other issue outlines a sortable table with annotation counts and one link per species; this one is a table with two links per species.

Is this just a case of the same specs changing over time, or are these actually two different tables?

suzialeksander commented 2 months ago

After discussing this with @pgaudet: if we want to have PAINT annotations as a separate product, we will likely need pipeline refactoring. For example, upstream products (PAINT) are likely on a different ontology version than the release product.

suzialeksander commented 2 months ago

Decisions from the managers' call:

kltm commented 1 month ago

@suzialeksander @pgaudet I wanted to check in on the priority of this item and the availability of a spec, as @pkalita-lbl may have some time.