Closed kltm closed 1 year ago
Can we add this to a project?
@pgaudet No problem, but I'm not sure where it would go. I suspect it's its own mini-project when all is said and done.
Here are the instructions for adding to taxslim: https://github.com/obophenotype/ncbitaxon/blob/master/subsets/README.md
@balhoff The species should go in this file: https://github.com/obophenotype/ncbitaxon/blob/master/subsets/taxon-subset-ids.txt
geneontology/neo
make neo.obo
uses http://ftp.ebi.ac.uk/pub/contrib/goa/uniprot_reviewed.gpi.gz
...@.../neo$ grep -oh "NCBITaxon:[0-9]*" neo.obo | sort | uniq | wc -l
14182
@.../neo$ grep -oh "taxon:[0-9]*" mirror/uniprot_reviewed.gpi.tmp | sort | uniq | wc -l
14216
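For anyone repeating the counts above without the shell, here is a minimal Python sketch of the same idea as `grep -oh ... | sort | uniq | wc -l`. The function name `unique_taxon_ids` and the sample rows are made up for illustration; only the `taxon:` and `NCBITaxon:` prefixes come from the commands above. Unlike the grep pattern, it requires at least one digit after the colon.

```python
import re

def unique_taxon_ids(text, prefix="taxon"):
    """Return the set of distinct '<prefix>:<digits>' tokens found in text."""
    pattern = re.compile(re.escape(prefix) + r":[0-9]+")
    return set(pattern.findall(text))

# Made-up GPI-like rows; only the taxon column matters here.
sample = (
    "UniProtKB\tQ9T3Q2\t...\ttaxon:107268\n"
    "UniProtKB\tEXAMPLE1\t...\ttaxon:9606\n"
    "UniProtKB\tEXAMPLE2\t...\ttaxon:9606\n"
)
ids = unique_taxon_ids(sample)
print(len(ids))  # number of distinct taxa in the sample
```

For the real files, the text would be read from `mirror/uniprot_reviewed.gpi.tmp` (prefix `taxon`) or `neo.obo` (prefix `NCBITaxon`).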
@pgaudet there are currently ~6k entries in https://raw.githubusercontent.com/obophenotype/ncbitaxon/master/subsets/taxon-subset-ids.txt. Combining and de-duping these, we get 16283 entries. Does that sound right?
If this looks right, I can do a PR (I had to fork for the demonstration, as I didn't have permissions): https://github.com/geneontology/ncbitaxon/blob/issue-go-site-1955-annotatable-species/subsets/taxon-subset-ids.txt @pgaudet @cmungall
Looks OK. For example, taxon 107268 has some reviewed entries: ~https://www.uniprot.org/uniprotkb/A0A8K1C0V2/entry~ (this one was NOT a reviewed entry)
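The combine-and-de-dupe step could be sketched roughly as below. This is not the script actually used for the fork. It assumes (unverified here) that `taxon-subset-ids.txt` holds one `NCBITaxon:<number>` ID per line, and that IDs taken from the GPI file carry the `taxon:` prefix and need rewriting. The helper names `normalize` and `merge_ids` are hypothetical.

```python
def normalize(taxon_id):
    """Rewrite 'taxon:9606' as 'NCBITaxon:9606'; leave other IDs unchanged."""
    taxon_id = taxon_id.strip()
    if taxon_id.startswith("taxon:"):
        return "NCBITaxon:" + taxon_id[len("taxon:"):]
    return taxon_id

def sort_key(taxon_id):
    """Order by prefix, then numerically by the part after the colon."""
    prefix, _, number = taxon_id.partition(":")
    return (prefix, int(number) if number.isdigit() else -1, taxon_id)

def merge_ids(existing_lines, new_ids):
    """Union two ID collections, dropping blanks and duplicates."""
    merged = {normalize(x) for x in existing_lines if x.strip()}
    merged.update(normalize(x) for x in new_ids if x.strip())
    return sorted(merged, key=sort_key)

# Illustrative input only: two existing entries, two from the GPI file.
merged = merge_ids(
    ["NCBITaxon:9606\n", "NCBITaxon:10090\n"],
    ["taxon:9606", "taxon:107268"],
)
print("\n".join(merged))
```

Writing the result back with `"\n".join(merged) + "\n"` keeps exactly one ID per line and a trailing newline.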
https://www.uniprot.org/uniprotkb/Q9T3Q2/entry
Could you merge this?
Thanks, Pascale
@kltm check the second to last line: https://github.com/geneontology/ncbitaxon/blob/946f1758908edd4d11a0b77030fbcd3264643f05/subsets/taxon-subset-ids.txt#L16282
Oops! Didn't read that far!
@balhoff Whoops, that's me--I probably introduced that when cat-ing the files together.
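A quick sanity check before opening a PR might catch this kind of slip. The thread does not say exactly what was wrong on that line; one common way `cat` produces a bad line is a file without a trailing newline, which glues two entries together. The sketch below assumes one `NCBITaxon:<number>` ID per line, and `malformed_lines` is a hypothetical helper.

```python
import re

# Assumed line format: exactly one NCBITaxon ID and nothing else.
ID_LINE = re.compile(r"NCBITaxon:[0-9]+")

def malformed_lines(lines):
    """Return (line_number, text) pairs for non-empty lines that are not a single ID."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text and not ID_LINE.fullmatch(text):
            bad.append((number, text))
    return bad

# Illustrative input: the second line is two IDs glued together.
example = ["NCBITaxon:9606\n", "NCBITaxon:10090NCBITaxon:7227\n", "NCBITaxon:7955\n"]
for number, text in malformed_lines(example):
    print(f"line {number}: {text}")
```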
I don't have permission to go beyond https://github.com/obophenotype/ncbitaxon/pull/74 Tagging @cmungall @pgaudet @balhoff
Can you fix the conflict? Then I think I can merge.
@balhoff I do not have the power to fix the conflict in that repo once the PR is created. It seems to just be additions, so I'm not sure why it's choking...
Okay, I fixed and merged it.
@balhoff Cheers!
@pgaudet To close out this issue, we need to be doing this periodically or dynamically. While it's tempting to spend the energy adding something automated to the NEO pipeline to do this, I get the feeling that once a year might be fine? If we can work out what frequency and how to remind ourselves, would that allow this to be closed?
I agree, I think keeping it up to date as needed, and checking periodically would be fine.
I would also do this 'on request'
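For the periodic or on-request check, something as small as a set difference might be enough to tell whether an update is due. This is only a sketch: `missing_from_slim` is a hypothetical name, and both inputs are assumed to already use the same `NCBITaxon:` prefix.

```python
def missing_from_slim(used_ids, slim_ids):
    """Return the IDs that are in use but absent from the taxon slim."""
    return sorted(set(used_ids) - set(slim_ids))

# Illustrative IDs only.
used = {"NCBITaxon:9606", "NCBITaxon:107268"}
slim = {"NCBITaxon:9606", "NCBITaxon:10090"}
missing = missing_from_slim(used, slim)
print(missing)
```

An empty result would mean that no update is needed at that point.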
Will this be added in Noctua in the next Noctua update? Or can this be done separately?
@pgaudet (Just writing down our earlier conversation.) The answer is "both". We do this as part of the Noctua update outages every two weeks, refreshing minerva and solr with NEO. We can also refresh just solr very quickly at any point, but that means that things appear as bare IDs in Noctua everywhere that's not an autocomplete.
@balhoff Why isn't the species showing for Q9T3Q2 in http://noctua.geneontology.org/workbench/noctua-visual-pathway-editor/?model_id=gomodel%3A636d9ce800000575 ?
Thanks, Pascale
Based on discussion at the GOC meeting, we likely need to update the NEO build code that puts species name abbreviations into gene names.
Continuing here: https://github.com/geneontology/neo/issues/116
Currently, some species that are used, mostly in Noctua, are not represented in the taxon slim, causing issues like missing labels, etc. See: https://github.com/geneontology/noctua-landing-page/issues/87
We would like to come up with an SOP for periodic updates, or a dynamic solution.