geneontology / noctua-landing-page


Blank entries at the top of the list for Organism filter #87

Closed vanaukenk closed 1 year ago

vanaukenk commented 2 years ago

Clicking on the Organism filter, there are eight blank entries at the top of the list:

image

Clicking on a blank entry brings up a fungal model for one of the species that Marc is curating. I believe we need to add these new species to the taxslim, as we had a similar issue when we first added SARS-CoV-2 models - see #30

@tmushayahama @balhoff @cmungall

kltm commented 2 years ago

@vanaukenk Would this be general data maintenance or Noctua-specific (like the last one)?

vanaukenk commented 2 years ago

@kltm - I don't know, honestly. I'd need some help understanding exactly what the root cause of the missing taxon data is.

kltm commented 2 years ago

@vanaukenk Looking at https://github.com/obophenotype/ncbitaxon/pull/59, I assume we just need to add the missing taxon data in there and let it filter through. It would be nice to get ahead of these though. I'm assuming that these are coming from groups that are doing something non-MOD? Would it be possible to get a prospective list of what they're doing so that we don't have these sync gaps?

vanaukenk commented 2 years ago

> Looking at https://github.com/obophenotype/ncbitaxon/pull/59, I assume we just need to add the missing taxon data in there and let it filter through.
>
> It would be nice to get ahead of these though. I'm assuming that these are coming from groups that are doing something non-MOD?

I completely agree. These 'blank' species are fungal species in metabolic pathway models that Marc F. is making using UniProtKB accessions.

> Would it be possible to get a prospective list of what they're doing so that we don't have these sync gaps?

These species must be represented in the larger UniProtKB file we use to create neo, so is there a way to implement a check, or pipeline step, to ensure that all species represented in neo are also in the taxslim? Does something like that make sense?
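A minimal sketch of what such a check could look like, assuming the taxon IDs used in neo and the taxon IDs present in the taxslim have already been extracted into two plain lists. The function name `missing_from_slim` and the example IDs are illustrative only; nothing here reflects how the actual pipeline reads either file.

```python
def missing_from_slim(neo_taxa, slim_taxa):
    """Return the taxon IDs that appear in neo but not in the taxslim.

    Both arguments are iterables of taxon ID strings. The result is sorted
    so that repeated runs give a stable report.
    """
    return sorted(set(neo_taxa) - set(slim_taxa))


# Illustrative input only: in a real pipeline step these lists would be
# extracted from the neo build and from the taxslim file.
neo_taxa = ["NCBITaxon:9606", "NCBITaxon:559292", "NCBITaxon:5141"]
slim_taxa = ["NCBITaxon:9606", "NCBITaxon:559292"]

missing = missing_from_slim(neo_taxa, slim_taxa)
if missing:
    print("Taxa in neo but not in the taxslim:", ", ".join(missing))
```

A pipeline step built on this could fail, or just warn, whenever `missing` is non-empty, which would surface the sync gap before it shows up as a blank filter entry.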

kltm commented 2 years ago

There would have to be a bunch more mechanism involved, I believe, as https://github.com/obophenotype/ncbitaxon belongs to an organization external to ours. We could probably do a one-off, but we should aim towards something for the long term as well. Theoretically, for the GO, we should not be producing annotations outside of the 142, right? Would we want to allow those in though? I can add this to the GO software call agenda for next week.

vanaukenk commented 2 years ago

I agree with aiming towards a long-term, sustainable solution.

Thinking out loud... we could either just let all 142 species in, or periodically comb the existing GO-CAMs and only list species for which there is a model. The former seems simpler, but would give a longer list with only a small percentage of model-relevant species, while the latter seems potentially more fiddly but would keep the list focused and relevant.

kltm commented 2 years ago

Continuing to spitball, I feel like we'll eventually end up at the 142 anyways as the Noctua ecosystem touches more things, so why not get ahead? Anyways, if it's a set list, we can more easily control how it's viewed by users.

vanaukenk commented 2 years ago

I'm good with trying the full 142 and brainstorming about how best to display which species are represented in GO-CAMs.

Shall we chat about next steps at the next technical call?

balhoff commented 2 years ago

I think we could stop using the taxon-slim and just download the whole thing and extract a module from that. Computers are more powerful and networks are faster since the slim was created. :-) We just need to decide where to do this in the pipeline.
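As a rough illustration of the "extract a module" idea, here is a sketch that keeps a set of seed taxa plus all of their ancestors, assuming the full taxonomy has already been loaded into a child-to-parent dictionary. The names `extract_module` and `parent_of`, and the toy taxonomy, are hypothetical; a real pipeline would more likely use existing ontology tooling for this step.

```python
def extract_module(parent_of, seeds):
    """Collect each seed taxon together with all of its ancestors.

    parent_of maps a taxon ID to its parent ID; root taxa are simply
    absent from the mapping. Walking stops early when an ancestor has
    already been collected, so shared lineages are visited only once.
    """
    keep = set()
    for taxon in seeds:
        current = taxon
        while current is not None and current not in keep:
            keep.add(current)
            current = parent_of.get(current)
    return keep


# Toy taxonomy, for illustration only.
parent_of = {
    "leaf_a": "genus_x",
    "leaf_b": "genus_x",
    "leaf_c": "genus_y",
    "genus_x": "root",
    "genus_y": "root",
}

module = extract_module(parent_of, ["leaf_a", "leaf_c"])
```

The seed list could be whatever set of species is decided on (the full 142, or only species with models); the extraction itself does not depend on that choice.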

pgaudet commented 2 years ago

Discussion on the managers call: any species with an annotation should be present in the drop-down menu (but not others)
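One way to read this rule, sketched under the assumption that each model can be reduced to a record listing the taxon IDs it uses. The record shape (`"id"`, `"taxa"`) and the function name `taxa_with_models` are made up for the example and are not the actual GO-CAM data format.

```python
from collections import Counter


def taxa_with_models(models):
    """Count, for each taxon, how many models mention it at least once."""
    counts = Counter()
    for model in models:
        # set() so a taxon repeated within one model is counted once
        for taxon in set(model["taxa"]):
            counts[taxon] += 1
    return counts


# Hypothetical model records, for illustration only.
models = [
    {"id": "model-1", "taxa": ["taxon:A", "taxon:B"]},
    {"id": "model-2", "taxa": ["taxon:B", "taxon:B"]},
]

counts = taxa_with_models(models)
# Only taxa that occur in at least one model end up in the menu.
dropdown = sorted(counts)
```

Taxa that never appear in any model are absent from `counts`, so they never reach the drop-down list.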

pgaudet commented 2 years ago

Discussion on the managers call: as a stop-gap, add NCBI taxon ID in the empty lines
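The stop-gap amounts to a label fallback, roughly as in the sketch below. It assumes the filter has a taxon ID for every entry and a lookup table of display labels that may be incomplete; `organism_label` and the sample data are hypothetical and not taken from the Noctua code.

```python
def organism_label(taxon_id, labels):
    """Return the display label for a taxon, or the ID itself if none is known.

    A missing, None, or whitespace-only label is treated as absent, so the
    filter never shows a blank line.
    """
    label = (labels.get(taxon_id) or "").strip()
    return label if label else taxon_id


# Illustrative data only: one taxon has a label, the others do not.
labels = {"NCBITaxon:9606": "Homo sapiens", "NCBITaxon:0000001": "  "}

shown = [
    organism_label(t, labels)
    for t in ["NCBITaxon:9606", "NCBITaxon:0000001", "NCBITaxon:0000002"]
]
```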

pgaudet commented 2 years ago

This solution is OK for Swiss-Prot curators

pgaudet commented 1 year ago

No more empty lines in the species filter:

image