geneontology / neo

noctua entity ontology

Ecoli data is gone from NEO as the upstream source changed #114

Open kltm opened 1 year ago

kltm commented 1 year ago

In exploring the NEO load, we discovered that there are no ecoli entries, likely due to the upstream file change.

To close this, update to the new (temp) file and reload.

kltm commented 1 year ago

Okay, I think that this may be cleared naturally as the datasets.json does seem to refresh to the correct value.

kltm commented 1 year ago

Check next week on a test server.

kltm commented 1 year ago

Data does seem to be in there now, although we may want to do more tweaks.

vanaukenk commented 1 year ago

Just checking the autocomplete on the Noctua Landing Page, I can find E. coli entities, but the species/taxon for E. coli K12 is shown as ecocyc, rather than one of the abbreviations (e.g. Atal) or an NCBI taxon id. I'm not sure if this is what was intended.

kltm commented 1 year ago

I think "intended" here is not quite right. Maybe. Initially, I thought this might be due to the ongoing deal with https://github.com/geneontology/go-site/issues/1961, but it seems to stem (by several steps) from https://github.com/geneontology/go-site/blob/8b649d799b522af9ca28f560f71ec1c978076d99/metadata/datasets/ecocyc.yaml#L22 being null. It has been like that for years, so I'm not sure whether this was intentional for some reason or not. Easy enough to fix if that was an oversight somewhere along the way.

That said, this is apparently the way it was before the churn for https://github.com/geneontology/go-site/issues/1961 started, so at least that much is correct.

suzialeksander commented 1 year ago

FWIW https://github.com/geneontology/go-site/pull/1994 added species_code: Ecol for a different ticket

kltm commented 1 year ago

Recheck today after outage

kltm commented 1 year ago

Ugh, issue is persisting.

kltm commented 1 year ago

Okay, @pgaudet, I think I've tracked this back to an assumption in the "NEO Makefile builder" that every source file is compressed, which is not true for the ecoli/ecocyc data.

2023-04-13 17:46:32 (441 KB/s) - ‘mirror/18.E_coli_MG1655.goa.tmp’ saved [11407440/11407440]

gzip -dc mirror/18.E_coli_MG1655.goa | ./gaf2obo.pl -s Ecol -n ecocyc > target/neo-ecocyc.obo.tmp && mv target/neo-ecocyc.obo.tmp target/neo-ecocyc.obo

gzip: mirror/18.E_coli_MG1655.goa: not in gzip format
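
For illustration only, one way to avoid assuming compression is to check the first two bytes of the file for the gzip magic number before choosing how to open it. This is a minimal Python sketch, not the actual NEO builder code or the fix that went in; the helper names `is_gzipped` and `open_annotation_file` and the dummy file contents are hypothetical.

```python
import gzip
import os
import tempfile

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def open_annotation_file(path):
    """Open `path` as text, decompressing only when it is really gzip data.

    Hypothetical helper: the name and behavior are assumptions made for
    this sketch, not part of the NEO build scripts.
    """
    if is_gzipped(path):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


if __name__ == "__main__":
    # Write the same dummy line once as plain text and once gzipped,
    # then read both back through the same helper.
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "plain.goa")
        packed = os.path.join(tmp, "packed.goa")
        line = "!gaf-version: 2.2\n"
        with open(plain, "w", encoding="utf-8") as out:
            out.write(line)
        with gzip.open(packed, "wt", encoding="utf-8") as out:
            out.write(line)
        for path in (plain, packed):
            with open_annotation_file(path) as handle:
                print(os.path.basename(path), is_gzipped(path), handle.readline().strip())
```

Checking the content rather than the file name keeps the reader working if an upstream source switches between compressed and uncompressed files, which appears to be what happened here.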

kltm commented 1 year ago

Testing; build in pipeline.

@pgaudet @vanaukenk If this works (and I believe it will), we probably want to get this out before the next outage in a month.

kltm commented 1 year ago

@pgaudet @vanaukenk I believe this is (finally) working now.

pgaudet commented 1 year ago

Maybe there are multiple problems - in this model http://noctua.geneontology.org/workbench/noctua-visual-pathway-editor/?model_id=gomodel:62f58d8800001680 the gene label still doesn't show.

Thanks, Pascale

kltm commented 1 year ago

@pgaudet I think that that is a different issue: we can confirm that Ecoli data is now present in NEO. As this fix went in after the outage, it may be related to that, or another issue. That said, it appears in autocomplete dropdowns now, so the data is there (which is the scope of this ticket).

Also note: https://github.com/geneontology/neo/issues/111