Closed by kltm 2 years ago
@dustine32 "eyeballed the annotation file sizes real quick and they 'looked normal'".
A current working theory from @dustine32 is that it may be "an upstream data issue (https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy) or if something else (on our side?) is breaking the go-stats tax".
Short story: This is on our side, but it's just a bug with the reporting, not the actual product data. To fix, I think we just need to remove a hard-coded aspgd.gaf reference from a script run from docker golr-autoindex.
Long story: I noticed a drastic reduction in the number of taxa returned from the API call to https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy: 129 on 2022-04-19 vs. 5168 on 2022-03-22. This call is made from the go-stats code. It turns out the reduction is due to the lower number of taxa we send in the params, which come from the all_annotations DS retrieved from the Golr instance running locally in the pipeline.
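A drop like the one above (5168 taxa on 2022-03-22 vs. 129 on 2022-04-19) is the kind of thing a simple guard could flag between runs. A minimal sketch in Python, assuming the taxon counts from two consecutive runs are available as plain integers; the function name and the 50% threshold are hypothetical and not part of go-stats:

```python
def taxon_count_dropped(previous: int, current: int, max_drop: float = 0.5) -> bool:
    """Return True if `current` fell by more than `max_drop` (a fraction) relative to `previous`."""
    if previous <= 0:
        # No usable baseline to compare against; do not flag.
        return False
    return (previous - current) / previous > max_drop


# Counts quoted in this thread: 5168 taxa before, 129 after.
print(taxon_count_dropped(5168, 129))
```

A check along these lines would only signal that something changed; it would not say whether the cause is upstream data or our own pipeline.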
Backtracking to the logs for the in-pipeline Golr loading step, I see a new FileNotFoundException for http://skyhook.berkeleybop.org/release/annotations/aspgd.gaf.gz.
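For anyone repeating this kind of log check, here is a rough Python sketch that pulls the missing paths out of a log. The exact log line format is an assumption (the exception name followed by the URL on the same line), and `missing_files` is a hypothetical helper, not something in the pipeline:

```python
import re
from typing import List

# Assumed line shape: "...FileNotFoundException: <url-or-path>".
MISSING_FILE = re.compile(r"FileNotFoundException:?\s+(\S+)")


def missing_files(log_text: str) -> List[str]:
    """Return every path or URL that follows a FileNotFoundException in the log text."""
    return MISSING_FILE.findall(log_text)
```

If the real loader logs the URL on a separate line or inside a stack trace, the pattern would need adjusting.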
This makes sense, as aspgd was recently dropped as a product and so wouldn't be available. We now need to remove the hard-coded reference in the run-indexer.sh script.
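To catch leftover references like this when a product is dropped, something like the following Python sketch could scan a script's text for the dropped names. The helper name is hypothetical, and the caller is assumed to supply the script contents and the list of dropped file names:

```python
from typing import Iterable, List, Tuple


def find_hardcoded_refs(script_text: str, dropped: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for each line mentioning a dropped name."""
    names = list(dropped)
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        # Plain substring match; enough to spot a hard-coded file name.
        if any(name in line for name in names):
            hits.append((lineno, line.strip()))
    return hits
```

Running it over each place the file list is defined (not only the script's built-in default) would cover the case where the value is supplied from outside.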
@kltm I think the action items are now:
- Remove the aspgd.gaf.gz reference in golr-autoindex/run-indexer.sh
@dustine32 Great, thank you for getting to the bottom of this. The fix is actually pretty simple (striking out your TODO list): when I propagated the removal of aspgd, I forgot to apply it to release. The variable in the docker image is a default and not used, as it's supplied from outside. I believe the chain of events is this:
I've put the fix in place and release is ready to go again. I think there is a little GeneDB work to do before triggering.
Reported by @pgaudet:
"It looks like we lost more than half of the annotations: (data from http://skyhook.berkeleybop.org/release/release_stats/go-annotation-changes.tsv)"
Also, that go-annotation-changes.tsv has data that never used to be there, that starts with “WARNINGS”