geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License

Add sanity checks that put all major annotation outputs within a range #1062

Open kltm opened 5 years ago

kltm commented 5 years ago

We are too frequently bitten by issues like https://github.com/geneontology/go-site/issues/1061, where an unnoticed drop in annotations turns out to presage an error somewhere in the pipeline.

We need a sanity check, either as separate metadata or in the main metadata, that puts a range on expected annotation volume and either crashes the pipeline or sends an email when the volume falls outside it.

kltm commented 5 years ago

Note that we already have: https://github.com/geneontology/go-site/blob/master/scripts/sanity-check-ann-report.py This failed to find the issue because it looks for something rather more severe, at the ~50% mark, rather than the ~8% mark at which this would have shown up. The gross number is at least partially due to the relatively large effects on smaller files. However, this ticket wants what is essentially an ex post facto check: we know that there were 424986 lines last time around, so why the 5% reduction? This is, naturally, much more brittle and would require regular updates of the metadata file; that said, I think far fewer unseen changes will sneak through this way.

kltm commented 5 years ago

Would start with something like:

- "filename": rgd.gaf
  "expected-lines": 424986
  "acceptable-variation": 0.05
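
A minimal sketch of how an entry like the one above could be checked. This is not the existing script's logic: the function names `within_expected_range` and `count_lines` are hypothetical, the entry is assumed to be already loaded into a dict with the keys shown above, and every line in the file is assumed to count (including any header lines).

```python
def count_lines(path):
    """Count every line in a file (hypothetical helper; counts headers too)."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def within_expected_range(entry, actual_lines):
    """Return True if actual_lines is within the allowed fraction of the
    expected count. `entry` uses the keys from the proposed metadata."""
    expected = entry["expected-lines"]
    tolerance = expected * entry["acceptable-variation"]
    return abs(actual_lines - expected) <= tolerance


# The example entry from the comment above, as a plain dict.
rgd = {
    "filename": "rgd.gaf",
    "expected-lines": 424986,
    "acceptable-variation": 0.05,
}

# An ~8% drop falls outside the 5% band and would be flagged.
dropped = int(424986 * 0.92)
print(within_expected_range(rgd, 424986))
print(within_expected_range(rgd, dropped))
```

A pipeline step could call `count_lines` on each listed file and exit non-zero (or send an email) when `within_expected_range` returns False. Note that the check is symmetric, so an unexpected jump in line count is flagged as well as a drop.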

kltm commented 5 years ago

This might actually be a nice place to get behave in place again. Then again, YAML is nice and easy and ubiquitous for us at this point.

kltm commented 5 years ago

Noting that this will branch into the JSON outputs from @lpalbou as well when defined.

kltm commented 5 years ago

@pgaudet #1152 is a good place to add these before mainlining them.

pgaudet commented 4 years ago

This needs to run on snapshot.

kltm commented 4 years ago

From meeting with @pgaudet and @lpalbou

Examples: