geneontology / gocamgen

Base repo for constructing GO-CAM model RDF

Create progress reports that include GPAD lines for annotations that don't get converted #36

Open vanaukenk opened 5 years ago

vanaukenk commented 5 years ago

As part of the pipeline, we need to output a report listing the lines in the GPAD file that do NOT get converted under the current iteration of the rules, so curators can check each annotation and decide whether it is correct (and the rule should be revised) or whether the annotation itself should be fixed.
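A minimal sketch of how the unconverted lines could be collected, assuming GPAD is read as tab-separated text with `!`-prefixed header lines. The function name `partition_gpad_lines` and the `is_convertible` predicate are hypothetical stand-ins for whatever the real conversion rules decide; they are not part of gocamgen.

```python
from typing import Callable, Iterable, List, Tuple


def partition_gpad_lines(
    lines: Iterable[str],
    is_convertible: Callable[[List[str]], bool],
) -> Tuple[List[str], List[str]]:
    """Split GPAD annotation lines into (converted, leftovers).

    `is_convertible` is a hypothetical predicate standing in for the
    current rule set; it receives the tab-split fields of one line.
    """
    converted: List[str] = []
    leftovers: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip blank lines and GPAD header/comment lines (prefixed with "!").
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        if is_convertible(fields):
            converted.append(line)
        else:
            # Keep the original line verbatim so curators can look it up.
            leftovers.append(line)
    return converted, leftovers


def write_leftovers(leftovers: List[str], path: str) -> None:
    """Write the unconverted GPAD lines, one per line, to a report file."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in leftovers:
            handle.write(line + "\n")
```

Keeping the leftover lines byte-for-byte identical to the input means a curator can search for them directly in the source GPAD file.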

dustine32 commented 5 years ago

@vanaukenk @ukemi I can probably piggyback onto this ticket.

Basic outputs of import code should be:

  1. Translated model TTL files - Great!
  2. GPAD lines filtered out - "the leftovers/remainders"
  3. Report of invalid (and thus untranslated) extensions

Outputs 2 and 3 can likely be attached to the PRs into noctua-dev, and I can also put the simple counts of filtered lines/invalid extensions in those tiny reports I've already been outputting. Example:

source_path: http://www.informatics.jax.org/downloads/reports/mgi.gpa.gz
download_date: Thu Sep 26 10:09:33 2019
header_date: 09/25/2019
# of models generated: 19736
# GPAD lines filtered out: #####
# invalid annotation extensions: #####
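
The summary report above could be produced along these lines. This is only a sketch: `format_report` and its parameter names are hypothetical, and the field labels are simply copied from the example, not taken from the actual import code.

```python
def format_report(
    source_path: str,
    download_date: str,
    header_date: str,
    models_generated: int,
    lines_filtered: int,
    invalid_extensions: int,
) -> str:
    """Render the small per-source summary report as plain text.

    The labels mirror the example report; the two count arguments are
    assumed to come from the filtering and extension-validation steps.
    """
    return "\n".join(
        [
            f"source_path: {source_path}",
            f"download_date: {download_date}",
            f"header_date: {header_date}",
            f"# of models generated: {models_generated}",
            f"# GPAD lines filtered out: {lines_filtered}",
            f"# invalid annotation extensions: {invalid_extensions}",
        ]
    )


if __name__ == "__main__":
    # The two trailing counts are illustrative values, not real results.
    print(
        format_report(
            source_path="http://www.informatics.jax.org/downloads/reports/mgi.gpa.gz",
            download_date="Thu Sep 26 10:09:33 2019",
            header_date="09/25/2019",
            models_generated=19736,
            lines_filtered=0,
            invalid_extensions=0,
        )
    )
```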