Open dustine32 opened 2 years ago
Cheers! Interesting that one didn't make it--I didn't expect that.
We should probably sort this into the right project as owner and specs are assembled.
Thanks Dustin. Based on the discussion at the managers meeting and my discussion with Dustin today, I'll be the project owner, and Dustin the tech lead. We discussed a process for SynGO loads with the following steps (Dustin, please correct whatever I got wrong or missed):
@thomaspd Test files available now here: http://skyhook.berkeleybop.org/issue-238-wormbase-test-pipeline/annotations/
I'll plan on fixing SYNGO:1805 and updating the PR.
@thomaspd Noting that the one missing model SYNGO:1805 has been fixed and added to this PR: https://github.com/geneontology/noctua-models/pull/235#issuecomment-1159328620
@kltm will make a noctua_*.gpad.gz file for issue-237-mgi-test-pipeline and for the last release, so that @pgaudet can compare the differences.
@kltm Sorry, confusingly the test pipeline with the latest SynGO data is issue-238-wormbase-test-pipeline (not the mgi one).
@pgaudet I've emailed you a pair of files as discussed. As a note to @dustine32, they were generated with:
zgrep -i [[:space:]]SynGO[[:space:]] noctua_*.gpad.gz
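For anyone who wants to repeat the comparison without the emailed files, here is a rough Python sketch of the same filter plus a simple line-level diff. It only approximates the `zgrep` call above (Python's `\s` is slightly broader than `[[:space:]]`), and the function names and the two glob patterns in the usage note are made up for illustration, not anything the pipeline provides.

```python
import glob
import gzip
import re

# Roughly mirrors: zgrep -i [[:space:]]SynGO[[:space:]]
SYNGO_FIELD = re.compile(r"\sSynGO\s", re.IGNORECASE)


def filter_syngo(lines):
    """Yield lines that contain SynGO surrounded by whitespace (case-insensitive)."""
    for line in lines:
        line = line.rstrip("\n")
        if SYNGO_FIELD.search(line):
            yield line


def read_gz_lines(pattern):
    """Yield text lines from every gzipped file matching a glob pattern."""
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            yield from handle


def compare(old_lines, new_lines):
    """Return (only_in_old, only_in_new) as sorted lists of whole lines."""
    old, new = set(old_lines), set(new_lines)
    return sorted(old - new), sorted(new - old)
```

Usage would look something like `compare(filter_syngo(read_gz_lines("release/noctua_*.gpad.gz")), filter_syngo(read_gz_lines("test/noctua_*.gpad.gz")))`, where both directory names are placeholders. Because it compares whole lines, a change in any column (date included) shows up as one removal plus one addition.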
Hi @kltm @dustine32
I checked both the old and new SynGO data sets, and everything looks OK to me too.
Summary:
| | Old | New |
| -- | -- | -- |
| Number of annotations | 26371 | 43722 |
| Distinct ECO | 31 | 36 |
| Distinct GO | 235 | 256 |
| Distinct PMID | 1110 | 1467 |
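If it helps to track these four numbers on every load, they could be recomputed with something like the sketch below. This is not how the figures above were produced. It assumes the input is already filtered down to SynGO lines and that the GPAD columns are tab-separated with the GO term in column 4, the reference in column 5 and the ECO code in column 6. The function name `summarize_gpad` is hypothetical.

```python
def summarize_gpad(lines):
    """Count annotations and distinct ECO, GO and PMID values in GPAD lines.

    Assumed layout (1-based): column 4 = GO term, column 5 = reference(s),
    column 6 = ECO code. Header/comment lines starting with '!' are skipped.
    """
    annotations = 0
    ecos, gos, pmids = set(), set(), set()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue  # not a full annotation row
        annotations += 1
        gos.add(cols[3])
        ecos.add(cols[5])
        # The reference column may hold several pipe-separated IDs.
        for ref in cols[4].split("|"):
            if ref.startswith("PMID:"):
                pmids.add(ref)
    return {
        "annotations": annotations,
        "distinct_eco": len(ecos),
        "distinct_go": len(gos),
        "distinct_pmid": len(pmids),
    }
```

Running it on the old and new extracts and comparing the two dictionaries gives the same kind of side-by-side view as the table.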
This ticket is for defining a testing/release plan for any new SynGO JSON-to-TTL conversion load. What stats should we track? Who should test? What specifically should they test (e.g. check conversion stats, "look" at GO-CAMs in Noctua, check GPAD output)?
Input: JSON file from https://github.com/geneontology/syngo2lego_data_conversion
Output: TTL files (e.g. SYNGO_1234.ttl)
Some stats:
For the conversion load in #2, here are the stats:
(Looks like one model didn't get converted to TTL: SYNGO:1805 - RGD:621347 GO:0099090)
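A check for dropped models like this one could be part of the test plan. Below is a minimal sketch, assuming each model ID of the form SYNGO:1234 is written to a file named SYNGO_1234.ttl (inferred from the example filename above, not confirmed against the converter). How the expected IDs are pulled out of the input JSON is left open, and `missing_models` is a hypothetical helper.

```python
import re

# Assumed naming convention: model SYNGO:1234 -> file SYNGO_1234.ttl
TTL_NAME = re.compile(r"^SYNGO_(\d+)\.ttl$")


def missing_models(expected_ids, ttl_filenames):
    """Return the expected model IDs that have no matching TTL file."""
    converted = set()
    for name in ttl_filenames:
        match = TTL_NAME.match(name)
        if match:
            converted.add(f"SYNGO:{match.group(1)}")
    return sorted(set(expected_ids) - converted)
```

In practice `ttl_filenames` could come from `os.listdir()` on the output directory, and an empty result would mean every expected model was converted.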
Downstream effects should also be considered, though that can probably be handled/tracked in a different repo. For instance, the "ECO code instance count" might closely correlate with the total number of contributor=SynGO annotations produced by the GO pipeline.

Tagging @thomaspd @vanaukenk @pgaudet @kltm
Feel free to move this ticket to a different repo if that makes sense!