bmeg / bmeg-etl

ETL configuration for BMEG

duplicates in GeneOntologyAnnotation.Edge.json #327

Closed bwalsh closed 1 year ago

bwalsh commented 5 years ago
{"_id": "(GO:0004252)--GeneOntologyAnnotation->(ENSG00000282025)", "gid": "(GO:0004252)--GeneOntologyAnnotation->(ENSG00000282025)", "label": "GeneOntologyAnnotation", "from": "GO:0004252", "to": "ENSG00000282025", "data": {"evidence": "TAS", "title": "Immunoglobulin kappa variable 2-28", "references": []}}
{"_id": "(GO:0004252)--GeneOntologyAnnotation->(ENSG00000282025)", "gid": "(GO:0004252)--GeneOntologyAnnotation->(ENSG00000282025)", "label": "GeneOntologyAnnotation", "from": "GO:0004252", "to": "ENSG00000282025", "data": {"evidence": "TAS", "title": "Immunoglobulin kappa variable 2-28", "references": []}}
bwalsh commented 5 years ago

Additionally, there are instances that share the same _id but differ in their evidence and references:

{"_id": "(GO:0000012)--GeneOntologyAnnotation->(ENSG00000042088)", "gid": "(GO:0000012)--GeneOntologyAnnotation->(ENSG00000042088)", "label": "GeneOntologyAnnotation", "from": "GO:0000012", "to": "ENSG00000042088", "data": {"evidence": "IBA", "title": "Tyrosyl-DNA phosphodiesterase 1", "references": ["PMID:21873635"]}}
{"_id": "(GO:0000012)--GeneOntologyAnnotation->(ENSG00000042088)", "gid": "(GO:0000012)--GeneOntologyAnnotation->(ENSG00000042088)", "label": "GeneOntologyAnnotation", "from": "GO:0000012", "to": "ENSG00000042088", "data": {"evidence": "IDA", "title": "Tyrosyl-DNA phosphodiesterase 1", "references": ["PMID:15811850"]}}
bwalsh commented 5 years ago

It is unclear whether this is the desired behavior. Gen3 enforces a uniqueness constraint on the pair of source and destination keys.
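A minimal sketch of how such collisions could be surfaced before loading, assuming the edge file is JSON lines with `from`, `to`, and `data` fields as in the records above. The function name `find_edge_collisions` is hypothetical and not part of bmeg-etl.

```python
import json
from collections import defaultdict


def find_edge_collisions(lines):
    """Report (from, to) pairs that occur more than once.

    Hypothetical helper: returns a dict mapping (from, to) to "exact"
    when every record for that pair carries identical data, or to
    "conflict" when the data differ (e.g. evidence or references).
    """
    seen = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        edge = json.loads(line)
        # Serialize data with sorted keys so equal dicts compare equal.
        payload = json.dumps(edge.get("data", {}), sort_keys=True)
        seen[(edge["from"], edge["to"])].append(payload)
    return {
        key: ("exact" if len(set(payloads)) == 1 else "conflict")
        for key, payloads in seen.items()
        if len(payloads) > 1
    }
```

Passing an open file handle for GeneOntologyAnnotation.Edge.json as `lines` should flag the GO:0004252 pair as "exact" and the GO:0000012 pair as "conflict". Whether conflicting records should be merged or kept as separate edges is the open question here; the sketch only reports them.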

bwalsh commented 5 years ago

Also, I'm getting GeneOntologyAnnotation edges that point to missing genes:

ERROR:  insert or update on table "edge_geneontologytermannotationsgene" violates foreign key constraint "edge_geneontologytermannotationsgene_dst_id_fkey"
DETAIL:  Key (dst_id)=(c64db77b-05a9-5b9a-bac3-d758b6e7a8fc) is not present in table "node_gene".
RJHB392:compose-services walsbr$ grep c64db77b-05a9-5b9a-bac3-d758b6e7a8fc ~/gen3_etl/output/reference/edge_geneontologytermannotationsgene.tsv
22a45636-100d-560a-805f-3e2519c418fa	c64db77b-05a9-5b9a-bac3-d758b6e7a8fc	{}	{}	{"evidence":"IEA","title":"Immunoglobulin lambda variable 5-37","references":["GO_REF:0000037"],"from":"GO:0002250","to":"ENSG00000281471"}
234a2fa7-a873-5231-9dbd-949673ad4d5a	c64db77b-05a9-5b9a-bac3-d758b6e7a8fc	{}	{}	{"evidence":"IEA","title":"Immunoglobulin lambda variable 5-37","references":["GO_REF:0000037"],"from":"GO:0003823","to":"ENSG00000281471"}
ecffc5bf-5089-5684-bcca-c04b2dcf4314	c64db77b-05a9-5b9a-bac3-d758b6e7a8fc	{}	{}	{"evidence":"IEA","title":"Immunoglobulin lambda variable 5-37","references":["GO_REF:0000039"],"from":"GO:0005886","to":"ENSG00000281471"}
2fb29112-c6d9-5955-a4d3-86b1f3176606	c64db77b-05a9-5b9a-bac3-d758b6e7a8fc	{}	{}	{"evidence":"IBA","title":"Immunoglobulin lambda variable 5-37","references":["PMID:21873635"],"from":"GO:0002377","to":"ENSG00000281471"}
cf4b9054-b866-529c-9f0f-4345682ab751	c64db77b-05a9-5b9a-bac3-d758b6e7a8fc	{}	{}	{"evidence":"IBA","title":"Immunoglobulin lambda variable 5-37","references":["PMID:21873635"],"from":"GO:0006955","to":"ENSG00000281471"}
003f46f2-1830-535d-9072-2b3f145c42d4	c64db77b-05a9-5b9a-bac3-d758b6e7a8fc	{}	{}	{"evidence":"IBA","title":"Immunoglobulin lambda variable 5-37","references":["PMID:21873635"],"from":"GO:0005615","to":"ENSG00000281471"}
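
One way to catch this before the load hits the foreign key constraint is to check each edge's `to` against the set of gene IDs that were actually emitted. A rough sketch, assuming the edges are JSON lines with `from` and `to` fields as in the records earlier in this issue; `find_dangling_edges` and the `known_gene_ids` argument are hypothetical, and how the gene ID set is built depends on the vertex output format, which is not shown here.

```python
import json


def find_dangling_edges(edge_lines, known_gene_ids):
    """Return (from, to) pairs whose `to` gene is not in known_gene_ids.

    Hypothetical helper: known_gene_ids is any set-like collection of
    gene identifiers (e.g. Ensembl IDs) present in the gene output.
    """
    dangling = []
    for line in edge_lines:
        line = line.strip()
        if not line:
            continue
        edge = json.loads(line)
        if edge["to"] not in known_gene_ids:
            dangling.append((edge["from"], edge["to"]))
    return dangling
```

If ENSG00000281471 is absent from the gene output, every annotation pointing at it (GO:0002250, GO:0003823, and so on) would show up in the returned list.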