Knowledge-Graph-Hub / kg-microbe

https://knowledge-graph-hub.github.io/kg-microbe/index.html
BSD 3-Clause "New" or "Revised" License

unmatched BacDive taxa vs NCBITaxon #87

Open realmarcin opened 6 months ago

realmarcin commented 6 months ago

Regarding BacDive-to-media edges, a total of 32513 edges for NCBITaxon -> medium are ingested from BacDive.

However, 1609 BacDive taxa have no NCBITaxon match. In addition, there are 257 bacdive:None media associations, which look like an ingest artefact, perhaps a Python None case that needs to be caught. The current ingest likely does not run NER against NCBITaxon and instead just uses the BacDive NCBITaxon field when available.
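As a hedged illustration of how a bacdive:None subject can arise, the sketch below formats a missing ID straight into a CURIE string and then shows a guard that avoids it. The function name `make_subject_curie` and its parameters are hypothetical and are not taken from the kg-microbe code; the actual transform may build subjects differently.

```python
from typing import Optional

BACDIVE_PREFIX = "bacdive:"
NCBITAXON_PREFIX = "NCBITaxon:"


def make_subject_curie(
    bacdive_id: Optional[int], ncbitaxon_id: Optional[int]
) -> Optional[str]:
    """Hypothetical helper: prefer NCBITaxon, fall back to BacDive, else None."""
    if ncbitaxon_id is not None:
        return f"{NCBITAXON_PREFIX}{ncbitaxon_id}"
    if bacdive_id is not None:
        return f"{BACDIVE_PREFIX}{bacdive_id}"
    # Nothing usable: the caller should skip the edge rather than emit a CURIE.
    return None


# Formatting a missing ID directly yields the artefact string from the edges.
naive = f"{BACDIVE_PREFIX}{None}"
print(naive)
```

With a guard like this, a record that has neither ID returns None and can be skipped or logged instead of being written out as an edge.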

BacDive IDs that failed NCBITaxon NER = 1609. Example edge:

urn:uuid:77bd5911-8a5c-470e-9f09-d419ab11b6c2 bacdive:8540 biolink:occurs_in mediadive.medium:645 BAO:0002924 Graph bacdive:8540

bacdive:None = 257. Example edge:

urn:uuid:e12a5c72-ee3f-4bc6-9c79-ee4ff1b9f410 bacdive:None biolink:occurs_in mediadive.medium:C66 BAO:0002924 Graph bacdive:None
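Counts like the two above could be reproduced with a short script along these lines. This is a minimal sketch, assuming a tab-separated edges file with a header row that includes a `subject` column; the column name, the helper names, and the sample rows (edge ids e1 to e3 and the NCBITaxon row) are illustrative assumptions, not the actual kg-microbe output.

```python
import csv
import io
from collections import Counter


def classify_subject(subject: str) -> str:
    """Bucket an edge subject by its CURIE prefix."""
    if subject == "bacdive:None":
        return "bacdive_none"
    if subject.startswith("bacdive:"):
        return "bacdive_unmatched"
    if subject.startswith("NCBITaxon:"):
        return "ncbitaxon"
    return "other"


def count_subjects(tsv_handle) -> Counter:
    """Count edges per subject bucket from a TSV with a 'subject' column."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return Counter(classify_subject(row["subject"]) for row in reader)


# Illustrative sample only; a real run would open the edges file instead.
sample = io.StringIO(
    "id\tsubject\tpredicate\tobject\n"
    "e1\tbacdive:8540\tbiolink:occurs_in\tmediadive.medium:645\n"
    "e2\tbacdive:None\tbiolink:occurs_in\tmediadive.medium:C66\n"
    "e3\tNCBITaxon:562\tbiolink:occurs_in\tmediadive.medium:1\n"
)
counts = count_subjects(sample)
print(dict(counts))
```

The exact-match check for bacdive:None comes before the prefix check so that the artefact rows are not folded into the unmatched BacDive count.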

The CHEBI manual annotation file for unmatched cases is here: https://github.com/Knowledge-Graph-Hub/kg-microbe/blob/feba/kg_microbe/transform_utils/traits/chebi_manual_annotation.tsv

All cases involved the lack of a synonym.

Example GO unmatched term files are attached.

go_ner_unmatched.txt