Knowledge-Graph-Hub / kg-covid-19

An instance of KG Hub to produce a knowledge graph for COVID-19 response.
https://github.com/Knowledge-Graph-Hub/kg-covid-19/wiki
BSD 3-Clause "New" or "Revised" License

Two random data bugs #391

Closed: justaddcoffee closed this issue 3 years ago

justaddcoffee commented 3 years ago

Describe the bug

Two data bugs. Both should already be fixed, but I'm adding this ticket so we can check after the build is finished:

The SARS-CoV-2 gene annot ingest has some misplaced (blank) biolink categories in the nodes.tsv:

wget https://kg-hub.berkeleybop.io/kg-covid-19/current/transformed/sars_cov_2_gene_annot/nodes.tsv
cut -f3 nodes.tsv | sort | uniq

biolink:Protein
category

(Note the blank category in the output.) Should be fixed by #390.
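To see which nodes are affected rather than just that a blank value exists, something like the following could be used once the file is downloaded. This is only a sketch: `blank_category_nodes` is a made-up helper name, and it assumes the file is tab-separated with a header row, the node id in column 1, and the category in column 3 (the column the `cut -f3` check looks at).

```shell
# Hypothetical helper: print the id of every node whose category is empty.
# Assumptions: tab-separated input, header on line 1, id in column 1,
# category in column 3.
blank_category_nodes() {
    awk -F'\t' 'NR > 1 && $3 == "" { print $1 }' "$1"
}
```

Running `blank_category_nodes nodes.tsv` should print nothing once the fix has landed in a build.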

Also, in the ChEMBL ingest, NCBITaxon:2697049 is in the edges.tsv but not in the nodes.tsv. (Not necessarily a data bug, since this node exists in another subgraph, but it causes errors when the ChEMBL subgraph is ingested by itself in ensmallen.)

wget https://kg-hub.berkeleybop.io/kg-covid-19/current/transformed/ChEMBL/nodes.tsv
wget https://kg-hub.berkeleybop.io/kg-covid-19/current/transformed/ChEMBL/edges.tsv
grep -il NCBITaxon:2697049 *tsv
edges.tsv

Should be addressed by #389
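A more general version of the grep check, for any id that is used in an edge but has no node row, might look like the sketch below. `dangling_edge_nodes` is a hypothetical helper, and it assumes both files are tab-separated with header rows naming the columns `id` (nodes) and `subject` / `object` (edges); adjust if the headers differ.

```shell
# Hypothetical helper: list ids used as subject or object in the edges file
# that have no row in the nodes file.
# Assumptions: both files are tab-separated with a header row, the nodes
# file is non-empty, and the columns are named "id", "subject", "object".
dangling_edge_nodes() {
    nodes_file=$1
    edges_file=$2
    awk -F'\t' '
        # First file (nodes): find the id column, then remember every id.
        FNR == NR {
            if (FNR == 1) {
                for (i = 1; i <= NF; i++) if ($i == "id") idcol = i
                next
            }
            seen[$idcol] = 1
            next
        }
        # Second file (edges): find the subject and object columns.
        FNR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "subject") s = i
                if ($i == "object") o = i
            }
            next
        }
        # Print each missing id once.
        {
            if (!($s in seen) && !reported[$s]++) print $s
            if (!($o in seen) && !reported[$o]++) print $o
        }
    ' "$nodes_file" "$edges_file"
}
```

Called as `dangling_edge_nodes nodes.tsv edges.tsv` in the ChEMBL directory, this should print nothing if every edge endpoint has a matching node.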

To Reproduce

see above

Expected behavior

see above

Version

https://kg-hub.berkeleybop.io/kg-covid-19/20210101/index.html

justaddcoffee commented 3 years ago

Fixed by #389 and #390.