Knowledge-Graph-Hub / kg-obo

A package to transform all OBO ontologies into KGX TSV format and OBO json, and put the transformed graph in KGhub
https://knowledge-graph-hub.github.io/kg-obo/getting_started.html
GNU General Public License v3.0

Error in stats generation: duplicate node IDs in `bco` #202

Open caufieldjh opened 1 year ago

caufieldjh commented 1 year ago

Describe the bug

During stats generation, this error is raised:

03:16:33  Downloading apollo_sv, version 2022-11-25 from KG-OBO: kg-obo/apollo_sv/2022-11-25/apollo_sv_kgx_tsv.tar.gz
03:16:33  Downloading apollo_sv, version v2023-01-10 from KG-OBO: kg-obo/apollo_sv/v2023-01-10/apollo_sv_kgx_tsv.tar.gz
03:16:33  Downloading apollo_sv, version v4.1.1. from KG-OBO: kg-obo/apollo_sv/v4.1.1./graph.tar.gz
03:16:33  Downloading aro, version 05-10-2021-09-37 from KG-OBO: kg-obo/aro/05-10-2021-09-37/aro_kgx_tsv.tar.gz
03:16:33  Downloading aro, version 12-09-2022-11-38 from KG-OBO: kg-obo/aro/12-09-2022-11-38/aro_kgx_tsv.tar.gz
03:16:33  Downloading bco, version 2020-03-27 from KG-OBO: kg-obo/bco/2020-03-27/bco_kgx_tsv.tar.gz
03:16:33  Downloading bco, version 2021-11-14 from KG-OBO: kg-obo/bco/2021-11-14/bco_kgx_tsv.tar.gz
03:16:33  Encountered unresolvable error while generating stats: <class 'ValueError'> - Duplicated values found while building the vocabulary!
03:16:33  Specifically the duplicated values are:
03:16:33  ["BCO:0000003", "BCO:0000016", "BCO:0000025", "BCO:0000031", "BCO:0000032", "BCO:0000042", "BCO:0000044", "BCO:0000046", "BCO:0000075", "BCO:0000080", "BCO:0000081"].
03:16:33  The number of duplicates found is 11, as the length of the reverse map is 716 and the length of the map is 705.

As a result, the stats file is not generated.

Version

71001b2c9d6e3ae9ba082e42911655055dae785b

Additional context

The error occurs when loading the KGX TSV into grape, but the root cause is unclear. One possibility is that some other node gets renamed to use the proper BCO: CURIE prefix, which would leave a duplicate entry in the node list.
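To check the duplicate-node hypothesis independently of grape, the node list can be scanned directly. The following is a minimal sketch, not part of kg-obo: the helper name `find_duplicate_node_ids` and the file name `bco_kgx_tsv_nodes.tsv` are hypothetical, and it assumes the extracted nodes file is tab-separated with a header row containing an `id` column, as in KGX TSV.

```python
import csv
from collections import Counter
from pathlib import Path


def find_duplicate_node_ids(nodes_path, id_column="id"):
    """Return a dict mapping each repeated node ID to its number of occurrences."""
    with open(nodes_path, newline="", encoding="utf-8") as handle:
        # KGX TSV is tab-delimited; disable quote handling so stray
        # quote characters in names or descriptions do not merge rows.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        counts = Counter(row[id_column] for row in reader)
    return {node_id: n for node_id, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical file name: adjust to the nodes file extracted from
    # bco_kgx_tsv.tar.gz for the version that fails.
    nodes_file = Path("bco_kgx_tsv_nodes.tsv")
    if nodes_file.exists():
        for node_id, n in sorted(find_duplicate_node_ids(nodes_file).items()):
            print(f"{node_id}\t{n}")
```

If the IDs reported here match the eleven listed in the error, the duplicates are already present in the transformed node list rather than introduced at load time. The comparison is an exact string match, so it will not catch IDs that only collide after any normalization applied during loading.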