Knowledge-Graph-Hub / kg-covid-19

An instance of KG Hub to produce a knowledge graph for COVID-19 response.
https://github.com/Knowledge-Graph-Hub/kg-covid-19/wiki
BSD 3-Clause "New" or "Revised" License

Merge fails due to error tokenizing data #469

Status: Closed (closed by caufieldjh 1 year ago)

caufieldjh commented 1 year ago

Describe the bug

During the merge phase of the Jenkins build, this error occurs:

[2023-05-02T04:19:47.961Z] Traceback (most recent call last):
[2023-05-02T04:19:47.961Z]   File "run.py", line 202, in <module>
[2023-05-02T04:19:47.961Z]     cli()
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
[2023-05-02T04:19:47.961Z]     return self.main(*args, **kwargs)
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
[2023-05-02T04:19:47.961Z]     rv = self.invoke(ctx)
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
[2023-05-02T04:19:47.961Z]     return _process_result(sub_ctx.command.invoke(sub_ctx))
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
[2023-05-02T04:19:47.961Z]     return ctx.invoke(self.callback, **ctx.params)
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
[2023-05-02T04:19:47.961Z]     return __callback(*args, **kwargs)
[2023-05-02T04:19:47.961Z]   File "run.py", line 94, in merge
[2023-05-02T04:19:47.961Z]     load_and_merge(yaml, processes)
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/kg_covid_19/merge_utils/merge_kg.py", line 33, in load_and_merge
[2023-05-02T04:19:47.961Z]     merged_graph = merge(yaml_file, processes=processes)
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/kgx/cli/cli_utils.py", line 658, in merge
[2023-05-02T04:19:47.961Z]     stores = [r.get() for r in results]
[2023-05-02T04:19:47.961Z]   File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.8/site-packages/kgx/cli/cli_utils.py", line 658, in <listcomp>
[2023-05-02T04:19:47.961Z]     stores = [r.get() for r in results]
[2023-05-02T04:19:47.961Z]   File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
[2023-05-02T04:19:47.961Z]     raise self._value
[2023-05-02T04:19:47.961Z] pandas.errors.ParserError: Error tokenizing data. C error: Expected 10 fields in line 144676, saw 15

The last input processed before this error is the GO-CAMs, but that may not correlate with the actual cause. Either way, some input isn't parsing as expected, and it's breaking the merge.
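Since the traceback does not name the offending file, one way to narrow it down is to scan each merge input for rows whose field count differs from the header. The sketch below is only an illustration: `find_ragged_lines` is a hypothetical helper, not part of KGX or kg-covid-19, and it assumes tab-delimited files split on every tab with no quote handling, which may not match how the merge actually reads them.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_ragged_lines(
    path: Path, delimiter: str = "\t"
) -> Iterator[Tuple[int, int, int]]:
    """Yield (line_number, expected_fields, seen_fields) for each row
    whose field count differs from the header row.

    Assumption: fields are separated by a plain delimiter and quoting
    is ignored, so counts can differ from a quote-aware parser.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        header = handle.readline()
        if not header:
            return  # empty file, nothing to compare against
        expected = len(header.rstrip("\n").split(delimiter))
        # The header is line 1, so data rows start at line 2.
        for line_number, line in enumerate(handle, start=2):
            seen = len(line.rstrip("\n").split(delimiter))
            if seen != expected:
                yield line_number, expected, seen
```

Calling `list(find_ragged_lines(Path("nodes.tsv")))` on a suspect file (the file name here is a placeholder) returns every mismatched row. The pandas C parser raises only when a row has more fields than expected, and pads shorter rows with missing values, so the rows with `seen > expected` are the ones to look at for this error.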

caufieldjh commented 1 year ago

OK, this isn't the GO-CAMs, because those files don't get anywhere near line 144676 (or 10 fields, for that matter).
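The same elimination can be applied to every input at once: a file can only trigger this exact error if it has at least 144676 lines and a 10-field header. A rough sketch, assuming the inputs are tab-delimited `.tsv` files under a single directory (`candidate_files` and `summarize` are hypothetical helpers, and the directory layout is an assumption):

```python
from pathlib import Path
from typing import List, Tuple


def summarize(path: Path, delimiter: str = "\t") -> Tuple[int, int]:
    """Return (header_field_count, total_line_count) for one file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        header = handle.readline()
        fields = len(header.rstrip("\n").split(delimiter)) if header else 0
        # Count the header itself plus every remaining line.
        lines = (1 if header else 0) + sum(1 for _ in handle)
    return fields, lines


def candidate_files(
    root: Path, min_lines: int, header_fields: int, pattern: str = "*.tsv"
) -> List[Path]:
    """List files under root that are long enough and have the expected
    header width to match the ParserError message."""
    matches = []
    for path in sorted(root.rglob(pattern)):
        fields, lines = summarize(path)
        if fields == header_fields and lines >= min_lines:
            matches.append(path)
    return matches
```

For the error in this issue the call would be something like `candidate_files(Path("data"), min_lines=144676, header_fields=10)`, where `data` stands in for wherever the transformed inputs live. Anything not returned can be ruled out the same way the GO-CAMs were.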

caufieldjh commented 1 year ago

Don't know if it's related, but TCRD also changed links again, so I will fix that in the attached PR.