Knowledge-Graph-Hub / kg-covid-19

An instance of KG Hub to produce a knowledge graph for COVID-19 response.
https://github.com/Knowledge-Graph-Hub/kg-covid-19/wiki
BSD 3-Clause "New" or "Revised" License

Fix pyshex issue run jenkins #377

Closed justaddcoffee closed 3 years ago

justaddcoffee commented 3 years ago

Fixes a build issue seemingly caused by a problematic pyshex release (0.7.15). I'm pinning pyshex to 0.7.14 for now.

18:54:45  + python3.7 run.py transform
18:54:46  /var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.7/site-packages/biolinkml/__init__.py:158: UserWarning: Some URL processing will fail with python 3.7.5 or earlier.  Current version: sys.version_info(major=3, minor=7, micro=5, releaselevel='final', serial=0)
18:54:46    warn(f"Some URL processing will fail with python 3.7.5 or earlier.  Current version: {sys.version_info}")
18:54:46  Traceback (most recent call last):
18:54:46    File "run.py", line 6, in <module>
18:54:46      from kg_covid_19 import download as kg_download
18:54:46    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/kg_covid_19/__init__.py", line 2, in <module>
18:54:46      from .transform import transform
18:54:46    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/kg_covid_19/transform.py", line 8, in <module>
18:54:46      from kg_covid_19.transform_utils.gocam_transform.gocam_transform import GocamTransform
18:54:46    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/kg_covid_19/transform_utils/gocam_transform/__init__.py", line 1, in <module>
18:54:46      from .gocam_transform import GocamTransform
18:54:46    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/kg_covid_19/transform_utils/gocam_transform/gocam_transform.py", line 6, in <module>
18:54:46      from kgx import RdfTransformer, PandasTransformer # type: ignore
18:54:46    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.7/site-packages/kgx/__init__.py", line 1, in <module>
18:54:46      from kgx.transformers.pandas_transformer import PandasTransformer
18:54:46    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.7/site-packages/kgx/transformers/pandas_transformer.py", line 11, in <module>
18:54:46      from kgx.utils.kgx_utils import generate_edge_key
18:54:46    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.7/site-packages/kgx/utils/kgx_utils.py", line 7, in <module>
18:54:46      from biolinkml.meta import TypeDefinitionName, ElementName, SlotDefinition, ClassDefinition, TypeDefinition, Element
18:54:46    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.7/site-packages/biolinkml/meta.py", line 21, in <module>
18:54:46      from biolinkml.utils.formatutils import camelcase, underscore, sfx
18:54:46    File "/var/lib/jenkins/workspace/dge-graph-hub_kg-covid-19_master/gitrepo/venv/lib/python3.7/site-packages/biolinkml/utils/formatutils.py", line 4, in <module>
18:54:46      from pyshex.shex_evaluator import EvaluationResult
18:54:46  ModuleNotFoundError: No module named 'pyshex'
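
The traceback above only reports the missing `pyshex` module after a long import chain (run.py, kg_covid_19, kgx, biolinkml). As a rough sketch, a pre-flight check could report missing modules before the transform starts. The function name `missing_modules` and the module list in the comment are illustrative assumptions, not part of kg-covid-19.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found by this interpreter."""
    # importlib.util.find_spec returns None when a top-level module is not
    # installed; it locates the module without executing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical usage (the real list of required modules is an assumption):
#   missing = missing_modules(["pyshex", "biolinkml", "kgx"])
#   if missing:
#       raise SystemExit("Missing modules: " + ", ".join(missing))
```

A check like this only confirms that a module can be located. It says nothing about whether the installed version is the pinned one.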

justaddcoffee commented 3 years ago

Still failing, but a little further into the transform step (during the go-json transform):

12:50:01  Parsing data/raw/go-plus.json
12:50:01  [KGX][json_transformer.py][               parse] INFO: Parsing data/raw/go-plus.json
12:50:01  [KGX][json_transformer.py][          load_nodes] INFO: Loading 81389 nodes into networkx.MultiDiGraph
12:50:13  [KGX][json_transformer.py][          load_edges] INFO: Loading 170422 edges into networkx.MultiDiGraph
12:50:13  Traceback (most recent call last):
12:50:13    File "run.py", line 165, in <module>
12:50:13      cli()
12:50:13    File "/var/lib/jenkins/workspace/_19_fix_pyshex_issue_run_jenkins/gitrepo/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
12:50:13      return self.main(*args, **kwargs)
12:50:13    File "/var/lib/jenkins/workspace/_19_fix_pyshex_issue_run_jenkins/gitrepo/venv/lib/python3.7/site-packages/click/core.py", line 782, in main
12:50:13      rv = self.invoke(ctx)
12:50:13    File "/var/lib/jenkins/workspace/_19_fix_pyshex_issue_run_jenkins/gitrepo/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
12:50:13      return _process_result(sub_ctx.command.invoke(sub_ctx))
12:50:13    File "/var/lib/jenkins/workspace/_19_fix_pyshex_issue_run_jenkins/gitrepo/venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
12:50:13      return ctx.invoke(self.callback, **ctx.params)
12:50:13    File "/var/lib/jenkins/workspace/_19_fix_pyshex_issue_run_jenkins/gitrepo/venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
12:50:13      return callback(*args, **kwargs)
12:50:13    File "run.py", line 64, in transform
12:50:13      kg_transform(*args, **kwargs)
12:50:13    File "/var/lib/jenkins/workspace/_19_fix_pyshex_issue_run_jenkins/gitrepo/kg_covid_19/transform.py", line 62, in transform
12:50:13      t.run(ONTOLOGIES[source])
12:50:13    File "/var/lib/jenkins/workspace/_19_fix_pyshex_issue_run_jenkins/gitrepo/kg_covid_19/transform_utils/ontology/ontology_transform.py", line 36, in run
12:50:13      self.parse(k, data_file, k)
12:50:13    File "/var/lib/jenkins/workspace/_19_fix_pyshex_issue_run_jenkins/gitrepo/kg_covid_19/transform_utils/ontology/ontology_transform.py", line 59, in parse
12:50:13      transformer.parse(data_file, compression=compression, provided_by=source)
12:50:13    File "/var/lib/jenkins/workspace/_19_fix_pyshex_issue_run_jenkins/gitrepo/venv/lib/python3.7/site-packages/kgx/transformers/json_transformer.py", line 228, in parse
12:50:13      self.load(obj['graphs'][0])
12:50:13    File "/var/lib/jenkins/workspace/_19_fix_pyshex_issue_run_jenkins/gitrepo/venv/lib/python3.7/site-packages/kgx/transformers/json_transformer.py", line 80, in load
12:50:13      self.load_edges(obj['edges'])
12:50:13    File "/var/lib/jenkins/workspace/_19_fix_pyshex_issue_run_jenkins/gitrepo/venv/lib/python3.7/site-packages/kgx/transformers/json_transformer.py", line 108, in load_edges
12:50:13      self.load_edge(edge)
12:50:13    File "/var/lib/jenkins/workspace/_19_fix_pyshex_issue_run_jenkins/gitrepo/venv/lib/python3.7/site-packages/kgx/transformers/json_transformer.py", line 296, in load_edge
12:50:13      element = self.toolkit.get_element(mapping)
12:50:13    File "/var/lib/jenkins/workspace/_19_fix_pyshex_issue_run_jenkins/gitrepo/venv/lib/python3.7/site-packages/bmt/toolkit.py", line 369, in get_element
12:50:13      parsed_name = parse_name(name)
12:50:13    File "/var/lib/jenkins/workspace/_19_fix_pyshex_issue_run_jenkins/gitrepo/venv/lib/python3.7/site-packages/bmt/utils.py", line 123, in parse_name
12:50:13      if name.startswith("biolink"):
12:50:13  AttributeError: 'NoneType' object has no attribute 'startswith'

@deepakunni3 thoughts?

deepakunni3 commented 3 years ago

This bug was fixed in KGX 0.2.4, but since KG-COVID-19 relies on a fork of KGX, those changes never made it there. I'll patch the fork for now.
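
For context, the traceback ends with `name.startswith("biolink")` being called on `None`, which means the edge mapping passed to `get_element` was `None`. The sketch below reproduces that failure mode and shows one possible guard. Both function names are hypothetical, and this is not the actual change made in KGX 0.2.4 or in bmt.

```python
from typing import Optional


def unguarded_prefix_check(name):
    """Mirrors the failing pattern: assumes name is always a string."""
    return name.startswith("biolink")


def guarded_prefix_check(name: Optional[str]) -> bool:
    """Hypothetical guard: treat a missing name as 'no biolink prefix'."""
    if name is None:
        return False
    return name.startswith("biolink")
```

Calling `unguarded_prefix_check(None)` raises the same `AttributeError` seen in the log, while `guarded_prefix_check(None)` returns `False`.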

justaddcoffee commented 3 years ago

Thanks @deepakunni3!

justaddcoffee commented 3 years ago

@deepakunni3 could you ping me after you've updated the forked KGX? (No rush on this though)

deepakunni3 commented 3 years ago

I updated the fork to use an earlier version of bmt that doesn't manifest the same issue.

At some point next week, let's ditch the fork completely and use a stable KGX version (this will require some rewrites to our merge YAML).

justaddcoffee commented 3 years ago

> At some point next week, let's ditch the fork completely and use a stable KGX version (this will require some rewrites to our merge YAML).

Thanks @deepakunni3! Glad to help with the move to a stable KGX version too.