biolink / kgx

KGX is a Python library for exchanging Knowledge Graphs
https://kgx.readthedocs.io
BSD 3-Clause "New" or "Revised" License

CLI throws KeyError: 'OBAN' during kgx graph-summary operation #212

Closed nicholsn closed 4 years ago

nicholsn commented 4 years ago

Describe the bug

Calling the CLI to get a graph-summary of the ensembl.ttl file (or others) from Monarch throws a KeyError: 'OBAN'.

To Reproduce

wget https://data.monarchinitiative.org/ttl/ensembl.ttl
kgx graph-summary --input-format ttl --output foo.txt ensembl.ttl
Traceback (most recent call last):
  File "/opt/miniconda3/envs/kgx/bin/kgx", line 8, in <module>
    sys.exit(cli())
  File "/opt/miniconda3/envs/kgx/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/miniconda3/envs/kgx/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/miniconda3/envs/kgx/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/miniconda3/envs/kgx/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/miniconda3/envs/kgx/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/opt/miniconda3/envs/kgx/lib/python3.8/site-packages/kgx/cli/__init__.py", line 51, in graph_summary_wrapper
    graph_summary(inputs, input_format, input_compression, output)
  File "/opt/miniconda3/envs/kgx/lib/python3.8/site-packages/kgx/cli/cli_utils.py", line 92, in graph_summary
    transformer = get_transformer(input_format)()
  File "/opt/miniconda3/envs/kgx/lib/python3.8/site-packages/kgx/transformers/rdf_transformer.py", line 37, in __init__
    super().__init__(source_graph, curie_map)
  File "/opt/miniconda3/envs/kgx/lib/python3.8/site-packages/kgx/transformers/rdf_graph_mixin.py", line 45, in __init__
    self.OBAN = Namespace(self.prefix_manager.prefix_map['OBAN'])
KeyError: 'OBAN'

Expected behavior

kgx graph-summary --input-format ttl --output foo.txt ensembl.ttl writes out a summary to foo.txt

deepakunni3 commented 4 years ago

@nicholsn Thank you for reporting this bug.

I am trying to reproduce this issue but I haven't encountered this bug yet. Could you provide more context regarding how you have KGX set up?

I see you're using Python 3.8 via miniconda.

I'll try the same on my end and see if I encounter a similar behavior.

nicholsn commented 4 years ago

Thanks for the response, @deepakunni3! Happy to provide more details.

I am on OS X and originally set up kgx with Python 3.8 using pip install -e ., but I see a different error when installing exactly as the instructions describe with virtualenv (I tried this in a fresh Ubuntu container as well, with the same error):

git clone https://github.com/NCATS-Tangerine/kgx.git
cd kgx
python3 -m venv venv
source venv/bin/activate
python3 --version
Python 3.7.7
pip3 install wheel
python3 setup.py install
Traceback (most recent call last):
  File "setup.py", line 40, in <module>
    'console_scripts': ['kgx=kgx.cli:cli']
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/__init__.py", line 145, in setup
    return distutils.core.setup(**attrs)
  File "/opt/miniconda3/lib/python3.7/distutils/core.py", line 121, in setup
    dist.parse_config_files()
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/dist.py", line 700, in parse_config_files
    ignore_option_errors=ignore_option_errors)
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/config.py", line 120, in parse_configuration
    meta.parse()
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/config.py", line 425, in parse
    section_parser_method(section_options)
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/config.py", line 398, in parse_section
    self[name] = value
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/config.py", line 183, in __setitem__
    value = parser(value)
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/config.py", line 513, in _parse_version
    version = self._parse_attr(value, self.package_dir)
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/config.py", line 348, in _parse_attr
    module = import_module(module_name)
  File "/opt/miniconda3/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/nnichols/Code/kgx/kgx/__init__.py", line 1, in <module>
    from kgx.transformers.pandas_transformer import PandasTransformer
  File "/Users/nnichols/Code/kgx/kgx/transformers/pandas_transformer.py", line 4, in <module>
    import networkx
ModuleNotFoundError: No module named 'networkx'

Following the conda approach using Python 3.7, I was able to complete the installation but got the same error as originally reported: KeyError: 'OBAN'.

But there is hope! It turns out that if I use the absolute path for /path/to/ensembl.ttl it parsed and generated the report 🤦

This only works when running kgx from within the git repo. If my current directory is outside of the git repo, I get the same key error as before.

Hope that helps w/debugging efforts!

deepakunni3 commented 4 years ago

Thank you for the detailed response. Looking into this now

deepakunni3 commented 4 years ago

Okay, I fixed the issue with the install and these fixes are on master.

Still looking into the following,

nicholsn commented 4 years ago

Thanks for the support @deepakunni3.

It looks like the KeyError: 'OBAN' might be caused by a caching issue, and the difference between running inside and outside the git repo may be a red herring.

When I inspected the KeyError using ipdb, my prefix_manager.prefix_map has a few namespaces unrelated to biolink from my own ontologies.

I'm not sure what is going on there yet, but I suspect that rdflib (or maybe kgx?) is doing some caching of namespaces.

deepakunni3 commented 4 years ago

@nicholsn That is strange. KGX does cache certain lookups via cachetools, but it is still unclear how that would interfere with prefix_manager in KGX.

Do you already have a modified version of prefixcommons-py installed?

nicholsn commented 4 years ago

I don't have a modified version of prefixcommons installed. I tried installing in a fresh Ubuntu Docker container and it works fine, but something in my OS X environment is polluting the prefix_map. I haven't been able to track it down, even after deleting kgx and its dependencies, creating new venv and conda environments, and reinstalling kgx. All of these lead to the same error.

I'm not sure if this helps at all, but here is what is stored in the prefix_map when the KeyError is thrown. Somehow it is picking up the 'maze' prefix and setting that as the biolink URI. These are just placeholders I had in some test data that I ran kgx validate on.

{'maze': 'http://id.mazetx.com/terms/', 'id': '@id', 'type': '@type', 'biolink': 'http://id.mazetx.com/terms/', 'MONARCH': 'https://monarchinitiative.org/', 'MONARCH_NODE': 'https://monarchinitiative.org/MONARCH_', '': 'https://www.example.org/UNKNOWN/'}

Any thoughts on where the caching might be stored so I can nuke it?
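A dump like the one above can be checked mechanically. The following is a minimal sketch, not KGX code: the helper names missing_prefixes and shared_iris are hypothetical, and the list of required prefixes is an assumption (only 'OBAN' is confirmed by the traceback). It reports which expected prefixes are absent and which IRIs are claimed by more than one prefix.

```python
# Prefix map copied verbatim from the comment above.
prefix_map = {
    'maze': 'http://id.mazetx.com/terms/',
    'id': '@id',
    'type': '@type',
    'biolink': 'http://id.mazetx.com/terms/',
    'MONARCH': 'https://monarchinitiative.org/',
    'MONARCH_NODE': 'https://monarchinitiative.org/MONARCH_',
    '': 'https://www.example.org/UNKNOWN/',
}

# Assumption: prefixes the RDF code path expects to find.
# Only 'OBAN' is confirmed by the traceback; 'biolink' is included for illustration.
REQUIRED_PREFIXES = ['OBAN', 'biolink']


def missing_prefixes(pm, required):
    """Return the required prefixes that are absent from the prefix map."""
    return [prefix for prefix in required if prefix not in pm]


def shared_iris(pm):
    """Return IRIs that more than one prefix expands to."""
    by_iri = {}
    for prefix, iri in pm.items():
        by_iri.setdefault(iri, []).append(prefix)
    return {iri: sorted(ps) for iri, ps in by_iri.items() if len(ps) > 1}


missing = missing_prefixes(prefix_map, REQUIRED_PREFIXES)
shared = shared_iris(prefix_map)
print("missing:", missing)
print("shared IRIs:", shared)
```

For this dump the check flags 'OBAN' as missing and shows 'biolink' and 'maze' expanding to the same IRI, which are the two symptoms described in the comment.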

deepakunni3 commented 4 years ago

@nicholsn Sorry for the late response here.

I am unclear what might be contributing to the polluted prefix map. Thanks for sharing the snippet of prefix map.

The caching mechanism used in KGX doesn't write the cache to a file. It caches in memory at run time.

If it helps, there is a Docker container for KGX available at https://hub.docker.com/r/biolink/kgx

While not ideal, you can be guaranteed a sandbox for running KGX on files located on the host machine.
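To illustrate what "caches in memory at run time" implies, here is a minimal sketch. It uses functools.lru_cache from the standard library as a stand-in for cachetools, and the function expand_prefix and its example.org IRI are hypothetical, not part of KGX. The point is only that such a cache lives inside the Python process, so there is no file on disk to delete.

```python
from functools import lru_cache

# Records every call that actually reaches the function body (a cache miss).
calls = []


@lru_cache(maxsize=None)
def expand_prefix(prefix):
    """Hypothetical lookup; the cached result is held in process memory only."""
    calls.append(prefix)
    return f"https://example.org/{prefix}/"


expand_prefix("OBAN")
expand_prefix("OBAN")            # served from the in-memory cache
misses_before_clear = len(calls)

expand_prefix.cache_clear()      # same effect as starting a fresh process
expand_prefix("OBAN")            # recomputed, because the cache is empty
misses_after_clear = len(calls)

print(misses_before_clear, misses_after_clear)
```

Because every new kgx invocation starts a new process, a cache of this kind starts out empty each time and is unlikely to carry state from one run to the next.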

deepakunni3 commented 4 years ago

Oh, wait. I think I might know what is going on here. Did you have {'@vocab': 'http://id.mazetx.com/terms/'} in your prefix map before running KGX?

nicholsn commented 4 years ago

It might depend on what you mean by "prefix map". It was definitely in the @context section of a JSON-LD file, but it is possible that I added it somewhere else.

deepakunni3 commented 4 years ago

This is just a wild guess. So I could be wrong here.

KGX relies on https://github.com/biolink/biolink-model/blob/master/context.jsonld for prefix to IRI mappings, which has @vocab defined as https://w3id.org/biolink/vocab/. Clearly there is a clash happening somewhere that affects the JSON-LD context, and somehow @vocab is being overwritten with the mazetx IRI, which KGX then ends up using. That might explain the failures you saw earlier and why I couldn't reproduce the same error.
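The guess above can be sketched in a few lines. This is an illustration under stated assumptions, not the actual KGX or prefixcommons code: the function to_prefix_map is hypothetical, as are the contents of both contexts apart from the two @vocab values and the 'maze' entry mentioned in this thread. It assumes the prefix map is derived from a JSON-LD context by storing @vocab under the 'biolink' prefix.

```python
def to_prefix_map(context, vocab_prefix='biolink'):
    """Hypothetical conversion of a JSON-LD @context into a prefix map.

    Assumption: the @vocab IRI is stored under the 'biolink' prefix and
    every other entry is copied through unchanged.
    """
    prefix_map = {}
    for key, value in context.items():
        if key == '@vocab':
            prefix_map[vocab_prefix] = value
        else:
            prefix_map[key] = value
    return prefix_map


# Abbreviated stand-in for the biolink-model context; the OBAN IRI is illustrative.
biolink_context = {
    '@vocab': 'https://w3id.org/biolink/vocab/',
    'OBAN': 'http://purl.org/oban/',
}

# Stand-in for a local context that replaced the biolink one (hypothetical contents).
local_context = {
    'maze': 'http://id.mazetx.com/terms/',
    'id': '@id',
    'type': '@type',
    '@vocab': 'http://id.mazetx.com/terms/',
}

expected = to_prefix_map(biolink_context)
polluted = to_prefix_map(local_context)

print(expected['biolink'])   # the biolink vocab IRI
print(polluted['biolink'])   # the mazetx IRI
print('OBAN' in polluted)    # False, so polluted['OBAN'] would raise KeyError
```

Under these assumptions the polluted map reproduces the first four entries of the dump earlier in the thread and lacks 'OBAN', which would be consistent with the reported KeyError.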

nicholsn commented 4 years ago

ok, so I'm not sure exactly what changed since I opened this issue, but I did clear out a local version of biolink-model along the way, and now it seems to be working with a fresh pip install of kgx.

I'll go ahead and close this... Thanks for taking the time to work through this.

deepakunni3 commented 4 years ago

Okay, glad to hear that it's working now. 👍