@nicholsn Thank you for reporting this bug.
I am trying to reproduce this issue but I haven't encountered this bug yet. Could you provide more context regarding how you have KGX set up?
I see you're using Python 3.8 via miniconda.
I'll try the same on my end and see if I encounter a similar behavior.
Thanks for the response, @deepakunni3! Happy to provide more details.
I am on OS X and originally set up kgx with Python 3.8 (pip install -e .). However, I am seeing a different error when installing exactly as the instructions describe with virtualenv (I tried this in a fresh Ubuntu container as well, with the same error):
git clone https://github.com/NCATS-Tangerine/kgx.git
cd kgx
python3 -m venv venv
source venv/bin/activate
python3 --version
Python 3.7.7
pip3 install wheel
python3 setup.py install
Traceback (most recent call last):
  File "setup.py", line 40, in <module>
    'console_scripts': ['kgx=kgx.cli:cli']
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/__init__.py", line 145, in setup
    return distutils.core.setup(**attrs)
  File "/opt/miniconda3/lib/python3.7/distutils/core.py", line 121, in setup
    dist.parse_config_files()
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/dist.py", line 700, in parse_config_files
    ignore_option_errors=ignore_option_errors)
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/config.py", line 120, in parse_configuration
    meta.parse()
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/config.py", line 425, in parse
    section_parser_method(section_options)
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/config.py", line 398, in parse_section
    self[name] = value
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/config.py", line 183, in __setitem__
    value = parser(value)
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/config.py", line 513, in _parse_version
    version = self._parse_attr(value, self.package_dir)
  File "/Users/nnichols/Code/kgx/venv/lib/python3.7/site-packages/setuptools/config.py", line 348, in _parse_attr
    module = import_module(module_name)
  File "/opt/miniconda3/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/nnichols/Code/kgx/kgx/__init__.py", line 1, in <module>
    from kgx.transformers.pandas_transformer import PandasTransformer
  File "/Users/nnichols/Code/kgx/kgx/transformers/pandas_transformer.py", line 4, in <module>
    import networkx
ModuleNotFoundError: No module named 'networkx'
Following the conda approach using Python 3.7, I was able to complete the installation but got the same error as originally reported: KeyError: 'OBAN'.
But there is hope! It turns out that if I use the absolute path for /path/to/ensembl.ttl, it parses and generates the report 🤦
This only works when running kgx from within the git repo. If my current directory is outside of the git repo, I get the same KeyError as before.
Hope that helps w/debugging efforts!
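Editor's note on the install traceback above: it shows setuptools importing the kgx package to read its version (_parse_version, then _parse_attr, then import_module), which fails because kgx/__init__.py imports networkx before any dependencies are installed. One common way to avoid this class of error is to read the version string from the source file without importing the package. The sketch below illustrates that idea only. It assumes the version is assigned as a plain string literal named __version__, and read_version is a hypothetical helper, not part of KGX or necessarily how the problem was fixed upstream.

```python
import ast
from pathlib import Path


def read_version(init_path):
    """Return the literal assigned to __version__ in init_path.

    The file is parsed, not imported, so missing third-party
    dependencies (such as networkx) cannot break the lookup.
    Hypothetical helper for illustration; assumes a top-level
    assignment of the form: __version__ = "x.y.z"
    """
    source = Path(init_path).read_text(encoding="utf-8")
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__version__":
                    # literal_eval only accepts literals, never runs code
                    return ast.literal_eval(node.value)
    raise ValueError(f"__version__ not found in {init_path}")


# Example call from a setup.py (path is an assumption):
#     version = read_version("kgx/__init__.py")
```

Because the file is only parsed, the lookup works in a fresh virtual environment where none of the package's dependencies exist yet.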
Thank you for the detailed response. Looking into this now
Okay, I fixed the issue with the install, and these fixes are on master.
Still looking into the following: KeyError: 'OBAN'
Thanks for the support @deepakunni3.
It looks like the KeyError: 'OBAN' might be caused by a caching issue, and the inside/outside the git repo behavior may be a red herring.
When I inspected the KeyError using ipdb, my prefix_manager.prefix_map has a few namespaces unrelated to biolink from my own ontologies.
I'm not sure what is going on there yet, but I suspect that rdflib (or maybe kgx?) is doing some caching of namespaces.
@nicholsn That is strange. KGX does cache certain lookups via cachetools, but it is still unclear how that would interfere with prefix_manager in KGX.
Do you already have a modified version of prefixcommons-py installed?
I don't have a modified version of prefixcommons installed. I tried installing in a fresh Ubuntu Docker container and it works fine, but something in my OS X environment is polluting the prefix_map, and I haven't been able to track it down even after deleting kgx and its dependencies, creating new venv and conda environments, and reinstalling kgx. All lead to the same error...
I'm not sure if this helps at all, but here is what is stored in the prefix_map when the KeyError is thrown. Somehow it is picking up the 'maze' prefix and setting that as the biolink URI. These are just placeholders I had in some test data that I ran kgx validate on.
{'maze': 'http://id.mazetx.com/terms/', 'id': '@id', 'type': '@type', 'biolink': 'http://id.mazetx.com/terms/', 'MONARCH': 'https://monarchinitiative.org/', 'MONARCH_NODE': 'https://monarchinitiative.org/MONARCH_', '': 'https://www.example.org/UNKNOWN/'}
Any thoughts on where the caching might be stored so I can nuke it?
@nicholsn Sorry for the late response here.
I am unclear what might be contributing to the polluted prefix map. Thanks for sharing the snippet of prefix map.
The caching mechanism used in KGX doesn't write the cache to a file. It caches in memory at run time.
If it helps, there is a Docker container for KGX available at https://hub.docker.com/r/biolink/kgx
While not ideal, you can be guaranteed a sandbox for running KGX on files located on the host machine.
Oh, wait. I think I might know what is going on here.
Did you have {'@vocab': 'http://id.mazetx.com/terms/'} in your prefix map before running KGX?
It might depend on what you mean by "prefix map". It was definitely in the @context section of a JSON-LD file, but it is possible that I added it somewhere else.
This is just a wild guess. So I could be wrong here.
KGX relies on https://github.com/biolink/biolink-model/blob/master/context.jsonld for prefix-to-IRI mappings, which has @vocab defined as https://w3id.org/biolink/vocab/. Clearly there is a clash happening somewhere that is affecting the JSON-LD context, and somehow @vocab is being overwritten with the mazetx IRI, which KGX then ends up using. That might explain the failures you saw earlier and why I couldn't reproduce the same error.
ok, so I'm not sure exactly what changed since I opened this issue, but I did clear out a local version of biolink-model along the way, and now it seems to be working with a fresh pip install of kgx.
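Editor's note: when a stale local checkout is suspected of shadowing an installed package, it can help to print where Python would load each module from. The sketch below uses only the standard library. module_origin is a hypothetical helper, and the package names checked are the ones mentioned in this thread, so adjust them as needed.

```python
import importlib.util


def module_origin(name):
    """Return the path a top-level module would be loaded from, or None.

    find_spec locates a top-level module without importing it, so this
    is safe to run even if importing the package would fail.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin is not None:
        return spec.origin
    # Namespace packages have no single origin; report their search paths.
    locations = spec.submodule_search_locations
    return ", ".join(locations) if locations else None


if __name__ == "__main__":
    for name in ("kgx", "prefixcommons", "rdflib"):
        print(f"{name}: {module_origin(name) or 'not found'}")
```

A path that points into a source checkout instead of the environment's site-packages directory suggests a development install or a leftover local copy is being picked up.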
I'll go ahead and close this... Thanks for taking the time to work through this.
Okay, glad to hear that it's working now. 👍
Describe the bug
When calling the CLI to get a graph-summary of the ensembl.ttl file (or others) from Monarch, it throws a KeyError: 'OBAN'
To Reproduce
Expected behavior
kgx graph-summary --input-format ttl --output foo.txt ensembl.ttl writes out a summary to foo.txt