Closed DanielSchauerTakeda closed 1 year ago
Dear Daniel Schauer,
Thank you very much for expressing interest in the KAZU framework!!
The "cache" function in "functools" was introduced in Python 3.9 (https://docs.python.org/3.9/library/functools.html#functools.cache), so importing it raises an ImportError on Python versions below 3.9.
Could you try again in a fresh environment with a newer version of Python?
I apologize for the missing python version information in the Readme. I will promptly request an update to include this information. I will also discuss whether we can use alternative functions that are compatible with lower versions.
Best regards, WonJin
@wonjininfo thanks for the reply and following up. I tried again, this time specifying python=3.9 when creating my anaconda environment, but I got an error because there is no OS environment variable named JAVA_HOME on my computer.
Since I'm working in an enterprise environment, I will need to reach out to my IT desk to get that setup.
I'd think that the quick start installation instructions for Kazu should either call out the need for this OS environment variable or, if the underlying problem is that Kazu expects a Java SDK to be installed, state that requirement instead.
I can run code like this:
from hydra import initialize_config_dir, compose
from hydra.utils import instantiate
from kazu.data.data import Document
from pathlib import Path
cdir = Path("C:/nlp-spacy-prodigy/KAZU/kazu_model_pack_public-v0.0.16").joinpath('conf')
with initialize_config_dir(config_dir=str(cdir)):
    cfg = compose(
        config_name="config",
        overrides=[],
    )
text = "EGFR mutations are often implicated in lung cancer. Epidermal Growth Factor Receptor (EGFR) is a gene."
doc = Document.create_simple_document(text)
print(f"{doc.sections[0].text}")
#>>EGFR mutations are often implicated in lung cancer. Epidermal Growth Factor Receptor (EGFR) is a gene.
but any code that instantiates a Kazu pipeline throws the JAVA_HOME error mentioned before:
from hydra import initialize_config_dir, compose
from hydra.utils import instantiate
from kazu.data.data import Document
from kazu.pipeline import Pipeline
from pathlib import Path
# the hydra config is kept in the model pack
# get the model pack from kazu's release page https://github.com/astrazeneca/kazu/releases, then unzip to the working folder
cdir = Path("C:/nlp-spacy-prodigy/KAZU/kazu_model_pack_public-v0.0.16").joinpath('conf')
print(cdir)
with initialize_config_dir(config_dir=str(cdir)):
    cfg = compose(
        config_name="config",
        overrides=[],
    )
pipeline: Pipeline = instantiate(cfg.Pipeline)
text = "EGFR mutations are often implicated in lung cancer"
doc = Document.create_simple_document(text)
pipeline([doc])
print(f"{doc.sections[0].text}")
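Since instantiating the pipeline is what triggers the JAVA_HOME error, a pre-flight check can make the failure explicit before any models are loaded. The sketch below is only an illustration: java_home_problem is a hypothetical helper, not part of Kazu, and it checks nothing beyond whether the variable is set and points at an existing directory.

```python
import os
from typing import Mapping, Optional


def java_home_problem(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a description of the JAVA_HOME problem, or None if it looks fine.

    Hypothetical helper: `env` defaults to the real process environment,
    but any mapping can be passed in (useful for testing).
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    if not os.path.isdir(java_home):
        return f"JAVA_HOME points to a missing directory: {java_home}"
    return None


problem = java_home_problem()
if problem is not None:
    print(f"Java check failed: {problem}")
```

Running this before the pipeline is instantiated turns an obscure failure deep inside a pipeline step into a one-line message about the environment.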
Hi Daniel,
Thanks for your patience here - yes, we should be calling out that the default pipeline expects a Java SDK installation. Sorry for that, and we'll work on the best way to do that.
Actually, only a single 'step' in the pipeline depends on Java: the 'SethStep', which recognises gene mutations. So removing this step from the pipeline should let you try out the rest of Kazu. Inserting the line del cfg.Pipeline.steps[5] after the config is created but before the pipeline is loaded will do this:
from hydra import initialize_config_dir, compose
from hydra.utils import instantiate
from kazu.data.data import Document
from kazu.pipeline import Pipeline
from pathlib import Path
import os
# the hydra config is kept in the model pack
cdir = Path(os.environ["KAZU_MODEL_PACK"]).joinpath('conf')
with initialize_config_dir(config_dir=str(cdir)):
    cfg = compose(
        config_name="config",
        overrides=[],
    )
### NEW KEY LINE ###
del cfg.Pipeline.steps[5]
pipeline: Pipeline = instantiate(cfg.Pipeline)
text = "EGFR mutations are often implicated in lung cancer"
doc = Document.create_simple_document(text)
pipeline([doc])
print(f"{doc.get_entities()}")
Try running this instead of the code in the quickstart - I've tried it (with JAVA_HOME not set) and it works for me. I've also checked there weren't any other KAZU-related environment variables when running it, and checked the environment variables we use elsewhere in the config for KAZU.
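Deleting by position (index 5) depends on the step order in a particular model pack. A name-based filter is one way to make the same workaround less fragile. The sketch below uses plain dictionaries as a stand-in for the config and assumes each step entry is a mapping with a "_target_" key (Hydra's convention for instantiate); the thread does not show the real config layout, and both drop_steps and the example target strings are made up for illustration.

```python
from typing import Any, List, Mapping


def drop_steps(steps: List[Mapping[str, Any]], name_fragment: str) -> List[Mapping[str, Any]]:
    """Return only the steps whose `_target_` does not contain `name_fragment`."""
    return [s for s in steps if name_fragment not in str(s.get("_target_", ""))]


# Hypothetical stand-in for cfg.Pipeline.steps; these targets are invented.
example_steps = [
    {"_target_": "example.steps.NerStep"},
    {"_target_": "example.steps.SethStep"},
    {"_target_": "example.steps.LinkingStep"},
]

kept = drop_steps(example_steps, "SethStep")
print([s["_target_"] for s in kept])
```

Applying this to a real Hydra config would also require writing the filtered list back into the config, and how the step entries resolve there has not been verified here.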
Hi Daniel,
How are you getting on? Did the workaround above work for you?
We've also just released v0.0.24, which removes the 'SethStep' from the default pipeline, so you will now be able to use it without having JAVA_HOME set.
We've also added the requirement of python 3.9 in the installation instructions as well as the project metadata in pyproject.toml (which shows up in the ‘Meta’ section on pypi.org). Thanks again for letting us know about these issues and sorry for the pain you suffered with them.
Closing this issue because we've updated Kazu to address the two key problems: the undocumented Python 3.9 requirement, and the default pipeline's dependency on JAVA_HOME.
Please do re-open this issue or open a new one if you run into other problems, though!
I was reading through the Quickstart documentation's Processing Your First Document section, but I ran into an issue.
Setup steps from the Windows command prompt:
Attempting to run a modified version of the sample script (modified because I don't have admin rights to set an environment variable) does not work:
results from that script: