**Describe the bug**\
When installing with `pip install` from git, the compiled tokenizers are not installed.
**To reproduce**\
Install from git:
pip install git+https://github.com/BiomedSciAI/fuse-med-ml.git
pip install git+https://github.com/BiomedSciAI/fuse-drug.git
Try to use a pre-trained tokenizer:
import os
from fusedrug.data.tokenizer.ops import FastModularTokenizer
from fusedrug.data.tokenizer.modulartokenizer import pretrained_tokenizers
tokenizer_path = os.path.join(pretrained_tokenizers.get_dir_path(), 'modular_AA_SMILES_genes_single_path')
tokenizer_op = FastModularTokenizer(tokenizer_path=tokenizer_path)
This results in an exception:
>>> FastModularTokenizer(tokenizer_path=tokenizer_path)
Traceback (most recent call last):
File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/modular_tokenizer.py", line 440, in load
loaded_conf: omegaconf.dictconfig.DictConfig = OmegaConf.load(
File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/omegaconf/omegaconf.py", line 189, in load
with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/pretrained_tokenizers/modular_AA_SMILES_genes_single_path/config.yaml'
Traceback (most recent call last):
File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/modular_tokenizer.py", line 440, in load
loaded_conf: omegaconf.dictconfig.DictConfig = OmegaConf.load(
File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/omegaconf/omegaconf.py", line 189, in load
with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/pretrained_tokenizers/modular_AA_SMILES_genes_single_path/config.yaml'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/ops/modular_tokenizer_ops.py", line 47, in __init__
self._tokenizer = Tokenizer.from_file(self._tokenizer_path)
File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/modular_tokenizer.py", line 1545, in from_file
return ModularTokenizer.load(path)
File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/modular_tokenizer.py", line 445, in load
raise Exception(f"couldn't load config.yaml from {path}")
Exception: couldn't load config.yaml from /dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/pretrained_tokenizers/modular_AA_SMILES_genes_single_path
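A quick way to confirm that the files are absent from the installed package, rather than the path being wrong, is to check the tokenizer directory directly. The sketch below is only illustrative: `missing_files` is a hypothetical helper, not part of fuse-drug, and the only file name it assumes is the `config.yaml` named in the traceback.

```python
import os
from typing import Iterable, List


def missing_files(tokenizer_dir: str, required: Iterable[str] = ("config.yaml",)) -> List[str]:
    """Return the names in `required` that do not exist as files under tokenizer_dir.

    Hypothetical helper for diagnosis only; `config.yaml` is the one file
    the traceback reports as missing.
    """
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(tokenizer_dir, name))
    ]


# Example use against the installed package (names taken from the snippet above):
#
# from fusedrug.data.tokenizer.modulartokenizer import pretrained_tokenizers
# tokenizer_path = os.path.join(
#     pretrained_tokenizers.get_dir_path(), "modular_AA_SMILES_genes_single_path"
# )
# print(missing_files(tokenizer_path))
```

A non-empty result in the pip-installed environment, together with an empty result in a source checkout, would point at packaging rather than at the tokenizer code.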
**Expected behavior**\
The tokenizer files should be added to `setup.py` so that they are installed together with the Python files.
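One possible way to do this, sketched under assumptions, is to list the tokenizer files in the `package_data` argument of setuptools' `setup()`. The helper `collect_package_data` below is hypothetical, and the directory layout is inferred from the paths in the traceback; the project's actual `setup.py` may be organised differently.

```python
import os
from typing import List


def collect_package_data(package_dir: str, data_subdir: str) -> List[str]:
    """List every file under package_dir/data_subdir, relative to package_dir.

    Paths use forward slashes, which is the form setuptools expects in
    `package_data` entries. Hypothetical helper, not part of fuse-drug.
    """
    root = os.path.join(package_dir, data_subdir)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), package_dir)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


# Assumed use inside setup.py (package name and subdirectory inferred from
# the traceback, other setup() arguments omitted):
#
# from setuptools import setup, find_packages
# setup(
#     name="fuse-drug",
#     packages=find_packages(),
#     package_data={
#         "fusedrug": collect_package_data(
#             "fusedrug", "data/tokenizer/modulartokenizer/pretrained_tokenizers"
#         )
#     },
# )
```

An alternative with the same goal is `include_package_data=True` combined with a `MANIFEST.in` entry; which one fits better depends on how the project already builds its distributions.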