BiomedSciAI / fuse-drug

FuseMedML based molecular biochemistry library for drug discovery/repurposing
Apache License 2.0

loading pretrained tokenizers returns exception when installing package from git #97

Open sivanravidos opened 9 months ago

sivanravidos commented 9 months ago

**Describe the bug**\
When installing with `pip install` from git, the compiled (pretrained) tokenizer files are not installed.

**To reproduce**

  1. Install both packages from git:

    pip install git+https://github.com/BiomedSciAI/fuse-med-ml.git
    pip install git+https://github.com/BiomedSciAI/fuse-drug.git

  2. Try to use a pre-trained tokenizer:

    import os
    from fusedrug.data.tokenizer.ops import FastModularTokenizer
    from fusedrug.data.tokenizer.modulartokenizer import pretrained_tokenizers
    tokenizer_path = os.path.join(pretrained_tokenizers.get_dir_path(), 'modular_AA_SMILES_genes_single_path')
    tokenizer_op = FastModularTokenizer(tokenizer_path=tokenizer_path)

    This results in an exception:

    >>> FastModularTokenizer(tokenizer_path=tokenizer_path)
    Traceback (most recent call last):
    File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/modular_tokenizer.py", line 440, in load
    loaded_conf: omegaconf.dictconfig.DictConfig = OmegaConf.load(
    File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/omegaconf/omegaconf.py", line 189, in load
    with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
    FileNotFoundError: [Errno 2] No such file or directory: '/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/pretrained_tokenizers/modular_AA_SMILES_genes_single_path/config.yaml'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/ops/modular_tokenizer_ops.py", line 47, in __init__
        self._tokenizer = Tokenizer.from_file(self._tokenizer_path)
      File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/modular_tokenizer.py", line 1545, in from_file
        return ModularTokenizer.load(path)
      File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/modular_tokenizer.py", line 445, in load
        raise Exception(f"couldn't load config.yaml from {path}")
    Exception: couldn't load config.yaml from /dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/pretrained_tokenizers/modular_AA_SMILES_genes_single_path



**Expected behavior**\
The tokenizer files should be listed in `setup.py` so that they are installed together with the Python files.
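
For illustration, here is a minimal sketch of one way the data files could be gathered for the `package_data` argument of `setuptools.setup`. The helper name `collect_data_files` is hypothetical, and the commented-out `setup()` call is an assumption about how the project's `setup.py` might use it, not a copy of the actual file.

```python
import os
from typing import List


def collect_data_files(package_dir: str, data_subdir: str) -> List[str]:
    """Return every file under package_dir/data_subdir, relative to package_dir.

    The returned paths have the form setuptools expects in a package_data list.
    """
    root = os.path.join(package_dir, data_subdir)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, package_dir))
    return sorted(found)


# Hypothetical use inside setup.py (assumed layout, not the project's real file):
#
# from setuptools import setup, find_packages
#
# setup(
#     name="fuse-drug",
#     packages=find_packages(),
#     package_data={
#         "fusedrug": collect_data_files(
#             "fusedrug",
#             os.path.join("data", "tokenizer", "modulartokenizer", "pretrained_tokenizers"),
#         )
#     },
# )
```

Listing the files explicitly avoids relying on recursive glob support in `package_data`, which depends on the setuptools version in use.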

mmdanziger commented 9 months ago

This looks like a package data issue. The real question here is how is CI passing??
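
A rough sketch of a check that could surface this kind of problem in CI is below. The helper `find_missing_files` is hypothetical, and the commented usage assumes the check runs against a regular (non-editable) install, since a run against the source checkout would find the files regardless of how the package was built.

```python
import os
from typing import Iterable, List


def find_missing_files(base_dir: str, required: Iterable[str]) -> List[str]:
    """Return the entries of `required` (paths relative to base_dir) that are not files."""
    return [
        rel_path
        for rel_path in required
        if not os.path.isfile(os.path.join(base_dir, rel_path))
    ]


# Hypothetical usage in a test, with names taken from the report above:
#
# from fusedrug.data.tokenizer.modulartokenizer import pretrained_tokenizers
#
# missing = find_missing_files(
#     pretrained_tokenizers.get_dir_path(),
#     [os.path.join("modular_AA_SMILES_genes_single_path", "config.yaml")],
# )
# assert not missing, f"tokenizer files missing from install: {missing}"
```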