HelloJocelynLu / t5chem

Transformer-based model for chemical reactions
MIT License

not able to run colab example #17

Closed narayanamayya closed 5 months ago

narayanamayya commented 10 months ago
          2 from transformers import T5ForConditionalGeneration
    ----> 3 from t5chem import SimpleTokenizer
          4 model = T5ForConditionalGeneration.from_pretrained(model_path)
          5 tokenizer = SimpleTokenizer(vocab_file='t5chem/models/USPTO_500_MT/vocab.pt')

    ImportError: cannot import name 'SimpleTokenizer' from 't5chem' (unknown location)

On my local machine, after installation, I get the error below when running the example code:

         42 pad_token=pad_token,
         43 eos_token=eos_token,
         ...
         63 @property
         64 def vocab_size(self) -> int:
    ---> 65 return len(self.vocab)

    AttributeError: 'SimpleTokenizer' object has no attribute 'vocab'

HelloJocelynLu commented 9 months ago

Hi, I think it might be due to some package incompatibility issues. Could you please provide:

  1. The Colab environment you are running
  2. Information about your local machine: a. System b. Package versions, especially transformers, torchtext, and rdkit
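
The system and package information requested above can be collected with a short script run in the same environment as the failing code. This is only a minimal sketch, not part of t5chem: the helper name `report_versions` is made up for illustration, and the distribution names are assumed to match what pip reports for each package.

```python
import platform
import sys
from importlib import metadata


def report_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Operating system and Python interpreter.
    print("system:", platform.platform())
    print("python:", sys.version.split()[0])
    # The packages named in the request, plus t5chem and torch for context.
    wanted = ["t5chem", "transformers", "torchtext", "rdkit", "torch"]
    for name, version in report_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

In a Colab notebook the same code can be pasted into a cell; the output is plain text that can be copied into the issue.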

narayanamayya commented 9 months ago

  1. I'm trying to run the second Colab example you provided, "Use a pretrained model in python script". For rdkit I ran !pip install rdkit. Thank you.

  2. On my Ubuntu machine:

    Name Version Build Channel

    _libgcc_mutex 0.1 main
    _openmp_mutex 5.1 1_gnu
    asttokens 2.4.1 pyhd8ed1ab_0 conda-forge
    bzip2 1.0.8 h7b6447c_0
    ca-certificates 2024.2.2 hbcca054_0 conda-forge
    certifi 2024.2.2 pypi_0 pypi
    charset-normalizer 3.3.2 pypi_0 pypi
    comm 0.2.1 pyhd8ed1ab_0 conda-forge
    debugpy 1.6.7 py312h6a678d5_0
    decorator 5.1.1 pyhd8ed1ab_0 conda-forge
    exceptiongroup 1.2.0 pyhd8ed1ab_2 conda-forge
    executing 2.0.1 pyhd8ed1ab_0 conda-forge
    expat 2.5.0 h6a678d5_0
    filelock 3.13.1 pypi_0 pypi
    fsspec 2024.2.0 pypi_0 pypi
    huggingface-hub 0.20.3 pypi_0 pypi
    idna 3.6 pypi_0 pypi
    importlib-metadata 7.0.1 pyha770c72_0 conda-forge
    importlib_metadata 7.0.1 hd8ed1ab_0 conda-forge
    ipykernel 6.29.2 pyhd33586a_0 conda-forge
    ipython 8.21.0 pyh707e725_0 conda-forge
    jedi 0.19.1 pyhd8ed1ab_0 conda-forge
    jinja2 3.1.3 pypi_0 pypi
    jupyter_client 8.6.0 pyhd8ed1ab_0 conda-forge
    jupyter_core 5.5.0 py312h06a4308_0
    ld_impl_linux-64 2.38 h1181459_1
    libffi 3.4.4 h6a678d5_0
    libgcc-ng 11.2.0 h1234567_1
    libgomp 11.2.0 h1234567_1
    libsodium 1.0.18 h36c2ea0_1 conda-forge
    libstdcxx-ng 11.2.0 h1234567_1
    libuuid 1.41.5 h5eee18b_0
    markupsafe 2.1.5 pypi_0 pypi
    matplotlib-inline 0.1.6 pyhd8ed1ab_0 conda-forge
    mpmath 1.3.0 pypi_0 pypi
    ncurses 6.4 h6a678d5_0
    nest-asyncio 1.6.0 pyhd8ed1ab_0 conda-forge
    networkx 3.2.1 pypi_0 pypi
    numpy 1.26.4 pypi_0 pypi
    nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
    nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
    nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
    nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
    nvidia-cudnn-cu12 8.9.2.26 pypi_0 pypi
    nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
    nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
    nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
    nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
    nvidia-nccl-cu12 2.19.3 pypi_0 pypi
    nvidia-nvjitlink-cu12 12.3.101 pypi_0 pypi
    nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
    openssl 3.0.13 h7f8727e_0
    packaging 23.2 pyhd8ed1ab_0 conda-forge
    parso 0.8.3 pyhd8ed1ab_0 conda-forge
    pexpect 4.9.0 pyhd8ed1ab_0 conda-forge
    pickleshare 0.7.5 py_1003 conda-forge
    pillow 10.2.0 pypi_0 pypi
    pip 24.0 pypi_0 pypi
    platformdirs 4.2.0 pyhd8ed1ab_0 conda-forge
    prompt-toolkit 3.0.42 pyha770c72_0 conda-forge
    psutil 5.9.0 py312h5eee18b_0
    ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
    pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge
    pygments 2.17.2 pyhd8ed1ab_0 conda-forge
    python 3.12.1 h996f2a0_0
    python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
    pyyaml 6.0.1 pypi_0 pypi
    pyzmq 25.1.2 py312h6a678d5_0
    rdkit 2023.9.4 pypi_0 pypi
    readline 8.2 h5eee18b_0
    regex 2023.12.25 pypi_0 pypi
    requests 2.31.0 pypi_0 pypi
    safetensors 0.4.2 pypi_0 pypi
    setuptools 68.2.2 py312h06a4308_0
    six 1.16.0 pyh6c4a22f_0 conda-forge
    sqlite 3.41.2 h5eee18b_0
    stack_data 0.6.2 pyhd8ed1ab_0 conda-forge
    sympy 1.12 pypi_0 pypi
    t5chem 1.0.0 pypi_0 pypi
    tk 8.6.12 h1ccaba5_0
    tokenizers 0.15.1 pypi_0 pypi
    torch 2.2.0 pypi_0 pypi
    torchdata 0.7.1 pypi_0 pypi
    torchtext 0.16.2 pypi_0 pypi
    tornado 6.3.3 py312h5eee18b_0
    tqdm 4.66.1 pypi_0 pypi
    traitlets 5.14.1 pyhd8ed1ab_0 conda-forge
    transformers 4.37.2 pypi_0 pypi
    triton 2.2.0 pypi_0 pypi
    typing_extensions 4.9.0 pyha770c72_0 conda-forge
    tzdata 2023d h04d1e81_0
    urllib3 2.2.0 pypi_0 pypi
    wcwidth 0.2.13 pyhd8ed1ab_0 conda-forge
    wheel 0.41.2 py312h06a4308_0
    xz 5.4.5 h5eee18b_0
    zeromq 4.3.5 h6a678d5_0
    zipp 3.17.0 pyhd8ed1ab_0 conda-forge
    zlib 1.2.13 h5eee18b_0

HelloJocelynLu commented 9 months ago

Hi,

For the first issue (Colab), I'll address the installation problems when I have time. Thank you for reporting this. In the meantime, would you consider trying the Docker container? It's convenient and contains all the necessary dependencies.

Regarding the second issue, it appears that your installed transformers and torchtext versions are not compatible. Both packages have introduced backward-incompatible changes (see here). You may need to install the correct versions for the model to work properly.

I suggest trying the docker image, as all the packages are pre-installed. It's been a while since the publication of this model, so the dependencies are somewhat out-of-date ;P. I would greatly appreciate any pull requests that help address these compatibility issues.

narayanamayya commented 9 months ago

I'm not able to run the Docker image; it gives the error below. The Docker image is trying to fetch models from the Hugging Face repository. Thanks.

    404 Client Error: Not Found for url: https://huggingface.co//work/models/pretrain/simple//resolve/main/config.json
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/transformers-4.10.2-py3.8.egg/transformers/configuration_utils.py", line 524, in get_config_dict
        resolved_config_file = cached_path(

HelloJocelynLu commented 9 months ago

Hi, I think Hugging Face is trying to load the model from online resources. However, the model you used here is not available online; I think that is because I did not deploy it to the Hugging Face repository. The pretrained simple model (as well as the other models) and the datasets need to be downloaded separately.

For step-by-step directions on running the model with Docker, please see the instructions here: https://hub.docker.com/repository/docker/hellojocelynlu/t5chem/general

Please also make sure that your --data_dir and --pretrain point to the correct paths. Otherwise, Hugging Face will search for the models online, which is not the desired behavior.
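
To catch this situation before transformers falls back to an online lookup, the path passed to --pretrain can be checked first. The following is a minimal sketch, assuming the model directory is a standard Hugging Face checkpoint folder containing config.json (the file named in the 404 URL in the error report). `resolve_local_model` is a hypothetical helper, not part of t5chem.

```python
from pathlib import Path


def resolve_local_model(path):
    """Return the absolute model directory, or raise if it is not a usable local checkpoint."""
    model_dir = Path(path).expanduser().resolve()
    if not model_dir.is_dir():
        raise FileNotFoundError(f"model directory does not exist: {model_dir}")
    if not (model_dir / "config.json").is_file():
        raise FileNotFoundError(f"no config.json found in: {model_dir}")
    return model_dir


# Usage (needs transformers and a downloaded model, so it is left commented out).
# The path is the one that appears in the 404 URL from the error report.
# from transformers import T5ForConditionalGeneration
# model_dir = resolve_local_model("/work/models/pretrain/simple")
# model = T5ForConditionalGeneration.from_pretrained(str(model_dir), local_files_only=True)
```

Passing `local_files_only=True` to `from_pretrained` asks transformers not to contact the Hub, so a wrong path fails with a local error rather than a 404.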

PoloWitty commented 6 months ago

I managed to build the environment from scratch and, using the develop branch, run t5chem in python xxx.py style. Here is my process:

  1. setup env

    conda create -n t5chem python==3.8
    conda activate t5chem
    conda install mkl==2023.0
    conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge
    conda install transformers==4.10.2
    conda install scikit-learn==0.24.1
    conda install scipy==1.6.0
    pip install rdkit==2023.9.5
    conda install tensorboard
    conda install pandas
    conda install -c pytorch torchtext
  2. setup t5chem code (use the develop branch instead of main to get rid of the import error, as in #1)

    git clone https://github.com/HelloJocelynLu/t5chem.git
    cd t5chem
    git checkout develop
    cd ..
  3. Now you should be able to use it like this:

    # show version
    python t5chem/t5chem/__main__.py -v
    # train
    python t5chem/t5chem/__main__.py train -h
    # predict
    python t5chem/t5chem/__main__.py predict -h
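
To confirm that an environment actually matches the pins in step 1, the installed versions can be compared against them. This is a rough sketch rather than a tested recipe: the pin list is copied from the commands above, `find_mismatches` and `installed_version` are hypothetical helpers, and it is assumed that conda packages register the distribution names that `importlib.metadata` sees (for example, that conda's pytorch package is reported as torch).

```python
from importlib import metadata

# Versions pinned in step 1 above (distribution name -> expected version).
PINS = {
    "torch": "1.12.0",
    "transformers": "4.10.2",
    "scikit-learn": "0.24.1",
    "scipy": "1.6.0",
    "rdkit": "2023.9.5",
}


def installed_version(name):
    """Installed version of a distribution, or None when it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINS).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

The comparison is an exact string match, so a build with a local suffix (such as a pip wheel tagged with a CUDA version) would be flagged even when the base version agrees.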

HelloJocelynLu commented 6 months ago

Thank you PoloWitty for the information!

Hi narayanamayya, Thomas recently helped me update the t5chem codebase for compatibility with newer dependencies. Please feel free to test it out: https://github.com/tkella47/t5chem (I have not personally tested it yet, but it is worth a try.)

    pip install git+https://github.com/tkella47/t5chem

HelloJocelynLu commented 5 months ago

Closing the issue due to inactivity.