HelloJocelynLu / t5chem

Transformer-based model for chemical reactions
MIT License

Cannot find `tokenizer.json` in USPTO_500_MT model weights #18

Closed: super-dainiu closed this issue 1 week ago

super-dainiu commented 2 weeks ago

```
Traceback (most recent call last):
  File "/home/ys792/.conda/envs/t5chem/bin/t5chem", line 8, in <module>
    sys.exit(main())
  File "/home/ys792/.conda/envs/t5chem/lib/python3.8/site-packages/t5chem/__main__.py", line 33, in main
    command(args)
  File "/home/ys792/.conda/envs/t5chem/lib/python3.8/site-packages/t5chem/run_prediction.py", line 75, in predict
    tokenizer = PreTrainedTokenizerFast(tokenizer_file=os.path.join(args.model_dir, 'tokenizer.json'), **TOKENS)
  File "/home/ys792/.conda/envs/t5chem/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 107, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Exception: No such file or directory (os error 2)
```
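For anyone reproducing this: a quick diagnostic is to list what the downloaded USPTO_500_MT directory actually contains and retry the failing load. This is a minimal sketch, not part of t5chem; the checkpoint path is a placeholder, and the load simply mirrors the call from the traceback.

```python
import os
from transformers import PreTrainedTokenizerFast

# Placeholder path: point this at wherever the USPTO_500_MT weights were downloaded.
model_dir = "models/USPTO_500_MT"

# See which tokenizer/vocab files actually ship with the checkpoint.
print(sorted(os.listdir(model_dir)))

tokenizer_file = os.path.join(model_dir, "tokenizer.json")
if os.path.isfile(tokenizer_file):
    # Same constructor call as in the traceback, minus the extra token kwargs.
    tokenizer = PreTrainedTokenizerFast(tokenizer_file=tokenizer_file)
    print("Loaded fast tokenizer, vocab size:", tokenizer.vocab_size)
else:
    print("tokenizer.json is missing from this checkpoint.")
```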

HelloJocelynLu commented 2 weeks ago

Hi, thank you for the report. Could you provide details on how you installed t5chem? After looking at run_prediction.py, it appears that the source code referenced in your traceback differs from the one in this repository (I did not use PreTrainedTokenizerFast).

super-dainiu commented 2 weeks ago

I referred to this setup https://github.com/HelloJocelynLu/t5chem/issues/17#issuecomment-2106139448. The model itself doesn't contain tokenizer.json, does it?

HelloJocelynLu commented 2 weeks ago

> I referred to this setup #17 (comment). The model itself doesn't contain tokenizer.json, does it?

I see. It appears you are using tkella47's t5chem version. I will let him know about this! Meanwhile, are you facing any similar problems with the t5chem implementation in this repository?

super-dainiu commented 2 weeks ago

Not really. This repository seems to have many version mismatches, so I don't think I reached the stage with tokenizer.json.

If possible, I would like to help with the update and implementation!

HelloJocelynLu commented 1 week ago

> Not really. This repository seems to have many version mismatches, so I don't think I reached the stage with tokenizer.json.
>
> If possible, I would like to help with the update and implementation!

It would be very kind of you to help with updating it! Sorry, I don't have much bandwidth to maintain it at the moment :(. I believe the main package conflicts come from the huggingface (transformers) and torchtext packages; a recent release introduced some backward-incompatible changes.
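If it helps when pinning things down, here is a quick way to report which versions are installed in a given environment. This is a generic sketch; the package list just covers the ones discussed here, and the authoritative pins live in this repo's packaging metadata (e.g. setup.py).

```python
# Report installed versions of the packages most likely involved in the conflicts.
import importlib.metadata as metadata

for pkg in ("torch", "torchtext", "transformers", "tokenizers"):
    try:
        print(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg} is not installed")
```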

I tested the Docker image (built from this repo) today and it works; just copy/paste the commands from the Repository overview section: https://hub.docker.com/repository/docker/hellojocelynlu/t5chem/general

As for the missing tokenizer.json in tkella47's t5chem version, I've pinged him and he is looking into it.

tkella47 commented 1 week ago

Fixed!

HelloJocelynLu commented 1 week ago

> Fixed!

Thank you, tkella47!

Hi super-dainiu,

This issue should be resolved. I am closing it for now. Please let me know if the problem persists and needs to be reopened.

super-dainiu commented 1 week ago

Thank you so much!