YerevaNN / BARTSmiles

BARTSmiles, generative masked language model for molecular representations
MIT License

Preprocess dataset doesn't work because the paths aren't correct #7

Open kosonocky opened 1 year ago

kosonocky commented 1 year ago

Hi, I followed the instructions for preprocessing the datasets as per the readme. However, I get errors.

The --root argument is picked up by process_datasets.py itself, but nothing that the script calls afterwards uses the root I provided. Those calls fall back to the author's hard-coded paths, so the scripts cannot be used without editing the default args to match the local filesystem.

Seems to be an issue with the following paths:

--model /home/gayane/BartLM/chemical/tokenizer/chem.model

'/home/gayane/BartLM/BARTSmiles/preprocess/spm_parallel.py': [Errno 2] No such file or directory

srcdict='/home/gayane/BartLM/chemical/tokenizer/chem.vocab.fs'

kosonocky commented 1 year ago

Actually, I'm starting to think that the directory layout of the repo on GitHub isn't the same as the layout the code assumes. Unless I am mistaken?

In compute_score.py:

Line 20: parser.add_argument('--root', default="/home/gayane/BartLM",

Line 42: root = args.root

Line 44: sys.path.append(f"{root}/BARTSmiles/utils/")

Line 49: store_path = f"{root}/chemical/checkpoints/evaluation_data"

Here the script takes in the root path and then appends {root}/BARTSmiles/utils/ to sys.path, which makes sense, since that is how the utils module becomes importable. So it seems 'BARTSmiles' should not be part of the root path, because it is already appended on line 44.

But line 49, which defines store_path, references {root}/chemical/..., while in the repo chemical sits inside BARTSmiles. So whichever root we choose, we get an error either on line 44 or on line 49.
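To illustrate the conflict, here is a minimal sketch that rebuilds the two paths from lines 44 and 49 and checks them against a throwaway copy of the repo layout. The helper names (derived_paths, check_root, simulate_layout) are hypothetical and not part of BARTSmiles, and the simulated layout only assumes what is described above: utils and chemical both sit inside the BARTSmiles folder.

```python
import tempfile
from pathlib import Path


def derived_paths(root):
    """Rebuild the two paths compute_score.py derives from --root."""
    root = Path(root)
    utils_dir = root / "BARTSmiles" / "utils"  # as on line 44
    store_path = root / "chemical" / "checkpoints" / "evaluation_data"  # as on line 49
    return utils_dir, store_path


def check_root(root):
    """Report whether each derived location exists for a candidate root."""
    utils_dir, store_path = derived_paths(root)
    chemical_dir = store_path.parents[1]  # {root}/chemical
    return {"utils": utils_dir.is_dir(), "chemical": chemical_dir.is_dir()}


def simulate_layout():
    """Create an assumed repo layout in a temp dir and test both root choices."""
    with tempfile.TemporaryDirectory() as tmp:
        clone = Path(tmp) / "BARTSmiles"
        (clone / "utils").mkdir(parents=True)
        (clone / "chemical").mkdir()
        return {
            "parent": check_root(tmp),   # --root is the folder containing the clone
            "clone": check_root(clone),  # --root is the clone itself
        }


if __name__ == "__main__":
    for choice, result in simulate_layout().items():
        print(choice, result)
```

With the assumed layout, neither choice of root makes both locations resolve, which matches the behaviour described above.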

Could you please resolve this, or let me know if I am doing something wrong?

kosonocky commented 1 year ago

In my copy of the repo I changed every occurrence of "{root}/chemical" to "{root}/BARTSmiles/chemical" and set --root to the directory that contains the BARTSmiles folder, and everything ran fine.

But as posted, to my knowledge, this will not run.
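For anyone applying the same workaround, here is a rough sketch of deriving every path from a single root so nothing falls back to a hard-coded default. resolve_paths is a hypothetical helper, not something in the repo, and the tokenizer locations are an assumption based on the paths in the error messages above with "{root}/chemical" rewritten to "{root}/BARTSmiles/chemical".

```python
from pathlib import Path


def resolve_paths(root):
    """Derive all paths from one root: the directory containing the BARTSmiles folder.

    Hypothetical helper; the file locations are assumed from the error
    messages quoted in this issue, not confirmed against the repo.
    """
    repo = Path(root) / "BARTSmiles"
    chemical = repo / "chemical"
    return {
        "utils": repo / "utils",
        "spm_parallel": repo / "preprocess" / "spm_parallel.py",
        "spm_model": chemical / "tokenizer" / "chem.model",
        "srcdict": chemical / "tokenizer" / "chem.vocab.fs",
        "store_path": chemical / "checkpoints" / "evaluation_data",
    }


if __name__ == "__main__":
    # Example root; replace with the folder that holds your BARTSmiles clone.
    for name, path in resolve_paths("/path/to/parent").items():
        print(f"{name}: {path}")
```

Passing these values explicitly to each script, rather than relying on the defaults, avoids having to edit the source for every new machine.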

zw-SIMM commented 1 year ago

Same problem