Closed zhoubay closed 1 year ago
Could you provide the running script? The downstream task does not require masking, so there is no [MASK]
item in the dictionary.
The running script is like this (imports shown for completeness; the import paths are my best guess at the usual unicore/unimol layout):

import os
import torch
from unicore.data import Dictionary
from unimol.models import UniMolModel

# Load the 30-symbol dictionary shipped with the notebook results
dictionary = Dictionary.load(os.path.join("Uni-Mol/notebooks/results", "dict.txt"))
# Appending [MASK] brings the vocabulary to 31 symbols, matching the checkpoint
mask_idx = dictionary.add_symbol("[MASK]", is_special=True)
model = UniMolModel(dictionary)
model_dict = torch.load("Uni-Mol/ckpt_model/mol_pre_no_h_220816.pt")
model.load_state_dict(model_dict["model"], strict=False)
If the line mask_idx = dictionary.add_symbol("[MASK]", is_special=True)
is removed, the Exception appears, since the nn.Embedding
in the checkpoint was built for 31 tokens instead of 30.
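To make the size mismatch concrete, here is a minimal sketch in plain Python (not the actual Uni-Mol Dictionary class; all names are illustrative): the released checkpoint stores a token embedding with 31 rows, so a 30-symbol dictionary built without the appended [MASK] can never match its shape.

```python
# Hypothetical sketch of why the vocabulary sizes disagree.
# build_vocab is an illustrative stand-in, not Uni-Mol code.

def build_vocab(dict_txt_symbols, add_mask=True):
    """Mimic Dictionary.load plus an optional add_symbol('[MASK]')."""
    vocab = list(dict_txt_symbols)          # 30 entries read from dict.txt
    if add_mask:
        vocab.append("[MASK]")              # appended at index 30
    return vocab

dict_txt = [f"tok{i}" for i in range(30)]   # stand-in for the 30 dict.txt lines

with_mask = build_vocab(dict_txt, add_mask=True)
without_mask = build_vocab(dict_txt, add_mask=False)

# The released checkpoint's embedding has 31 rows, so only the
# 31-symbol vocabulary matches its shape at load time.
checkpoint_rows = 31
assert len(with_mask) == checkpoint_rows        # loads cleanly
assert len(without_mask) != checkpoint_rows     # shape mismatch -> Exception
```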
Got it, and which task is associated with this issue? [mol/pocket pretrain; mol/pocket property prediction; conf gen; binding pose prediction; binding pose demo; mol repr demo]
Moreover, a running script like the following would help us reproduce the problem and fix it:
data_path="./conformation_generation" # replace to your data path
results_path="./infer_confgen" # replace to your results path
weight_path="./save_confgen/checkpoint_best.pt" # replace to your ckpt path
batch_size=128
task_name="qm9" # or "drugs", conformation generation task name
recycles=4
python ./unimol/infer.py --user-dir ./unimol $data_path --task-name $task_name --valid-subset test \
--results-path $results_path \
--num-workers 8 --ddp-backend=c10d --batch-size $batch_size \
--task mol_confG --loss mol_confG --arch mol_confG \
--num-recycles $recycles \
--path $weight_path \
--fp16 --fp16-init-scale 4 --fp16-scale-window 256 \
--log-interval 50 --log-format simple
Well, actually I'm trying to use your pretrained weights for other tasks, so I haven't dug very deep into your Uni-Core framework, which I think is remarkable work.
About this issue, I've added a [MASK]
token to dict.txt
and it made no difference.
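For what it's worth, "no difference" is the expected outcome if add_symbol is idempotent, as fairseq-style dictionaries usually are: when the symbol already exists (because it was added to dict.txt), the call simply returns the existing index instead of appending a duplicate, so the vocabulary size ends up the same either way. A rough sketch (an illustrative stand-in, not the real unicore Dictionary):

```python
def add_symbol(vocab, sym):
    """Mimic a fairseq/unicore-style Dictionary.add_symbol: if the symbol
    is already present, return its existing index; otherwise append it.
    (Illustrative sketch, not the actual implementation.)"""
    if sym in vocab:
        return vocab.index(sym)
    vocab.append(sym)
    return len(vocab) - 1

# dict.txt already containing [MASK] gives a 31-symbol vocabulary...
vocab = [f"tok{i}" for i in range(30)] + ["[MASK]"]
# ...and the add_symbol call in the task code then changes nothing.
idx = add_symbol(vocab, "[MASK]")
assert idx == 30 and len(vocab) == 31   # same index, no duplicate entry
```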
btw, instead of reading the source code directly, are there any resources for learning your framework?
Hope this helps. https://github.com/dptech-corp/Uni-Core#acknowledgement
Hi there,
I'm trying to load the pretrained weights of the
molecular pretraining model
(https://github.com/dptech-corp/Uni-Mol/releases/download/v0.1/mol_pre_no_h_220816.pt), but using the example_data/molecule/dict.txt
leads to the Exception below. I found the cause of this Exception: adding a line of code in
unimol/infer.py
self.mask_idx = dictionary.add_symbol("[MASK]", is_special=True)
solves the problem. (https://github.com/dptech-corp/Uni-Mol/blob/27ad2a0dbfafc9795b36efb279d7ed7c6d87a34a/unimol/tasks/unimol.py#L122) My question is, why not just add a
[MASK]
line to dict.txt
to solve this problem? My point is, whenever we use the checkpoints you offer, this extra line has nothing to do with our own code, yet it is still required just to run the model.
What's your concern about this?