What causes low accuracy in multitasking models?

U-T100 commented 2 years ago

I fine-tuned the pre-trained model on the mixed data in USPTO_500_MT, then used the tuned model to make predictions on the Reactants data in USPTO_500_MT. The accuracy was low: Top-1 is 19%, Top-2 is 25%, Top-3 is 29%... Why are the results not as accurate as described in the paper? During training, the accuracy on the validation data eventually rose to about 0.6.

HelloJocelynLu commented 2 years ago

Hi,

I need to clarify that the USPTO_500_MT model is fully trained and does not need to be finetuned (yes, you can launch it directly by running "t5chem predict ...."). If you are talking about training from the character-level PubChem pretrained model on the USPTO_500_MT mixed dataset, please note that the number of training epochs, the learning rate, and the batch size can all affect training results. After you get your trained model, make sure you pass "--prefix Reactants:" at inference time. That is because the model is trained in a "mixed" manner and would otherwise have difficulty telling which task type you want to carry out. (A molecule given without a task type is ambiguous: for example, it could go through retrosynthesis, or through decomposition, which is a forward reaction.) If you still have difficulty figuring out the problem, please attach the commands you used to train and test your model, and specify the pre-trained model and dataset used. Then I would have a clearer picture of what's happening here.
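As a rough illustration of the inference step described above, here is a minimal sketch that builds a "t5chem predict" command and refuses a prefix without the trailing colon. The flag names are the ones used elsewhere in this thread; the helper name build_predict_command, the model and output paths, and the colon check itself are assumptions made for illustration, not part of t5chem.

```python
import shlex


def build_predict_command(t5chem_path, data_dir, model_dir, prediction, prefix,
                          num_beams=10, num_preds=5, batch_size=32):
    """Return the argument list for `t5chem predict` (hypothetical helper).

    The check below only guards against the easy mistake of dropping the
    trailing colon from the task prefix; it is not something t5chem enforces.
    """
    if not prefix.endswith(":"):
        raise ValueError(f"prefix {prefix!r} should end with ':' (e.g. 'Reactants:')")
    return [
        t5chem_path, "predict",
        "--data_dir", data_dir,
        "--model_dir", model_dir,
        "--prediction", prediction,
        "--prefix", prefix,
        "--num_beams", str(num_beams),
        "--num_preds", str(num_preds),
        "--batch_size", str(batch_size),
    ]


# Placeholder paths: adjust to your own trained model and output file.
cmd = build_predict_command(
    t5chem_path="t5chem",
    data_dir="data/USPTO_500_MT/Reactants/",
    model_dir="path/to/trained_model/",
    prediction="path/to/predictions.tsv",
    prefix="Reactants:",
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the command as a list (rather than a single string with shell=True) keeps the prefix intact as one argument, whatever characters it contains.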

Best, Jocelyn

U-T100 commented 2 years ago

Thank you for replying. Sorry for the misunderstanding.

I'm talking about training based on the character-level PubChem pretrained model on the USPTO_500_MT mixed dataset. The following code was executed during finetuning.

import subprocess

t5chem_path = '/home/user/miniconda3/envs/virtual_env/bin/t5chem'
data_dir = 'data/USPTO_500_MT/mixed/'
output_dir = "model/my_finetuned_sample_model/mixed_USPTO_500_MT/"
task_type = 'mixed'
pretrain = "model/public_pretrained_model/"
vocab = ''
tokenizer = ''
random_seed = 42
num_epock = 30
log_step = 100
batch_size = 64
init_learning_rate = ''

cmd = f'{t5chem_path} train --data_dir {data_dir} --output_dir {output_dir} --task_type {task_type} --pretrain {pretrain} --random_seed {random_seed} --num_epoch {num_epock} --log_step {log_step} --batch_size {batch_size}'
subprocess.run(cmd, shell=True)

And the following code was executed for prediction.

import subprocess

t5chem_path = '/home/user/miniconda3/envs/virtual_env/bin/t5chem'
data_dir = 'data/USPTO_500_MT/Reactants/'
model_dir = "model/my_finetuned_sample_model/mixed_USPTO_500_MT/best_cp-152500/"
prediction = 'prediction_result/reactants.tsv'
prefix = 'Reactants'
num_beams = 10
num_preds = 5
batch_size = 32

cmd = f'{t5chem_path} predict --data_dir {data_dir} --model_dir {model_dir} --prediction {prediction} --prefix {prefix} --num_beams {num_beams} --num_preds {num_preds} --batch_size {batch_size}'
subprocess.run(cmd, shell=True)

I have attached an image of the accuracy and loss during finetuning; please check it.

[Attached image: finetuning]

HelloJocelynLu commented 2 years ago

Hi,

Thank you for the information. One major issue I can see is that the prefix should be "Reactants:" rather than "Reactants" (yes, the ":" is essential!). You can double-check this by looking at the input format in data/USPTO_500_MT/mixed/train.source. The underlying reason is that during training we have a pre-defined prefix "Reactants:" that is treated as a special token to distinguish different tasks. Without the ":", however, "Reactants" is an English word that cannot be found in the vocabulary. Therefore, if we use "Reactants", T5Chem is likely to treat it as part of a molecule -> [R, e, ..., s], but the model we trained never learned this pattern and cannot decide which task it should perform. I hope that explanation is clear.

Minor suggestion: it seems that the validation loss is still decreasing. Instead of loading the "best checkpoint", you may also try loading the final model (at model/my_finetuned_sample_model/mixed_USPTO_500_MT/) or consider training for longer (a larger num_epock).
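To make the prefix problem concrete, here is a toy sketch of a character-level tokenizer that knows one special token, "Reactants:". It is not T5Chem's real tokenizer or vocabulary; the function toy_tokenize and the assumption that the prefix is simply prepended to the SMILES string are made up for illustration.

```python
# Toy setup: one task prefix registered as a single special token.
# Everything else is split character by character.
SPECIAL_TOKENS = ["Reactants:"]


def toy_tokenize(text):
    """Split text into special tokens where they match, single characters otherwise."""
    tokens = []
    i = 0
    while i < len(text):
        for special in SPECIAL_TOKENS:
            if text.startswith(special, i):
                tokens.append(special)
                i += len(special)
                break
        else:
            # No special token starts here, so emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


with_colon = toy_tokenize("Reactants:" + "CCO")
without_colon = toy_tokenize("Reactants" + "CCO")

print(with_colon)     # the prefix survives as one task token
print(without_colon)  # the prefix dissolves into letters mixed with the molecule
```

With the colon, the first token is the task marker and the rest is the molecule. Without it, the letters R, e, a, ... become ordinary sequence tokens, which is an input pattern a model trained only on colon-terminated prefixes would not have seen.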

Please let me know if you have further questions.

Best, Jocelyn

U-T100 commented 2 years ago

I solved the problem by adding a colon to the prefix when making a prediction. Thank you very much! I have one more question that is unrelated to this issue: does the best checkpoint store the model with the smallest evaluation loss among the steps so far?

HelloJocelynLu commented 2 years ago

Currently, t5chem selects the best checkpoint based on validation loss (see here). But Hugging Face now supports early stopping (see here) with flexible patience and metric settings. So feel free to change it to whatever metric you like!
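For readers unfamiliar with the idea, here is a minimal pure-Python sketch of "keep the checkpoint with the lowest validation loss, and stop after a number of evaluations without improvement". It only illustrates the logic; it is not t5chem's or Hugging Face's implementation, and the function name and loss values are invented for the example. In the transformers library, the corresponding feature is, to my knowledge, EarlyStoppingCallback with its early_stopping_patience setting.

```python
def select_best_and_stop(eval_losses, patience):
    """Return (best_index, stop_index) for a sequence of validation losses.

    best_index: evaluation with the lowest loss seen before stopping.
    stop_index: evaluation at which training would stop, or None if the
                patience budget was never exhausted.
    """
    best_index = None
    best_loss = float("inf")
    evals_without_improvement = 0
    for index, loss in enumerate(eval_losses):
        if loss < best_loss:
            best_loss = loss
            best_index = index
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return best_index, index
    return best_index, None


# Invented validation losses, one per evaluation step.
losses = [0.90, 0.70, 0.65, 0.66, 0.64, 0.67, 0.68, 0.69]
best, stopped = select_best_and_stop(losses, patience=3)
print(best, stopped)
```

Swapping the loss for another metric only requires flipping the comparison when higher values are better (for example, accuracy).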

HelloJocelynLu / t5chem

What causes low accuracy in multitasking models? #12