GT4SD / multitask_text_and_chemistry_t5

Code for "Unifying Molecular and Textual Representations via Multi-task Language Modelling" @ ICML 2023
https://huggingface.co/spaces/GT4SD/multitask-text-and-chemistry-t5
MIT License

Problem of the weights of text-and-chemistry-t5-base-augm #3

Closed · drugilsberg closed this 7 months ago

drugilsberg commented 1 year ago

Issue originally opened here: https://github.com/GT4SD/gt4sd-core/issues/229

From Jenonone:

I cannot reproduce the results of the paper with the weights from the Hugging Face repo GT4SD/multitask-text-and-chemistry-t5-base-augm: the accuracy is supposed to be 0.322, but I measure only 0.196.

From medicine-wave:

Same question: I cannot reproduce the paragraph2action results of the paper with the weights from the Hugging Face repo https://huggingface.co/GT4SD/multitask-text-and-chemistry-t5-small-standard. The BLEU score is supposed to be 0.929, but I measure only 0.659. Does the model need to be further fine-tuned?

christofid commented 1 year ago

Hi, I checked the original results and verified that they are indeed correct. I am now investigating whether the issue is limited to the checkpoint uploaded to HF. I will let you know asap.

In the meantime some comments/questions:

  1. I would suggest, as a postprocessing step, replacing any `<unk>` token in the output with `\\` (something like `output = output.replace("<unk>", "\\\\").strip()` is enough; see the sketch after this list). We have noticed that the model sometimes generates `<unk>` tokens instead of `\\` in this specific scenario, and this simple postprocessing fixes it without creating problems in other cases. Note that this should only help the text2molecule case mentioned in GT4SD/gt4sd-core#229.

  2. Do you use beam search for the generation?
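
For concreteness, a minimal sketch of point 1 applied to a raw decoded output (`generated` and `tokenizer` are placeholder names for whatever your generation loop produces):

```python
# Decode, strip special tokens, then map <unk> back to the backslash escape
# the model should have produced (relevant to text2molecule/SMILES outputs only).
output = tokenizer.decode(generated[0].cpu())
output = output.split(tokenizer.eos_token)[0].replace(tokenizer.pad_token, "")
output = output.replace("<unk>", "\\\\").strip()
```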

zw-SIMM commented 1 year ago

> Hi, I checked the original results and verified that they are indeed correct. I am now investigating whether the issue is limited to the checkpoint uploaded to HF. I will let you know asap.
>
> In the meantime, some comments/questions:
>
> 1. I would suggest, as a postprocessing step, replacing any `<unk>` token in the output with `\\` (something like `output = output.replace("<unk>", "\\\\").strip()` is enough). We have noticed that the model sometimes generates `<unk>` tokens instead of `\\` in this specific scenario, and this simple postprocessing fixes it without creating problems in other cases. Note that this should only help the text2molecule case mentioned in GT4SD/gt4sd-core#229.
> 2. Do you use beam search for the generation?

Thanks for your reply!

  1. I tested the paragraph2action task, so the `<unk>` fix may not be related to my case (it targets text2molecule).
  2. I followed the demo code to generate the output with num_beams=10, but only one output action is generated per paragraph. By the way, is the result related to the number of beams? Aren't we comparing the performance of the top-1 output action?
```python
import pandas as pd
import torch
from tqdm import tqdm
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Assumed setup (not shown in the original post): checkpoint from the HF repo
# discussed above; max_length is illustrative, num_beams follows the demo.
model_name = "GT4SD/multitask-text-and-chemistry-t5-small-standard"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name).to(device)
max_length = 512
num_beams = 10

test_df = pd.read_csv("hand_annotated_test.csv")
test_paragraphs = list(test_df["paragraphs"])
test_actions_preds = []

for instance in tqdm(test_paragraphs):
    # Prompt template for the paragraph2action task.
    input_text = f"Which actions are described in the following paragraph: {instance}"

    text = tokenizer(input_text, return_tensors="pt").to(device)
    output = model.generate(input_ids=text["input_ids"], max_length=max_length, num_beams=num_beams)
    output = tokenizer.decode(output[0].cpu())

    # Strip special tokens: cut at EOS, drop padding, trim whitespace.
    output = output.split(tokenizer.eos_token)[0]
    output = output.replace(tokenizer.pad_token, "")
    output = output.strip()

    test_actions_preds.append(output)
```
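
For reference, one way to score these predictions is corpus BLEU, e.g. via nltk; a minimal sketch, assuming the CSV also carries the ground-truth sequences in an `actions` column (the paper's exact tokenization and metric setup may differ):

```python
from nltk.translate.bleu_score import corpus_bleu

# Whitespace tokenization is an assumption; use the paper's tokenizer if known.
references = [[ref.split()] for ref in test_df["actions"]]
hypotheses = [pred.split() for pred in test_actions_preds]
print(f"BLEU: {corpus_bleu(references, hypotheses):.3f}")
```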

Looking forward to the corrected checkpoint, thanks!

christofid commented 12 months ago

I updated the demo code to include the needed postprocessing step for the description-to-SMILES case. Thanks for pointing this out! The number of beams does affect the generated results: the more beams, the more token combinations are examined during generation. We compare top-1, but the best output can vary with a different number of beams. I will let you know about the checkpoint asap.
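
To illustrate the point about beams, a small sketch (hypothetical, reusing the variables from the snippet above): the top-1 sequence returned by beam search can change as the beam widens, because a wider beam keeps candidate prefixes alive that a narrower search would prune.

```python
# The highest-scoring finished sequence may differ across beam widths.
for nb in (1, 5, 10):
    out = model.generate(input_ids=text["input_ids"], max_length=max_length, num_beams=nb)
    print(nb, tokenizer.decode(out[0], skip_special_tokens=True))
```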

jannisborn commented 7 months ago

Closing due to inactivity