Problem on reproducing the sent-ctrl baseline

cs329yangzhong commented 1 year ago

Dear authors,

Thank you very much for sharing the codebase. However, I encountered several difficulties in reproducing the paper results.

Firstly, your repo gives the following command to reproduce sentctrl_baseline:

CUDA_VISIBLE_DEVICES=0 python ctrl_transformer.py --model_name_or_path facebook/bart-large-cnn --do_train --do_eval --do_predict --train_file data/control_clean/train_rate_concat_control.csv --validation_file data/control_clean/val_rate_concat_control.csv --test_file data/control_clean/test_rate_concat_control.csv --output_dir ./results/sentctrl_reproduced --seed 0 --save_total_limit 3 --gen_target_max 800 --gen_type beam_search --predict_with_generate --eval_steps 500 --max_source_length 2048

Should there be extra flags, --with_added_tokens and --remove_prompts, to incorporate the control code in the input and generate processed_result.txt? Otherwise, the above command only uses the text and summary, which looks like an uncontrolled baseline.

Secondly, I tried to train two models with your commands, one uncontrolled and the other controlled. However, the prediction ROUGE scores are pretty low.

My modifications are that I enabled gradient_checkpointing=True in lines 455-470 of ctrl_transformer.py due to GPU memory limitations and added <sep> <sent-sep> <label-sep> as additional special tokens to the tokenizer. The best checkpoint I got is at 10k steps, with an eval loss of 2.2049572467803955. If this does not match yours, would you mind sharing the trained model's logs for comparison?

{'rouge1': 30.19310776236535, 'rouge2': 6.084907987277266, 'rougeL': 16.964281715622132}

Thank you very much!

cs329yangzhong commented 1 year ago

Update: After changing the dataset to original_clean, the model surprisingly worked better, with {'rouge1': 38.30224803222259, 'rouge2': 10.992026631869788, 'rougeL': 23.257482859133077}:

CUDA_VISIBLE_DEVICES=0 python ctrl_transformer.py --model_name_or_path facebook/bart-large-cnn --do_train --do_eval --do_predict --train_file data/original_clean/train_rate_concat_sent-ctrl.csv --validation_file data/control_clean/val_rate_concat_sent-ctrl.csv --test_file data/control_clean/test_rate_concat_sent-ctrl.csv --output_dir ./results/sentctrl_reproduced --seed 0 --save_total_limit 3 --gen_target_max 800 --gen_type beam_search --predict_with_generate --eval_steps 500 --max_source_length 2048 --remove_prompts
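
To make the gap between the two reported runs concrete, here is a minimal sketch that subtracts the ROUGE scores posted in this thread. The variable names are illustrative only. Note that the second command also includes --remove_prompts, so the difference may not come from the data change alone.

```python
# ROUGE scores copied from the two comments in this thread.
control_clean = {
    "rouge1": 30.19310776236535,
    "rouge2": 6.084907987277266,
    "rougeL": 16.964281715622132,
}
original_clean = {
    "rouge1": 38.30224803222259,
    "rouge2": 10.992026631869788,
    "rougeL": 23.257482859133077,
}

# Absolute improvement per metric, rounded to two decimals.
deltas = {
    metric: round(original_clean[metric] - control_clean[metric], 2)
    for metric in control_clean
}
print(deltas)  # {'rouge1': 8.11, 'rouge2': 4.91, 'rougeL': 6.29}
```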

Shen-Chenhui commented 1 year ago

Hi,

My apologies! I tried the data/control_clean for some other experiments and mixed it up. You are right that the correct data to use is data/original_clean. I will update my README.md.

Thanks so much for trying our code! Let me know if you require further assistance on this; if not, I will close this issue.

Shen-Chenhui commented 1 year ago

Hi,

Please do not use the --with_added_tokens and --remove_prompts flags, and do not add <sep> <sent-sep> <label-sep>, as these are not meant for this repo. I will clean this up a bit more. Following the given instructions with the default settings in the code is sufficient.

For uncontrolled settings, you may use the uncontrolled dataset provided in my other repo MReD (since SentBS focuses on controlled summarization only and does not reproduce uncontrolled results):
https://github.com/Shen-Chenhui/MReD/tree/master/summarization/abstractive/filtered_uncontrolled_data

For your convenience, you may reuse the same command used for reproducing sentctrl_baseline, but substitute the data with the data provided at the above link.
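
The data substitution described above could be scripted roughly as follows. This is only a sketch: build_command is a hypothetical helper, the flags are copied from the sentctrl_baseline command quoted earlier in this thread, and the file paths and output directory are placeholders, since the actual file names in the linked folder are not stated here. The CUDA_VISIBLE_DEVICES=0 prefix is left out and can be set in the environment.

```python
import shlex

# Flags copied from the sentctrl_baseline command quoted in this thread.
# --with_added_tokens and --remove_prompts are intentionally not included.
BASE_FLAGS = [
    "--model_name_or_path", "facebook/bart-large-cnn",
    "--do_train", "--do_eval", "--do_predict",
    "--seed", "0",
    "--save_total_limit", "3",
    "--gen_target_max", "800",
    "--gen_type", "beam_search",
    "--predict_with_generate",
    "--eval_steps", "500",
    "--max_source_length", "2048",
]


def build_command(train_file, validation_file, test_file, output_dir):
    """Hypothetical helper: return the training command as an argument list."""
    return [
        "python", "ctrl_transformer.py",
        *BASE_FLAGS,
        "--train_file", train_file,
        "--validation_file", validation_file,
        "--test_file", test_file,
        "--output_dir", output_dir,
    ]


# Placeholder paths (assumptions): replace with the real files you downloaded.
cmd = build_command(
    "path/to/uncontrolled/train.csv",
    "path/to/uncontrolled/val.csv",
    "path/to/uncontrolled/test.csv",
    "./results/uncontrolled_reproduced",
)
print(shlex.join(cmd))
```

Keeping the shared flags in one list means only the three data files and the output directory change between the controlled and uncontrolled runs.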

I didn't experiment with gradient_checkpointing=True but your training looks good as well. My loss is at 2.265.

Shen-Chenhui commented 1 year ago

Hi,

I have cleaned up my code. Sorry for the previous confusion. You may pull again from the repo to get the latest changes.

Shen-Chenhui commented 1 year ago

Cleaned up and updated repo. Please open a new issue if you have further questions.

cs329yangzhong commented 1 year ago

Thanks a lot!
