HKUST-KnowComp / Knowledge-Constrained-Decoding

Official Code for EMNLP2023 Main Conference paper: "KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection"

More instruction for train_t5_token_classifier? #4

Open qishenghu opened 2 months ago

qishenghu commented 2 months ago

Hi authors,

I really appreciate your great work. I am trying to run train_t5_token_classifier.sh for FUDGE.

Could you kindly add more instructions on how to generate the $DATADIR/wow_dev_unseen_augmented?

train_data_path=$DATADIR/wow_train_augmented
validation_data_path=$DATADIR/wow_dev_unseen_augmented

I have run the commands below and got wow_train_augmented_neg_google-flan-t5-xl and wow_train_augmented_neg_random.

bash scripts/shell/data_process/partial_neg_gen.sh 0 wow 16
bash scripts/shell/data_process/random_neg.sh wow

Thanks!

syncdoth commented 2 months ago

In those two scripts, you can change the data_options variable from train.jsonl to dev_unseen.jsonl to produce the augmented dev dataset. Do this after you have run preprocess.sh to obtain the dev_unseen.jsonl file.
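
A minimal sketch of that substitution, run on a stand-in file rather than the real scripts. It assumes data_options is assigned as a plain string containing train.jsonl; the actual assignment in partial_neg_gen.sh and random_neg.sh may look different, so check the scripts before applying it.

```shell
# Hypothetical stand-in for the data_options line; the real scripts
# may define this variable differently.
tmp=$(mktemp)
echo 'data_options="train.jsonl"' > "$tmp"

# Swap the training split for the dev split, writing to a new file
# so the original stays untouched.
sed 's/train\.jsonl/dev_unseen.jsonl/' "$tmp" > "$tmp.dev"

result=$(cat "$tmp.dev")
echo "$result"

# Clean up the temporary files.
rm -f "$tmp" "$tmp.dev"
```

Applied to the real scripts, the same edit would go into scripts/shell/data_process/partial_neg_gen.sh and scripts/shell/data_process/random_neg.sh, which are then rerun with the same arguments as before.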