HKUST-KnowComp / Knowledge-Constrained-Decoding

Official Code for EMNLP2023 Main Conference paper: "KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection"

Questions Regarding Data Processing #1

Closed: jinjukiko closed this issue 9 months ago

jinjukiko commented 9 months ago

Hi there,

Firstly, I want to express my appreciation for the fantastic work you've done!

I have a couple of questions regarding the data processing part:

In the code, I noticed that the original data labels are not transformed from a single integer into an integer sequence. Could you confirm whether this is the intended behavior?
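
To make this first question concrete, here is a minimal sketch of the kind of transformation being asked about: broadcasting one sequence-level label to every token position. It is purely illustrative and not taken from the repository. The function name `expand_label`, the use of an attention mask, and the `-100` ignore value (a common PyTorch convention for positions skipped by the loss) are all assumptions.

```python
from typing import List

# Assumption: padding positions get a value that the loss function ignores.
IGNORE_INDEX = -100


def expand_label(label: int, attention_mask: List[int]) -> List[int]:
    """Broadcast one sequence-level label to every non-padding token position.

    Hypothetical helper for illustration only; the repository may build
    token-level labels differently, or not at all at this stage.
    """
    return [label if m == 1 else IGNORE_INDEX for m in attention_mask]


# One example labelled 1, with three real tokens followed by two padding tokens.
print(expand_label(1, [1, 1, 1, 0, 0]))
```

If the labels are instead expanded later (for example inside the model or the loss), a single integer per example in the processed data would be expected.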

Regarding the training datasets wow_train_augmented and wow_dev_unseen_augmented: were these generated by mixing the original dataset with the two neg_datasets and then splitting the result proportionally?
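
For clarity, the procedure guessed at in this second question could be sketched as below. This is only a sketch of the hypothesis, not the authors' actual pipeline. The function name `mix_and_split`, the label convention (1 for original examples, 0 for negatives), the `train_ratio` value, and the fixed seed are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple

Example = Dict[str, object]


def mix_and_split(
    original: List[Example],
    neg_a: List[Example],
    neg_b: List[Example],
    train_ratio: float = 0.9,  # assumption: the real split proportion is unknown
    seed: int = 0,
) -> Tuple[List[Example], List[Example]]:
    """Pool the original data with two negative sets, shuffle, then split."""
    # Assumed label convention: 1 = original example, 0 = negative example.
    pool = (
        [dict(ex, label=1) for ex in original]
        + [dict(ex, label=0) for ex in neg_a]
        + [dict(ex, label=0) for ex in neg_b]
    )
    random.Random(seed).shuffle(pool)  # seeded so the split is reproducible
    cut = int(len(pool) * train_ratio)
    return pool[:cut], pool[cut:]


train, dev = mix_and_split(
    [{"text": "a"}] * 6, [{"text": "b"}] * 2, [{"text": "c"}] * 2, train_ratio=0.8
)
print(len(train), len(dev))
```

An alternative would be to augment the existing train and dev splits separately, which would keep the unseen-topic dev set disjoint from training; which of the two was used is exactly what this question asks.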

I appreciate your time and assistance in clarifying these points.