carriex / recomp

RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation.
MIT License

Empty line in the generated output #6

Closed Hannibal046 closed 6 months ago

Hannibal046 commented 7 months ago

Hi, thanks so much for the great work and code! After running the following:

python train_hf_summarization_model.py \
--model_name_or_path fangyuan/nq_abstractive_compressor \
--do_predict \
--test_file data/abstractive_compressor_inputs/nq_dev_contriever_msmarco_top_5_docs.json \
--max_target_length 512 \
--output_dir outputs/ \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=16 \
--predict_with_generate

I got generated results in which many lines are empty.

The total number of generated results equals the number of original inputs (3610 for NQ).

Is this the final output to be used as part of the prompt, or should I do some post-processing?

carriex commented 7 months ago

Hi! This is expected: we construct the data so that the abstractive compressor can output an empty string to perform selective augmentation. Please see Section 3.2 of our paper for a detailed description.
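
For anyone consuming these outputs downstream, here is a minimal sketch of how an empty summary could be handled when building prompts. It is not code from this repo: the function name `build_prompts` and the "Question: ... Answer:" template are hypothetical, chosen only for illustration. The point is that an empty line means "prepend nothing", and that empty lines should be kept rather than filtered out so the summaries stay aligned with the inputs.

```python
from typing import List


def build_prompts(questions: List[str], summaries: List[str]) -> List[str]:
    """Pair each question with its compressor output.

    Hypothetical helper: the prompt template below is an assumption,
    not the format used in the RECOMP codebase.
    """
    # One summary per input is expected, so empty lines must not be dropped.
    if len(questions) != len(summaries):
        raise ValueError("questions and summaries must have the same length")

    prompts = []
    for question, summary in zip(questions, summaries):
        summary = summary.strip()
        if summary:
            # Non-empty summary: prepend it as retrieved context.
            prompts.append(f"{summary}\n\nQuestion: {question}\nAnswer:")
        else:
            # Empty summary: the compressor chose not to augment this input.
            prompts.append(f"Question: {question}\nAnswer:")
    return prompts
```

With this sketch, no separate post-processing step is needed; the empty string simply results in a prompt without prepended context.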

Hannibal046 commented 6 months ago

Appreciate!