LeoLai930603 closed this issue 4 years ago.
Hello! Sorry for the delayed reply. As you can see in our paper, the GPT-2 experiments were done with greedy decoding only, i.e. we did not use nucleus sampling with GPT-2.
The GPT-2 codebase is based entirely on the HuggingFace transformers code (except for the unlikelihood part and the corresponding statistics we added). If you want to use batched evaluation with nucleus sampling, feel free to change the code to make it work.
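For reference, here is a minimal sketch of how the filtering could be generalized to batched logits of shape `[batch_size, vocab_size]`; it is not part of this repo's code and follows the scatter-based approach used in later HuggingFace transformers versions:

```python
import torch
import torch.nn.functional as F

def top_k_top_p_filtering_batched(logits, top_k=0, top_p=0.0, filter_value=-float("inf")):
    """Filter a batch of logits [batch_size, vocab_size] with top-k and/or nucleus (top-p) filtering."""
    if top_k > 0:
        top_k = min(top_k, logits.size(-1))
        # Remove tokens whose logit is below the k-th largest logit in each row
        indices_to_remove = logits < torch.topk(logits, top_k)[0][..., -1, None]
        logits = logits.masked_fill(indices_to_remove, filter_value)

    if top_p > 0.0:
        sorted_logits, sorted_indices = torch.sort(logits, descending=True)
        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)

        # Remove tokens with cumulative probability above the threshold
        sorted_indices_to_remove = cumulative_probs > top_p
        # Shift right so the first token that crosses the threshold is kept
        sorted_indices_to_remove[..., 1:] = sorted_indices_to_remove[..., :-1].clone()
        sorted_indices_to_remove[..., 0] = 0

        # Scatter the mask back to the original (unsorted) vocabulary order per row
        indices_to_remove = sorted_indices_to_remove.scatter(1, sorted_indices, sorted_indices_to_remove)
        logits = logits.masked_fill(indices_to_remove, filter_value)

    return logits
```

Sampling would then be done per row, e.g. `torch.multinomial(F.softmax(filtered_logits, dim=-1), num_samples=1)`.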
Closing, but please reopen if there are further questions.
Hi guys,
I am trying to trace the data transformation in your method.
During training, the original input has shape [new_batch_size, prefix_length] after this line: https://github.com/facebookresearch/unlikelihood_training/blob/723747171a3fa909cda68df399e39f0a3e5067d9/custom/gpt2/run_gpt2.py#L133, so it is no longer [1, original_sequence_length].
However, in the function top_k_top_p_filtering there is an assertion: https://github.com/facebookresearch/unlikelihood_training/blob/723747171a3fa909cda68df399e39f0a3e5067d9/custom/gpt2/run_gpt2.py#L47 and the code can only be executed if this requirement is met. I am confused by this: why is the assertion batch_size == 1 required? Is this a flaw in the code?
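For concreteness, here is a toy illustration of the shape change I am describing (the sizes are made up, not the repo's actual values):

```python
import torch

# One tokenized document of shape [1, original_sequence_length] is chopped into
# fixed-length prefixes, giving [new_batch_size, prefix_length].
prefix_length = 512
original = torch.randint(0, 50257, (1, 1600))                 # [1, original_sequence_length]
new_batch_size = original.size(1) // prefix_length            # 3
batched = original[0, : new_batch_size * prefix_length].view(new_batch_size, prefix_length)
print(batched.shape)                                          # torch.Size([3, 512])
```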