hustcxx opened this issue 2 years ago
We used the `<s>` token (for BART) to initialize the [doc] token in our code.
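A toy sketch of what "initialize the [doc] token from `<s>`" usually means: the new token gets its own vocabulary id, and its embedding row starts as a copy of the `<s>` row. This is not the repository's code. The vocabulary, the embedding values, and the helper name `add_token_initialized_from` are all hypothetical. With Hugging Face `transformers`, the analogous steps would be `tokenizer.add_special_tokens(...)`, `model.resize_token_embeddings(len(tokenizer))`, and then copying the `<s>` row of the input embedding matrix into the new row.

```python
from typing import Dict, List


def add_token_initialized_from(
    vocab: Dict[str, int],
    embeddings: List[List[float]],
    new_token: str,
    source_token: str,
) -> int:
    """Hypothetical helper: register `new_token` and start its embedding
    as a copy of the row currently used by `source_token`."""
    if new_token in vocab:
        return vocab[new_token]
    # Assumes ids are contiguous (0..n-1) and match embedding row indices.
    new_id = len(embeddings)
    # Copy the values (not the list object) so training can later
    # move the two embeddings apart independently.
    embeddings.append(list(embeddings[vocab[source_token]]))
    vocab[new_token] = new_id
    return new_id


# Toy BART-like vocabulary (in BART, "<s>" has id 0); values are made up.
vocab = {"<s>": 0, "<pad>": 1, "</s>": 2}
embeddings = [[0.1, 0.2], [0.0, 0.0], [0.3, -0.4]]
doc_id = add_token_initialized_from(vocab, embeddings, "[doc]", "<s>")
print(doc_id, embeddings[doc_id])  # 3 [0.1, 0.2]
```

Copying a pretrained special-token embedding, rather than using a random vector, gives the new token a starting point the pretrained encoder already knows how to process.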
Thanks for your response. I have another question: how did you process the PubMed dataset? Did you follow the same approach as Match?
Yes! We tokenize the dataset and set `max_input_length` to 512.
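A minimal sketch of the truncation step, assuming the documents are already tokenized into id lists and that over-long inputs are cut so the final `</s>` id is preserved. The helper name `truncate_input_ids` and the example ids are hypothetical; only the limit of 512 comes from the reply. With a Hugging Face tokenizer, passing `max_length=512, truncation=True` to the tokenizer call has a similar effect.

```python
from typing import List

MAX_INPUT_LENGTH = 512  # the limit given in the reply


def truncate_input_ids(
    ids: List[int], eos_id: int, max_len: int = MAX_INPUT_LENGTH
) -> List[int]:
    """Hypothetical helper: cut a tokenized document to at most `max_len`
    ids while keeping a closing end-of-sequence id."""
    if len(ids) <= max_len:
        return list(ids)
    # Keep the first max_len - 1 ids and re-append EOS at the end.
    return list(ids[: max_len - 1]) + [eos_id]


# Hypothetical 1000-id document: <s> = 0, body ids, </s> = 2.
doc = [0] + list(range(10, 1008)) + [2]
short = truncate_input_ids(doc, eos_id=2)
print(len(short), short[0], short[-1])  # 512 0 2
```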
The paper describes using the [doc] token to produce a representation of the document, but I can't find where the [doc] token is added in the data-processing file `ext_lable_and_tokenize.py`.
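For illustration only, one plausible place to add the token is directly after `<s>`, before truncation, so it always survives the length limit; the encoder output at that position would then serve as the document representation. This is an assumption, not what `ext_lable_and_tokenize.py` actually does. The helper `insert_doc_token` and the id `50265` used for [doc] are hypothetical.

```python
from typing import List


def insert_doc_token(
    ids: List[int], doc_id: int, bos_id: int, eos_id: int, max_len: int = 512
) -> List[int]:
    """Hypothetical helper: place [doc] right after <s> and then enforce
    the length limit, so [doc] is never truncated away."""
    # Drop a leading <s> (if present) so it is not duplicated.
    body = ids[1:] if ids and ids[0] == bos_id else list(ids)
    out = [bos_id, doc_id] + body
    if len(out) > max_len:
        # Truncate the document body, keeping a closing </s>.
        out = out[: max_len - 1] + [eos_id]
    return out


DOC_ID = 50265  # hypothetical id for the newly added [doc] token
short_doc = insert_doc_token([0, 11, 12, 2], DOC_ID, bos_id=0, eos_id=2)
print(short_doc)  # [0, 50265, 11, 12, 2]

long_doc = insert_doc_token([0] + list(range(3, 1003)) + [2], DOC_ID, bos_id=0, eos_id=2)
print(len(long_doc), long_doc[1], long_doc[-1])  # 512 50265 2
# After encoding, hidden_states[1] would be the [doc] representation.
```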