Closed: ShibataGenjiro closed this issue 3 years ago.
@ShibataGenjiro Thank you for the detailed bug report. The longformer does not support `token_type_ids`, so you need to set the `--no_use_token_type_ids` option. I've pushed a change that will automatically enable this option for the longformer. I also opened https://github.com/huggingface/transformers/issues/9111 to make the huggingface/transformers documentation clearer about the fact that the longformer does not support `token_type_ids`. Let me know how your training run goes. If you end up training the longformer on CNN/DM, I'd appreciate it if you open a pull request with a link to the model weights file so it can be added to the library.
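For context, here is a minimal sketch using huggingface/transformers directly (not this repository's code) of why the flag is needed: the Longformer tokenizer is RoBERTa-based and does not return `token_type_ids`, while BERT's tokenizer does.

```python
# Minimal sketch using huggingface/transformers directly (not this repo's code):
# the Longformer tokenizer, being RoBERTa-based, does not return token_type_ids,
# while the BERT tokenizer does.
from transformers import AutoTokenizer

bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
longformer_tok = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")

print(bert_tok("An example sentence.").keys())
# dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])
print(longformer_tok("An example sentence.").keys())
# dict_keys(['input_ids', 'attention_mask'])
```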
@HHousen Thank you very much. The longformer can be trained now. (But training is very slow, because I set the batch_size to 1 on a single 3090 GPU. If I set a larger batch_size, an OOM problem occurs.)
Anyway, you said that the longformer does not use `token_type_ids` (the segment_id in BERTSUM, I think). Does this mean that the longformer only uses token embeddings and position embeddings as input? (while BERTSUM uses token embeddings, segment embeddings, and position embeddings)
@ShibataGenjiro Correct, the longformer only uses token embeddings and position embeddings, while BERT uses token embeddings, segment embeddings, and position embeddings. This is because the longformer is based on RoBERTa, which is an improved version of BERT. Regarding the OOM issue, you can try setting `--gradient_checkpointing` to reduce memory consumption at the expense of a slower backward pass.
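To make the embedding difference concrete, a quick check of the model configs (plain huggingface/transformers, not this repository's code) shows that BERT reserves two segment types while the RoBERTa-based longformer reserves only one; recent transformers versions also let you enable gradient checkpointing directly on the model object.

```python
# Quick check in huggingface/transformers (not this repo's code):
# type_vocab_size is the number of segment (token type) embeddings a model learns.
from transformers import AutoConfig, AutoModel

print(AutoConfig.from_pretrained("bert-base-uncased").type_vocab_size)             # 2
print(AutoConfig.from_pretrained("allenai/longformer-base-4096").type_vocab_size)  # 1

# Recent transformers versions also expose gradient checkpointing directly on the
# model object, the same compute-for-memory trade-off the --gradient_checkpointing
# option makes.
model = AutoModel.from_pretrained("allenai/longformer-base-4096")
model.gradient_checkpointing_enable()
```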
@HHousen OK, I will try. Thank you for your patience in explaining!^^
No problem :smile:.
Hi @HHousen, I am having the exact same set of errors when doing abstractive summarization. Is abstractive summarization with the CNN/DM dataset not supported with the longformer? I checked the changes that you made in #0729e1f08135a81f2a12062a248eb9ab557a0f6f, but they do not seem to translate to abstractive summarization. Also, the option `--no_use_token_type_ids` does not seem to be a valid option for abstractive.
@thechargedneutron A seq2seq (text-to-text) model is needed for abstractive summarization (like T5, BART, etc.). The longformer is just an encoder; it does not have a decoder. However, the LED (Longformer Encoder-Decoder) exists for this exact purpose. Here is the huggingface/transformers documentation.
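For reference, a rough sketch of abstractive summarization with LED directly in huggingface/transformers (the checkpoint below is one of the publicly released LED models, not something produced by this repository):

```python
# Rough sketch of abstractive summarization with LED in huggingface/transformers.
# The checkpoint name is a publicly released LED model fine-tuned on arXiv papers,
# not a model from this repo.
import torch
from transformers import LEDForConditionalGeneration, LEDTokenizer

tokenizer = LEDTokenizer.from_pretrained("allenai/led-large-16384-arxiv")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-large-16384-arxiv")

article = "Long document text goes here..."
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=16384)

# LED expects global attention on at least the first token (<s>).
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=256,
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```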
Hi,
I created the environment by
My environment.yml:
Then I downloaded the CNN/DM dataset for the longformer-base-4096 from https://drive.google.com/uc?id=1438kLkTC9zc9otkA7Q7sJqDdCxBrfWqj
Next, I ran `convert_extractive_pt_to_txt.py` in the scripts folder to get the CNN/DM dataset (.txt). Finally, I trained the longformer model on my 3090 GPU by
and got an error:
Then I used the CPU and got another error:
My computing environment:
- GPU: 3090
- nvcc -V: 11.1
- torch: 1.7.1
- Python: 3.8.6
- cudatoolkit: 11.0.3
Did I set up the running environment incorrectly, or is something else wrong? I'm not sure.
Thank you in advance.
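Purely as an illustration, a hypothetical training invocation along the lines discussed in this thread might look like the sketch below; apart from `--no_use_token_type_ids`, `--gradient_checkpointing`, and `--batch_size`, which are mentioned in this thread, the flag names are assumptions about this repository's `main.py` rather than something confirmed here.

```bash
# Hypothetical sketch of an extractive training run. Only --no_use_token_type_ids,
# --gradient_checkpointing, and --batch_size are taken from this thread; the other
# flag names are assumptions about this repo's main.py.
python main.py \
    --model_name_or_path allenai/longformer-base-4096 \
    --model_type longformer \
    --data_path ./datasets/cnn_dm \
    --batch_size 1 \
    --gradient_checkpointing \
    --no_use_token_type_ids
```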