Closed: ShibataGenjiro closed this issue 3 years ago.
@ShibataGenjiro Thank you for the detailed bug report. The longformer does not support `token_type_ids`, so you need to set the `--no_use_token_type_ids` option. I've pushed a change that will automatically enable this option for the longformer. I also opened https://github.com/huggingface/transformers/issues/9111 to make the huggingface/transformers documentation clearer about the fact that the longformer does not support `token_type_ids`. Let me know how your training run goes. If you end up training the longformer on CNN/DM, I'd appreciate it if you open a pull request with a link to the model weights file so it can be added to the library.
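For context, here is a minimal sketch using huggingface/transformers directly (not this repository's code) of why the flag is needed: the Longformer tokenizer is RoBERTa-based and does not return `token_type_ids`, while BERT's tokenizer does.

```python
# Minimal sketch using huggingface/transformers directly (not this repo's code):
# the Longformer tokenizer, being RoBERTa-based, does not return token_type_ids,
# while the BERT tokenizer does.
from transformers import AutoTokenizer

bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
longformer_tok = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")

print(bert_tok("An example sentence.").keys())
# dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])
print(longformer_tok("An example sentence.").keys())
# dict_keys(['input_ids', 'attention_mask'])
```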
@HHousen Thank you very much. The longformer can be trained now. (But training is very slow, because I set the batch_size to 1 on a single 3090 GPU. If I set a larger batch_size, an OOM problem occurs.)
Anyway, you said that the longformer does not use `token_type_ids` (the segment_id in BERTSUM, I think). Does this mean that the longformer only uses token embeddings and position embeddings as input? (while BERTSUM uses token embeddings, segment embeddings, and position embeddings)
@ShibataGenjiro Correct, the longformer only uses token embeddings and position embeddings, while BERT uses token embeddings, segment embeddings, and position embeddings. This is because the longformer is based on RoBERTa, which is an improved version of BERT. Regarding the OOM issue, you can try setting `--gradient_checkpointing` to reduce memory consumption at the expense of a slower backward pass.
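To make the embedding difference concrete, a quick check of the model configs (plain huggingface/transformers, not this repository's code) shows that BERT reserves two segment types while the RoBERTa-based longformer reserves only one; recent transformers versions also let you enable gradient checkpointing directly on the model object.

```python
# Quick check in huggingface/transformers (not this repo's code):
# type_vocab_size is the number of segment (token type) embeddings a model learns.
from transformers import AutoConfig, AutoModel

print(AutoConfig.from_pretrained("bert-base-uncased").type_vocab_size)             # 2
print(AutoConfig.from_pretrained("allenai/longformer-base-4096").type_vocab_size)  # 1

# Recent transformers versions also expose gradient checkpointing directly on the
# model object, the same compute-for-memory trade-off the --gradient_checkpointing
# option makes.
model = AutoModel.from_pretrained("allenai/longformer-base-4096")
model.gradient_checkpointing_enable()
```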
@HHousen OK, I will try. Thank you for your patience in explaining!^^
No problem :smile:.
Hi @HHousen, I am having the exact same set of errors when doing abstractive summarization. Is abstractive summarization with the CNN/DM dataset not supported with the longformer? I checked the changes that you made in #0729e1f08135a81f2a12062a248eb9ab557a0f6f, but they do not seem to translate to abstractive summarization. Also, the option `--no_use_token_type_ids` does not seem to be a valid option for abstractive.
@thechargedneutron A seq2seq (text-to-text) model is needed for abstractive summarization (like T5, BART, etc.). The longformer is just an encoder; it does not have a decoder. However, the LED (Longformer Encoder-Decoder) exists for this exact purpose. Here is the huggingface/transformers documentation.
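For reference, a rough sketch of abstractive summarization with LED directly in huggingface/transformers (the checkpoint below is one of the publicly released LED models, not something produced by this repository):

```python
# Rough sketch of abstractive summarization with LED in huggingface/transformers.
# The checkpoint name is a publicly released LED model fine-tuned on arXiv papers,
# not a model from this repo.
import torch
from transformers import LEDForConditionalGeneration, LEDTokenizer

tokenizer = LEDTokenizer.from_pretrained("allenai/led-large-16384-arxiv")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-large-16384-arxiv")

article = "Long document text goes here..."
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=16384)

# LED expects global attention on at least the first token (<s>).
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=256,
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```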
Hi,
I created the environment by
My environment.yml:
Then I downloaded the CNN/DM dataset for the longformer-base-4096 from https://drive.google.com/uc?id=1438kLkTC9zc9otkA7Q7sJqDdCxBrfWqj
Next, I ran `convert_extractive_pt_to_txt.py` in the scripts folder to get the CNN/DM dataset (.txt). Finally, I trained the longformer model on my 3090 GPU by
and got an error:
Then I used the CPU and got another error:
My computing environment:
- GPU: 3090
- nvcc -V: 11.1
- torch: 1.7.1
- Python: 3.8.6
- cudatoolkit: 11.0.3
Did I set up the running environment incorrectly, or is something else wrong? I'm not sure.
Thank you in advance.
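Purely as an illustration, a hypothetical training invocation along the lines discussed in this thread might look like the sketch below; apart from `--no_use_token_type_ids`, `--gradient_checkpointing`, and `--batch_size`, which are mentioned in this thread, the flag names are assumptions about this repository's `main.py` rather than something confirmed here.

```bash
# Hypothetical sketch of an extractive training run. Only --no_use_token_type_ids,
# --gradient_checkpointing, and --batch_size are taken from this thread; the other
# flag names are assumptions about this repo's main.py.
python main.py \
    --model_name_or_path allenai/longformer-base-4096 \
    --model_type longformer \
    --data_path ./datasets/cnn_dm \
    --batch_size 1 \
    --gradient_checkpointing \
    --no_use_token_type_ids
```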