This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
I am running this project on Colab with a T4 GPU, but I'm facing the following issue. Could you please help me resolve it?
/usr/local/lib/python3.10/site-packages/transformers/trainer.py:1498: FutureWarning: model_path is deprecated and will be removed in a future version. Use resume_from_checkpoint instead.
warnings.warn(
[INFO|trainer.py:1714] 2023-08-31 08:05:58,660 >> Running training
[INFO|trainer.py:1715] 2023-08-31 08:05:58,660 >> Num examples = 16,222
[INFO|trainer.py:1716] 2023-08-31 08:05:58,660 >> Num Epochs = 10
[INFO|trainer.py:1717] 2023-08-31 08:05:58,660 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1720] 2023-08-31 08:05:58,660 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:1721] 2023-08-31 08:05:58,660 >> Gradient Accumulation steps = 8
[INFO|trainer.py:1722] 2023-08-31 08:05:58,660 >> Total optimization steps = 20,270
[INFO|trainer.py:1723] 2023-08-31 08:05:58,664 >> Number of trainable parameters = 582,401,280
0% 0/20270 [00:00<?, ?it/s][INFO|trainer_utils.py:696] 2023-08-31 08:05:59,992 >> The following columns in the training set don't have a corresponding argument in MT5ForConditionalGeneration.forward and have been ignored: src_texts, tgt_texts, id. If src_texts, tgt_texts, id are not expected by MT5ForConditionalGeneration.forward, you can safely ignore this message.
Traceback (most recent call last):
File "/content/drive/MyDrive/xl-sum/seq2seq/pipeline.py", line 536, in <module>
main()
File "/content/drive/MyDrive/xl-sum/seq2seq/pipeline.py", line 462, in main
train_result = trainer.train(
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1555, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1815, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/usr/local/lib/python3.10/site-packages/accelerate/data_loader.py", line 384, in __iter__
current_batch = next(dataloader_iter)
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
data = self._next_data()
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 677, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer_utils.py", line 707, in __call__
return self.data_collator(features)
File "/content/drive/MyDrive/xl-sum/seq2seq/utils.py", line 456, in __call__
batch = self._encode(batch)
File "/content/drive/MyDrive/xl-sum/seq2seq/utils.py", line 499, in _encode
[x["src_texts"] for x in batch],
File "/content/drive/MyDrive/xl-sum/seq2seq/utils.py", line 499, in <listcomp>
[x["src_texts"] for x in batch],
KeyError: 'src_texts'
0% 0/20270 [00:01<?, ?it/s]
It seems like you are using a different version of transformers than the one this repo requires. Please use this script to install the necessary dependencies correctly before running the training scripts.
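For context on why the mismatch produces this exact error: the INFO line in the log shows that the newer Trainer drops dataset columns not accepted by `MT5ForConditionalGeneration.forward` (the `remove_unused_columns` behavior), so `src_texts`, `tgt_texts`, and `id` are gone by the time the repo's custom collator runs. The `collate` function below is only a stand-in for the list comprehension at `seq2seq/utils.py` line 499, not the repo's actual code — a minimal sketch of the failure mode:

```python
# Sketch of the KeyError: the custom collator expects raw "src_texts"
# entries, but a newer Trainer strips columns that model.forward does
# not accept before the collator ever sees the batch.

def collate(batch):
    # stand-in for the listcomp at seq2seq/utils.py line 499
    return [x["src_texts"] for x in batch]

raw_batch = [{"src_texts": "article text", "tgt_texts": "summary", "id": 0}]
print(collate(raw_batch))  # works: ['article text']

# Simulate the Trainer removing "unused" columns before collation:
stripped = [
    {k: v for k, v in ex.items() if k not in ("src_texts", "tgt_texts", "id")}
    for ex in raw_batch
]
try:
    collate(stripped)
except KeyError as err:
    print("KeyError:", err)  # same failure as in the traceback above
```

This is why pinning transformers to the version the repo's install script specifies (rather than patching the collator) is the intended fix.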