csebuetnlp / xl-sum

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
https://aclanthology.org/2021.findings-acl.413/

Using cls_token, but it is not set yet #13

Closed. Yasotha196y closed this issue 1 year ago.

Yasotha196y commented 1 year ago

I am using Colab to run this project with a T4 GPU, but I'm facing the following issue. Could you please help me resolve it?

/usr/local/lib/python3.10/site-packages/transformers/trainer.py:1498: FutureWarning: model_path is deprecated and will be removed in a future version. Use resume_from_checkpoint instead.
  warnings.warn(
[INFO|trainer.py:1714] 2023-08-31 08:05:58,660 >> Running training
[INFO|trainer.py:1715] 2023-08-31 08:05:58,660 >> Num examples = 16,222
[INFO|trainer.py:1716] 2023-08-31 08:05:58,660 >> Num Epochs = 10
[INFO|trainer.py:1717] 2023-08-31 08:05:58,660 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1720] 2023-08-31 08:05:58,660 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:1721] 2023-08-31 08:05:58,660 >> Gradient Accumulation steps = 8
[INFO|trainer.py:1722] 2023-08-31 08:05:58,660 >> Total optimization steps = 20,270
[INFO|trainer.py:1723] 2023-08-31 08:05:58,664 >> Number of trainable parameters = 582,401,280
  0% 0/20270 [00:00<?, ?it/s]
[INFO|trainer_utils.py:696] 2023-08-31 08:05:59,992 >> The following columns in the training set don't have a corresponding argument in MT5ForConditionalGeneration.forward and have been ignored: src_texts, tgt_texts, id. If src_texts, tgt_texts, id are not expected by MT5ForConditionalGeneration.forward, you can safely ignore this message.
Traceback (most recent call last):
  File "/content/drive/MyDrive/xl-sum/seq2seq/pipeline.py", line 536, in <module>
    main()
  File "/content/drive/MyDrive/xl-sum/seq2seq/pipeline.py", line 462, in main
    train_result = trainer.train(
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1555, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1815, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/usr/local/lib/python3.10/site-packages/accelerate/data_loader.py", line 384, in __iter__
    current_batch = next(dataloader_iter)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 677, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer_utils.py", line 707, in __call__
    return self.data_collator(features)
  File "/content/drive/MyDrive/xl-sum/seq2seq/utils.py", line 456, in __call__
    batch = self._encode(batch)
  File "/content/drive/MyDrive/xl-sum/seq2seq/utils.py", line 499, in _encode
    [x["src_texts"] for x in batch],
  File "/content/drive/MyDrive/xl-sum/seq2seq/utils.py", line 499, in <listcomp>
    [x["src_texts"] for x in batch],
KeyError: 'src_texts'
  0% 0/20270 [00:01<?, ?it/s]

abhik1505040 commented 1 year ago

It seems like you are using a different version of transformers than the one this repo requires. Please use the provided installation script to set up the necessary dependencies before running the training scripts.
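For context, the log above shows the Trainer dropping the src_texts/tgt_texts columns (because MT5ForConditionalGeneration.forward does not accept them) before the repo's custom data collator runs, which is what triggers the KeyError on newer transformers releases. The recommended fix is still to install the pinned dependencies as described above; the following is only a minimal, hypothetical sketch of how one might suppress that column-dropping behavior if stuck on a newer transformers version. The output_dir value and the other argument values are placeholders, not the settings this repo uses.

    # Hedged workaround sketch, NOT the maintainer's recommended fix
    # (the recommended fix is installing the repo's pinned dependencies).
    # remove_unused_columns=False keeps dataset columns that model.forward
    # does not accept, so a custom collator can still read "src_texts"/"tgt_texts".
    from transformers import Seq2SeqTrainingArguments

    training_args = Seq2SeqTrainingArguments(
        output_dir="output_dir",            # placeholder path
        remove_unused_columns=False,        # do not strip raw text columns before collation
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
    )

Even with this flag, other API differences between transformers versions may surface, so matching the version pinned by the installation script remains the safer route.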