facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License
30.22k stars, 6.38k forks

AssertionError: Sentences lengths should not exceed max_tokens=400000 #5447

Closed: BirdiD closed this issue 7 months ago

BirdiD commented 7 months ago

❓ Questions and Help

Hello everyone, I am trying to run an audio pretraining task with data2vec, and I keep getting the following error: “AssertionError: Sentences lengths should not exceed max_tokens=400000”. I have tried modifying trainer.py as suggested in https://github.com/facebookresearch/fairseq/issues/4759, but this does not resolve the issue.

Has anyone else come across this error? Also, I don’t understand why the error refers to sentence lengths, since I am running an audio pretraining task, not ASR.
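As far as I understand fairseq's batching code, the wording comes from a generic length check (in fairseq/data/data_utils_fast.pyx, if I remember correctly) that runs for every dataset, text or audio. For raw-audio datasets like the ones used by wav2vec 2.0 and data2vec, an example's "size" is its length in waveform samples, so max_tokens limits samples rather than words. The sketch below is modeled on that check. check_lengths and max_seconds are hypothetical helpers, not fairseq functions, and the 16 kHz sampling rate is an assumption.

```python
# Hypothetical illustration modeled on fairseq's length assertion;
# this is not fairseq's actual implementation.
from typing import List

SAMPLE_RATE = 16_000  # assumption: 16 kHz audio, common for wav2vec 2.0 / data2vec


def check_lengths(sizes: List[int], max_tokens: int) -> None:
    """Fail if any example is longer than max_tokens.

    For raw-audio datasets each size is a clip length in waveform samples,
    so "sentence length" here really means "number of audio samples".
    A non-positive max_tokens disables the check.
    """
    if not sizes:
        return
    assert max_tokens <= 0 or max(sizes) <= max_tokens, (
        f"Sentences lengths should not exceed max_tokens={max_tokens}"
    )


def max_seconds(max_tokens: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Longest clip duration (in seconds) that fits under max_tokens."""
    return max_tokens / sample_rate


if __name__ == "__main__":
    print(max_seconds(400_000))  # 25.0 seconds at 16 kHz
    clip_sizes = [160_000, 320_000, 480_000]  # 10 s, 20 s, 30 s at 16 kHz
    try:
        check_lengths(clip_sizes, max_tokens=400_000)
    except AssertionError as err:
        print(err)  # the 30 s clip exceeds the limit
```

Under these assumptions, any clip longer than about 25 seconds would trigger the error at max_tokens=400000. If I read the code correctly, an unset max_tokens is passed on as -1, which skips the check; that would be consistent with removing the argument making the error disappear.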

Thanks in advance for your help !

BirdiD commented 7 months ago

Just removing the --max-tokens argument for audio pretraining solved the issue.
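Before dropping the argument, it may help to see which clips were tripping the check. This is a minimal sketch, assuming the tab-separated manifest written by examples/wav2vec/wav2vec_manifest.py (audio root directory on the first line, then one "relative_path<TAB>num_samples" entry per line). oversized_entries is a hypothetical helper, not part of fairseq.

```python
# Hypothetical helper: list manifest entries longer than max_tokens samples.
import io
from typing import List, TextIO, Tuple


def oversized_entries(manifest: TextIO, max_tokens: int) -> List[Tuple[str, int]]:
    """Return (path, num_samples) pairs whose length exceeds max_tokens.

    Assumes a wav2vec-style manifest: the first line is the audio root
    directory and each following line is "<relative_path>\t<num_samples>".
    """
    lines = manifest.read().splitlines()
    too_long = []
    for line in lines[1:]:  # skip the root-directory line
        if not line.strip():
            continue  # ignore blank lines
        path, num_samples = line.split("\t")[:2]
        if int(num_samples) > max_tokens:
            too_long.append((path, int(num_samples)))
    return too_long


if __name__ == "__main__":
    example = io.StringIO(
        "/data/audio\n"
        "a.flac\t160000\n"
        "b.flac\t480000\n"
    )
    print(oversized_entries(example, max_tokens=400_000))
    # [('b.flac', 480000)]
```

This compares raw manifest lengths only. If the dataset crops clips (for example via a maximum sample size setting), the sizes fairseq actually checks may be smaller than the ones listed here.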