AssertionError: Sentences lengths should not exceed max_tokens=400000

❓ Questions and Help

Hello everyone, I am trying to run an audio pretraining task with data2vec. I keep getting the following error: “AssertionError: Sentences lengths should not exceed max_tokens=400000”. I have tried modifying trainer.py as suggested here: https://github.com/facebookresearch/fairseq/issues/4759, but this does not resolve the issue.

Has anyone already come across this error? Also, I don’t understand why there is an error about sentence lengths, since I am performing an audio pretraining task and not ASR.
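In case it helps with reproducing this, here is a minimal sketch of how the data could be checked. It assumes (I have not confirmed this against the fairseq source) that the "sentence length" in the message is the per-file sample count listed in the wav2vec-style manifest, where the first line is the audio root directory and each following line is `path<TAB>num_samples`. The helper name `find_long_entries` and the example paths are hypothetical and are not part of fairseq.

```python
def find_long_entries(manifest_lines, max_tokens):
    """Return (path, num_samples) pairs whose sample count exceeds max_tokens.

    Assumes a wav2vec-style manifest: the first line is the root directory,
    each later line is "relative/path.wav<TAB>num_samples".
    """
    too_long = []
    for line in manifest_lines[1:]:  # skip the root-directory line
        line = line.strip()
        if not line:
            continue
        path, num_samples = line.split("\t")[:2]
        if int(num_samples) > max_tokens:
            too_long.append((path, int(num_samples)))
    return too_long


# Hypothetical manifest contents, for illustration only.
example = ["/data/audio", "a.wav\t160000", "b.wav\t480000"]
print(find_long_entries(example, max_tokens=400000))
```

With a real manifest, the lines could be read with `open("train.tsv").read().splitlines()` (the file name is an assumption). If this reports any files, the limit and the data are at least inconsistent under the assumption above.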

Thanks in advance for your help!

facebookresearch / fairseq

AssertionError: Sentences lengths should not exceed max_tokens=400000 #5447
