Closed QinengWang-Aiden closed 1 year ago
Is it possible that the dataset you're loading has only 30155 batches?
No, actually the number of batches is more than 40,000...
After I changed the input data format, this problem was solved...
That's great!
What do you mean by changing the input data format?
I changed my original data format from one without spaces to one with spaces, and then retrained a tokenizer, and it seems that I haven't encountered this issue anymore...
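For reference, a minimal sketch of what "retrained a tokenizer on whitespace-separated data" can look like with the Hugging Face `tokenizers` library (the corpus lines, vocabulary size, and special tokens below are illustrative placeholders, not the actual setup from this issue):

```python
# Hedged sketch: retrain a tokenizer after reformatting the corpus to use
# whitespace-separated units. Corpus content and vocab size are placeholders.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Example corpus already reformatted to contain spaces between units.
corpus = [
    "the quick brown fox",
    "jumps over the lazy dog",
]

tokenizer = Tokenizer(models.Unigram())
# Splitting on whitespace means each space-delimited unit becomes a training
# unit -- this is the behavior that changes once the input format gains spaces.
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.UnigramTrainer(
    vocab_size=100,
    special_tokens=["<pad>", "</s>", "<unk>"],
    unk_token="<unk>",
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

print(tokenizer.encode("the lazy fox").tokens)
```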
Update:
I found another promising solution to this just now: Cannot call sizes() on tensor with symbolic sizes/strides
I will now try this approach to see if it works.
It works for me! So anyone who is using pytorch-nightly and has a similar issue can refer to that discussion as a feasible solution!
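For anyone landing here later: a hedged sketch of the standard `torch.compile`/dynamo knobs that commonly sidestep this class of symbolic-shape error. This is an assumption on my part, not necessarily the exact fix from the linked discussion; the `backend="eager"` choice just keeps the sketch lightweight.

```python
# Hedged sketch: common workarounds for dynamo errors like
# "Cannot call sizes() on tensor with symbolic sizes/strides".
# These are standard PyTorch settings, not a fix confirmed by this thread.
import torch
import torch._dynamo

model = torch.nn.Linear(4, 4)

# Option 1: disable dynamic-shape tracing so dynamo does not create
# symbolic sizes in the first place. (backend="eager" keeps the sketch
# lightweight; the default inductor backend would also work here.)
compiled = torch.compile(model, dynamic=False, backend="eager")

# Option 2: if a dynamo error still fires, fall back to eager execution
# for the offending frame instead of crashing the whole training run.
torch._dynamo.config.suppress_errors = True

out = compiled(torch.randn(2, 4))
print(out.shape)
```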
I have moved my new question to another issue :)
I have encountered the problem
`Cannot call sizes() on tensor with symbolic sizes/strides`
during pre-training. Whenever I try to pretrain nanoT5 using the `google/t5-v1_1-small` config, the program fails at step 30155 out of 32768 total steps. I only modified the `get_tokenizer` and `load_dataset_splits` modules to load a customized dataset and tokenizer; the rest of the program is unchanged except for an added wandb logger. However, when I resume training from checkpoint-30000, the problem does not occur (neither when I set `args.current_train_step=1` nor when I set `args.current_train_step=30000`). Below is the detailed traceback stack:

And this is my command that runs the code: