enpassanty closed this 3 years ago
Could you try the branch here? https://github.com/PyTorchLightning/lightning-transformers/pull/160
With this branch you can set the block size from the command line, e.g. dataset.cfg.block_size=512
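For context, a block size usually controls how tokenized text is packed, not just where it is cut off. The sketch below assumes that block_size here behaves like the group_texts step in Hugging Face's run_mlm.py (when not running line by line). In that step, all tokenized sequences are concatenated and split into fixed-size blocks, so text past token 512 is carried into the next block instead of being discarded. The function name group_texts and the toy token IDs are illustrative only. This is not lightning-transformers' actual implementation.

```python
from itertools import chain
from typing import Dict, List


def group_texts(examples: Dict[str, List[List[int]]], block_size: int) -> Dict[str, List[List[int]]]:
    """Concatenate tokenized sequences and split them into blocks of block_size.

    Illustrative sketch modeled on the chunking step in Hugging Face's run_mlm.py;
    not the lightning-transformers implementation.
    """
    # Flatten each column (e.g. input_ids, attention_mask) across all examples.
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(concatenated[next(iter(examples))])
    # Drop the trailing remainder that does not fill a whole block
    # (if there is less than one block in total, keep it as a single short block).
    if total_length >= block_size:
        total_length = (total_length // block_size) * block_size
    return {
        k: [tokens[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, tokens in concatenated.items()
    }


if __name__ == "__main__":
    batch = {"input_ids": [[1, 2, 3], [4, 5, 6, 7], [8, 9]]}
    print(group_texts(batch, block_size=4))
```

With block_size=4, the nine toy tokens become two full blocks, [1, 2, 3, 4] and [5, 6, 7, 8]. Only the final leftover token is dropped. Plain truncation (ids[:512]) would instead discard everything after the first 512 tokens of each sequence.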
This solved the problem for me. Thanks!
I'm trying to train an MLM on custom data. The sequences in the CSV are long; when training with Hugging Face's run_mlm.py, I truncate them at 512 tokens. How do I access the max_length arg? Why am I hitting a block_size KeyError? Is this required for custom data? Traceback: