Open its5Q opened 6 months ago
Does this mean that if you don't use --debug, it would not error?
Also, I believe the use of pretraining_dataset was to perform streaming (which is the opposite of trying to preprocess in advance).
Yeah, it wouldn't, at least in that specific function, because the debugging code tries to select samples from an iterable dataset, which is unsupported. I personally couldn't get it to work either way. I also ran into other training issues that I couldn't resolve because the documentation and examples were missing or outdated, so I switched to using HF Trainer instead.
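To illustrate the failure mode described above: a streaming dataset can only be iterated, so index-based selection is not available on it. Below is a minimal sketch of a debug sampler that handles both cases. The names sample_rows and fake_stream are hypothetical and are not part of Axolotl; the generator is only a stand-in for a streaming pretraining dataset. With Hugging Face datasets, IterableDataset.take(n) should serve the same purpose as the islice call here.

```python
from itertools import islice


def sample_rows(dataset, n):
    """Return up to n rows from a map-style or a streaming dataset.

    Hypothetical helper for illustration; not part of Axolotl.
    """
    # Map-style datasets (e.g. datasets.Dataset) expose select() and len().
    if hasattr(dataset, "select") and hasattr(dataset, "__len__"):
        return list(dataset.select(range(min(n, len(dataset)))))
    # Streaming datasets can only be iterated, so stop after n rows.
    return list(islice(dataset, n))


def fake_stream():
    """Stand-in for a streaming pretraining dataset (assumption)."""
    for i in range(1000):
        yield {"text": f"sample {i}"}


rows = sample_rows(fake_stream(), 3)
print(rows)
```

Because the fallback branch never calls .select(), the same debug path would work whether or not the dataset is streamed.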
Expected Behavior
I expect to be able to debug pretraining dataset preprocessing.
Current behaviour
Axolotl tries to call .select() on an IterableDataset object, which does not support that method.
Steps to reproduce
Run
python -m axolotl.cli.preprocess --debug
on any config that has a pretraining_dataset
Config yaml
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
main/132eb74