Open SicariusSicariiStuff opened 2 months ago
Hey, sorry it's been a while. We are currently discussing internally how to provide better support for this and for pre-training/SFT in general. We plan to extend support to local and cloud storage (S3, etc.).
Please check that this issue hasn't been reported before.
Expected Behavior
Loading a local file should work the same as loading the dataset from HF
Current behaviour
Asks for a custom .py script
Steps to reproduce
Load a local json file:
pretraining_dataset: /home/sicarius/somefile.jsonl
type: pretrain
Config yaml
Possible solution
Treat it the same way as loading a dataset from the HF hub
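A rough sketch of what that could look like, in case it helps the discussion. This is not axolotl's actual code: `resolve_dataset_source` and `load_pretraining_dataset` are hypothetical helper names, and the extension-to-builder mapping is an assumption. The only library call used is `datasets.load_dataset` with its `data_files`, `split`, and `streaming` arguments.

```python
from pathlib import Path

# Assumed mapping from local file extensions to the generic builders that
# datasets.load_dataset accepts ("json", "csv", "parquet", "text").
_BUILDERS = {
    ".json": "json",
    ".jsonl": "json",
    ".csv": "csv",
    ".parquet": "parquet",
    ".txt": "text",
}


def resolve_dataset_source(path: str):
    """Return (name, kwargs) to pass to datasets.load_dataset.

    Hypothetical helper, not part of axolotl. A path with a known data-file
    extension is routed to the matching builder; anything else is passed
    through unchanged and treated as a Hub repo id.
    """
    suffix = Path(path).suffix.lower()
    if suffix in _BUILDERS:
        return _BUILDERS[suffix], {"data_files": path}
    return path, {}


def load_pretraining_dataset(path: str, streaming: bool = True):
    """Load a local file or a Hub dataset through the same code path."""
    # Imported lazily so the resolver above works without the library installed.
    from datasets import load_dataset

    name, kwargs = resolve_dataset_source(path)
    # Assumes the data lives in (or is exposed as) a "train" split.
    return load_dataset(name, split="train", streaming=streaming, **kwargs)
```

With something like this, `load_pretraining_dataset("/home/sicarius/somefile.jsonl")` would go through the `json` builder instead of asking for a custom .py script, and a Hub repo id would behave as it does today. A Hub repo id that happens to end in one of the listed extensions would be misrouted, so a real implementation would probably also want to check whether the path exists on disk.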
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
latest release
Acknowledgements