ohmeow closed this issue 1 month ago.
I don't see a dataset in your configuration YAML. Did you redact it? Can you provide some info on the dataset/prompt type you're trying to preprocess?
UPDATE: Looks like it's an issue in Transformers, and the recommendation is to install it from GitHub directly. See: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/discussions/54
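For context, one likely reason (an assumption, since the error trace is not included in this thread) is the new `rope_scaling` block in the Llama 3.1 `config.json`. It uses `"rope_type": "llama3"` plus several extra keys, while older transformers releases validated `rope_scaling` as a dict with exactly two keys, `type` and `factor`. The sketch below checks a config for that pattern. `needs_newer_transformers` and `check_config_file` are hypothetical helpers, not part of axolotl or transformers.

```python
import json
from pathlib import Path

# Shape that older transformers releases accepted for rope_scaling
LEGACY_ROPE_KEYS = {"type", "factor"}


def needs_newer_transformers(config: dict) -> bool:
    """Return True if rope_scaling is present but not the legacy {type, factor} shape.

    Hypothetical helper. Llama 3.1 ships something like
    {"factor": 8.0, "low_freq_factor": 1.0, "high_freq_factor": 4.0,
     "original_max_position_embeddings": 8192, "rope_type": "llama3"}.
    """
    rope = config.get("rope_scaling")
    if rope is None:
        return False  # no RoPE scaling configured (e.g. original Llama 3)
    return set(rope) != LEGACY_ROPE_KEYS


def check_config_file(path: str) -> bool:
    """Load a model's config.json from disk and apply the check above."""
    return needs_newer_transformers(json.loads(Path(path).read_text()))
```

Pointing `check_config_file` at the downloaded model's `config.json` should return `True` for Llama 3.1 and `False` for the original Llama 3, whose `rope_scaling` is `null`.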
Sorry, I'm adding that dynamically (running in Jupyter). I'm using the template-free format, which is working fine with the old Llama3 ...
```python
import json

# axo_config_fpath = "configs/axolotl_configs/llama3-8b-qlora.yaml"
axo_config_fpath = "configs/axolotl_configs/llama3.1-8b-qlora.yaml"
train_data_fpath = "data/train_reviewed_template_free_1000.jsonl"
# JSON list of dataset entries, passed to the CLI as a --datasets override
train_data_config = json.dumps([{"path": train_data_fpath, "type": "input_output"}])
# IPython shell escape: {name} expands the Python variables defined above
!python -m axolotl.cli.preprocess {axo_config_fpath} --datasets '{train_data_config}'
```
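Because the data file uses axolotl's template-free `input_output` format, a quick pre-flight check of the JSONL can rule out the dataset as the cause. This sketch assumes the documented row shape `{"segments": [{"label": true|false, "text": "..."}]}`, where `label: true` segments are trained on and `label: false` segments are masked. `validate_template_free_lines` is a hypothetical helper, not part of axolotl.

```python
import json
from typing import Iterable, List


def validate_template_free_lines(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems found in template-free JSONL lines.

    Assumed row shape: {"segments": [{"label": bool, "text": str}, ...]}
    """
    problems: List[str] = []
    for lineno, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # tolerate blank lines
        try:
            row = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        segments = row.get("segments") if isinstance(row, dict) else None
        if not isinstance(segments, list) or not segments:
            problems.append(f"line {lineno}: missing or empty 'segments' list")
            continue
        for i, seg in enumerate(segments):
            if not isinstance(seg, dict):
                problems.append(f"line {lineno}, segment {i}: not an object")
            elif not isinstance(seg.get("label"), bool):
                problems.append(f"line {lineno}, segment {i}: 'label' must be true/false")
            elif not isinstance(seg.get("text"), str):
                problems.append(f"line {lineno}, segment {i}: 'text' must be a string")
    return problems
```

In the notebook this could be run as `with open(train_data_fpath) as f: print(validate_template_free_lines(f))`. An empty list means every row matches the assumed shape.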
Closing this out. I can verify that installing transformers from the main branch with pip provides the necessary fix.
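To confirm which build is active after running `pip install git+https://github.com/huggingface/transformers.git`, a small check like the one below could be run in the notebook. The minimum version `4.43.0` is an assumption: I believe it is the first release with Llama 3.1 RoPE support, but this thread only confirms that the main branch works. `release_tuple` and `transformers_is_new_enough` are hypothetical helpers.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Assumption: first transformers release that accepts Llama 3.1 rope_scaling
MIN_VERSION = (4, 43, 0)


def release_tuple(ver: str) -> Tuple[int, int, int]:
    """Turn '4.44.0.dev0' into (4, 44, 0), ignoring dev/rc suffixes.

    Note: a main-branch dev build counts as its release number, so a
    4.43.0.dev0 built before the fix landed would still pass.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", ver)
    if m is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    major, minor, patch = m.groups(default="0")
    return int(major), int(minor), int(patch)


def transformers_is_new_enough(ver: Optional[str] = None) -> bool:
    """Compare the given (or installed) transformers version to MIN_VERSION."""
    if ver is None:
        try:
            ver = version("transformers")  # reads installed package metadata
        except PackageNotFoundError:
            return False
    return release_tuple(ver) >= MIN_VERSION
```

In Jupyter, restart the kernel after reinstalling: the metadata check sees the new install immediately, but an already-imported `transformers` module stays at the old version until the kernel restarts.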
Expected Behavior
I expected to have a pre-processed dataset after running
python -m axolotl.cli.preprocess
Current behaviour
I get this error:
Full trace:
Steps to reproduce
Run:
python -m axolotl.cli.preprocess
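To reproduce outside Jupyter, where the `!` shell escape and `{var}` expansion used in the notebook snippet above are unavailable, the same command could be assembled as an argument list. `build_preprocess_cmd` is a hypothetical helper. The `--datasets` override is the one from that notebook snippet, not something verified here against axolotl's CLI docs.

```python
import json
import sys
from typing import List


def build_preprocess_cmd(config_path: str, data_path: str) -> List[str]:
    """Build argv for `python -m axolotl.cli.preprocess` with a --datasets override.

    Mirrors the notebook snippet: a single template-free (input_output) dataset.
    """
    datasets = json.dumps([{"path": data_path, "type": "input_output"}])
    # sys.executable runs the same interpreter/venv that has axolotl installed
    return [sys.executable, "-m", "axolotl.cli.preprocess", config_path, "--datasets", datasets]
```

It could then be run with `subprocess.run(build_preprocess_cmd(config, data), check=True)`. Passing an argument list instead of a shell string avoids the nested quoting around the JSON that the notebook version needs.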
Config yaml