axolotl-ai-cloud / axolotl

Go ahead and axolotl questions
https://axolotl-ai-cloud.github.io/axolotl/
Apache License 2.0

Preprocess with debug gives error. #1599

Open amitagh opened 3 months ago

amitagh commented 3 months ago


Expected Behavior

Preprocess with --debug should work, but it gives an error.

Without --debug it works.

I am using the dataset config shown under "Config yaml" below.

Current behaviour

Running preprocess with --debug fails with the following error:

** Axolotl Dependency Versions ***
accelerate: 0.28.0
peft: 0.10.0
transformers: 4.40.0.dev0
trl: 0.8.5
torch: 2.1.2
bitsandbytes: 0.43.0


Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/src/axolotl/src/axolotl/cli/preprocess.py", line 70, in <module>
    fire.Fire(do_cli)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/content/src/axolotl/src/axolotl/cli/preprocess.py", line 30, in do_cli
    parsed_cfg = load_cfg(config, **kwargs)
  File "/content/src/axolotl/src/axolotl/cli/__init__.py", line 352, in load_cfg
    with open(config, encoding="utf-8") as file:
FileNotFoundError: [Errno 2] No such file or directory: 'examples'

Steps to reproduce

Run preprocess with the --debug option; the error above is raised.

Config yaml

base_model: meta-llama/Meta-Llama-3-8B-Instruct
#model_type: AutoModelForCausalLM  #For Gemma
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

#datasets:
#  - path: /content/test_txt_data-10exmpl.json
#    type: completion
#    field: text
#datasets:
#  - path: ./mar_alpaca_dataset.json
#    type: alpaca
#    ds_type: json
datasets:
  - path: /content/mar_orca_dataset.json
    type: alpaca_w_system.load_open_orca
    ds_type: json
dataset_prepared_path: /content
dataset_processes: 2
val_set_size: 0
output_dir: ./qlora-out

adapter: qlora
lora_model_dir:

sequence_len: 700
sample_packing: true
pad_to_sequence_len: true

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj
#lora_modules_to_save:
  #- embed_tokens
  #- lm_head
lora_target_linear: true
lora_fan_in_fan_out:

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: false
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: False

warmup_ratio: 0.1
evals_per_epoch: 1
eval_table_size:
eval_max_new_tokens: 128
eval_sample_packing: False
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
   pad_token: <|end_of_text|>

save_safetensors: True
gpu_memory_limit: 14

Possible solution

There shouldn't be an error.

Which Operating Systems are you using?

Python Version

3.10

axolotl branch-commit

latest

winglian commented 3 months ago

Hi @amitagh, what is the exact command you used with --debug? Make sure to pass --debug after the YAML file argument.

correct: python -m axolotl.cli.preprocess path/to/your.yaml --debug

incorrect: python -m axolotl.cli.preprocess --debug path/to/your.yaml
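The ordering rule above can be sketched as a small argument-normalizing helper. This is only an illustration: `put_config_first` is a hypothetical name and not part of axolotl. It assumes the CLI parser binds the token that follows `--debug` as that flag's value, which would be consistent with the traceback falling back to the path 'examples'.

```python
from typing import List


def put_config_first(args: List[str]) -> List[str]:
    """Move the first .yaml/.yml argument to the front, keeping the rest in order.

    Hypothetical helper: it only reorders arguments so that flags such as
    --debug come after the config path, as recommended in this thread.
    """
    for i, arg in enumerate(args):
        if not arg.startswith("-") and arg.endswith((".yaml", ".yml")):
            return [arg] + args[:i] + args[i + 1:]
    # No config path found: return the arguments unchanged.
    return list(args)


base_cmd = ["python", "-m", "axolotl.cli.preprocess"]
fixed = base_cmd + put_config_first(["--debug", "path/to/your.yaml"])
print(" ".join(fixed))
```

With the "incorrect" arguments as input, the helper yields the "correct" ordering shown above, with the YAML path first and `--debug` after it.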

amitagh commented 3 months ago

You are correct. After placing --debug after the YAML file, it works.

<|begin_of_text|>(-100, 128000) ###(-100, 14711) System(-100, 744) : (-100, 512) You(-100, 2675) are(-100, 527) an(-100, 459) AI(-100, 15592) assistant(-100, 18328) .(-100, 13) You(-100, 1472) will(-100, 690) be(-100, 387) given(-100, 2728) a(-100, 264) task(-100, 3465) .(-100, 13) You(-100, 1472) must(-100, 2011) generate(-100, 7068) a(-100, 264) detailed(-100, 11944) and(-100, 323) long(-100, 1317) answer(-100, 4320) . (-100, 627) ###(-100, 14711) Human(-100, 11344) : (-100, 512) Generate(-100, 32215) an(-100, 459) approximately(-100, 13489) fifteen(-100, 37755) -word(-100, 38428) sentence(-100, 11914) that(-100, 430) describes(-100, 16964) all(-100, 682) this(-100, 420) data(-100, 828) :(-100, 25) Mid(-100, 14013) summer(-100, 63666) House(-100, 4783) eat(-100, 8343) Type(-100, 941) restaurant(-100, 10960) ;(-100, 26) Mid(-100, 14013) summer(-100, 63666) House(-100, 4783) food(-100, 3691) Chinese(-100, 8620) ;(-100, 26) Mid(-100, 14013) summer(-100, 63666) House(-100, 4783) price(-100, 3430) Range(-100, 6174) moderate(-100, 24070) ;(-100, 26) Mid(-100, 14013) summer(-100, 63666) House(-100, 4783) customer(-100, 6130) rating(-100, 10959) (-100, 220) 3(-100, 18) out(-100, 704) of(-100, 315) (-100, 220) 5(-100, 20) ;(-100, 26) Mid(-100, 14013) summer(-100, 63666) House(-100, 4783) near(-100, 3221) All(-100, 2052) Bar(-100, 4821) One(-100, 3861) (-100, 198) ###(-100, 14711) Assistant(-100, 22103) : (-100, 512) Mid(34748, 34748) summer(63666, 63666) House(4783, 4783) is(374, 374) a(264, 264) moderately(70351, 70351) priced(33705, 33705) Chinese(8620, 8620) restaurant(10960, 10960) with(449, 449) a(264, 264) (220, 220) 3(18, 18) /(14, 14) 5(20, 20) customer(6130, 6130) rating(10959, 10959) ,(11, 11) located(7559, 7559) near(3221, 3221) All(2052, 2052) Bar(4821, 4821) One(3861, 3861) .(13, 13) <|end_of_text|>(128001, 128001)

Above is the output it generated, which looks correct for the Orca format.

Thanks, Amit.
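For readers unfamiliar with the debug output, each entry appears to follow the pattern token(label, input_id). Prompt tokens carry the label -100, the ignore index commonly used by PyTorch cross-entropy loss and Hugging Face trainers, while response tokens carry their own ID as the label. That matches `train_on_inputs: false` in the config. The sketch below reproduces that layout with a few token IDs copied from the output; `mask_prompt_labels` and `format_debug` are hypothetical names, not axolotl functions.

```python
IGNORE_INDEX = -100  # label value that the loss function skips


def mask_prompt_labels(input_ids, prompt_len):
    """Return labels with the first prompt_len positions masked out."""
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


def format_debug(tokens, input_ids, labels):
    """Render tokens as 'text(label, input_id)' separated by spaces."""
    return " ".join(
        f"{tok}({lab}, {tid})" for tok, tid, lab in zip(tokens, input_ids, labels)
    )


# Token strings and IDs copied from the debug output in this thread.
tokens = ["Assistant", ":", "Mid", "summer"]
input_ids = [22103, 512, 34748, 63666]

# The first two tokens belong to the prompt, so their labels are masked.
labels = mask_prompt_labels(input_ids, prompt_len=2)
print(format_debug(tokens, input_ids, labels))
```

A quick sanity check on real output is that every token before the assistant's answer shows -100 as its first number, and every token in the answer shows the same number twice.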
