Closed: artemdinaburg closed this issue 6 months ago.
As another reference, this problem looks to be an exact replica of https://github.com/UKPLab/sentence-transformers/issues/2588, which may explain things better.
CodeQwen uses a different tokenizer than Qwen1.5, so using Qwen2Tokenizer may not be correct.
Could you provide a more reproducible description, including environment versions, demo scripts, etc.?
(FYI, we updated the config information 17 days ago to fix loading with the PreTrainedTokenizerFast of transformers==4.40.0.)
Hi! Thank you so much for getting back to me! I can replicate the issue in Axolotl, and as a sanity check I wanted to see if I could also replicate it in LLaMA-Factory. I could not, but interestingly I saw this:
[INFO|trainer.py:804] 2024-05-30 20:39:25,035 >> The following columns in the training set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: token_type_ids. If token_type_ids are not expected by `PeftModelForCausalLM.forward`, you can safely ignore this message.
It looks like the base `transformers.Trainer` implementation automatically removes unused columns from the Dataset in its call to `get_train_dataloader`, which avoids this error when run via LLaMA-Factory. Axolotl defines a custom `get_train_dataloader` that does not ignore unused columns, so the following failure happens (a minimal sketch of the column-filtering idea appears after the traceback below):
```
python3 -m axolotl.cli.train codeqwen.yml
[2024-05-30 22:40:32,983] [INFO] [datasets.<module>:58] [PID:41130] PyTorch version 2.3.0 available.
[2024-05-30 22:40:34,099] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-05-30 22:40:34,171] [INFO] [root.spawn:38] [PID:41130] gcc -pthread -B /opt/conda/envs/axo/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/axo/include -fPIC -O2 -isystem /opt/conda/envs/axo/include -fPIC -c /var/tmp/tmpmhbgpa43/test.c -o /var/tmp/tmpmhbgpa43/test.o
[2024-05-30 22:40:34,192] [INFO] [root.spawn:38] [PID:41130] gcc -pthread -B /opt/conda/envs/axo/compiler_compat /var/tmp/tmpmhbgpa43/test.o -laio -o /var/tmp/tmpmhbgpa43/a.out
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[2024-05-30 22:40:35,431] [WARNING] [axolotl.utils.config.models.input.hint_sample_packing_padding:747] [PID:41130] [RANK:0] `pad_to_sequence_len: true` is recommended when using sample_packing
[2024-05-30 22:40:35,431] [WARNING] [axolotl.utils.config.models.input.check_sample_packing_wo_flash:730] [PID:41130] [RANK:0] sample_packing without flash_attention or sdp_attention does not handle cross-attention.
[2024-05-30 22:40:35,431] [WARNING] [axolotl.utils.config.models.input.hint_trust_remote_code:308] [PID:41130] [RANK:0] `trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
[2024-05-30 22:40:35,432] [DEBUG] [axolotl.normalize_config:79] [PID:41130] [RANK:0] bf16 support detected, enabling for this configuration.
/opt/conda/envs/axo/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
[2024-05-30 22:40:35,562] [INFO] [axolotl.normalize_config:182] [PID:41130] [RANK:0] GPU memory usage baseline: 0.000GB (+0.537GB misc)
dP dP dP
88 88 88
.d8888b. dP. .dP .d8888b. 88 .d8888b. d8888P 88
88' `88 `8bd8' 88' `88 88 88' `88 88 88
88. .88 .d88b. 88. .88 88 88. .88 88 88
`88888P8 dP' `dP `88888P' dP `88888P' dP dP
****************************************
**** Axolotl Dependency Versions *****
accelerate: 0.30.1
peft: 0.11.1
transformers: 4.41.1
trl: 0.8.6
torch: 2.3.0
bitsandbytes: 0.43.1
****************************************
/opt/conda/envs/axo/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
[2024-05-30 22:40:36,893] [DEBUG] [axolotl.load_tokenizer:280] [PID:41130] [RANK:0] EOS: 2 / <|endoftext|>
[2024-05-30 22:40:36,893] [DEBUG] [axolotl.load_tokenizer:281] [PID:41130] [RANK:0] BOS: 2 / <|endoftext|>
[2024-05-30 22:40:36,893] [DEBUG] [axolotl.load_tokenizer:282] [PID:41130] [RANK:0] PAD: 92298 / <fim_pad>
[2024-05-30 22:40:36,893] [DEBUG] [axolotl.load_tokenizer:283] [PID:41130] [RANK:0] UNK: 0 / <unk>
[2024-05-30 22:40:36,893] [INFO] [axolotl.load_tokenizer:294] [PID:41130] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-05-30 22:40:36,893] [INFO] [axolotl.load_tokenized_prepared_datasets:183] [PID:41130] [RANK:0] Unable to find prepared dataset in cq/last/fbee903f846296c4151e7746385e182d
[2024-05-30 22:40:36,893] [INFO] [axolotl.load_tokenized_prepared_datasets:184] [PID:41130] [RANK:0] Loading raw datasets...
[2024-05-30 22:40:36,894] [WARNING] [axolotl.load_tokenized_prepared_datasets:186] [PID:41130] [RANK:0] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset.
[2024-05-30 22:40:36,894] [INFO] [axolotl.load_tokenized_prepared_datasets:193] [PID:41130] [RANK:0] No seed provided, using default seed of 42
Generating train split: 3 examples [00:00, 659.79 examples/s]
num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
[2024-05-30 22:40:37,140] [WARNING] [datasets.arrow_dataset.map:3087] [PID:41130] num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
Tokenizing Prompts (num_proc=3): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 6.21 examples/s]
[2024-05-30 22:40:37,739] [INFO] [axolotl.load_tokenized_prepared_datasets:414] [PID:41130] [RANK:0] merging datasets
num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
[2024-05-30 22:40:37,743] [WARNING] [datasets.arrow_dataset.map:3087] [PID:41130] num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
Dropping Long Sequences (num_proc=3): 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 25.83 examples/s]
num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
[2024-05-30 22:40:37,909] [WARNING] [datasets.arrow_dataset.map:3087] [PID:41130] num_proc must be <= 3. Reducing num_proc to 3 for dataset of size 3.
Add position_id column (Sample Packing) (num_proc=3): 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 25.03 examples/s]
[2024-05-30 22:40:38,080] [INFO] [axolotl.load_tokenized_prepared_datasets:427] [PID:41130] [RANK:0] Saving merged prepared dataset to disk... cq/last/fbee903f846296c4151e7746385e182d
Saving the dataset (1/1 shards): 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 515.38 examples/s]
[2024-05-30 22:40:38,095] [DEBUG] [axolotl.calculate_total_num_steps:299] [PID:41130] [RANK:0] total_num_tokens: 394
[2024-05-30 22:40:38,096] [DEBUG] [axolotl.calculate_total_num_steps:312] [PID:41130] [RANK:0] `total_supervised_tokens: 394`
[2024-05-30 22:40:43,669] [INFO] [axolotl.utils.samplers.multipack._len_est:185] [PID:41130] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 394
[2024-05-30 22:40:43,669] [DEBUG] [axolotl.calculate_total_num_steps:364] [PID:41130] [RANK:0] data_loader_len: 0
[2024-05-30 22:40:43,669] [INFO] [axolotl.calc_sample_packing_eff_est:370] [PID:41130] [RANK:0] sample_packing_eff_est across ranks: [0.2565104166666667]
[2024-05-30 22:40:43,669] [DEBUG] [axolotl.calculate_total_num_steps:382] [PID:41130] [RANK:0] sample_packing_eff_est: 0.26
[2024-05-30 22:40:43,669] [DEBUG] [axolotl.calculate_total_num_steps:390] [PID:41130] [RANK:0] total_num_steps: 0
[2024-05-30 22:40:43,684] [DEBUG] [axolotl.train.train:56] [PID:41130] [RANK:0] loading tokenizer... Qwen/CodeQwen1.5-7B
/opt/conda/envs/axo/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
[2024-05-30 22:40:44,742] [DEBUG] [axolotl.load_tokenizer:280] [PID:41130] [RANK:0] EOS: 2 / <|endoftext|>
[2024-05-30 22:40:44,742] [DEBUG] [axolotl.load_tokenizer:281] [PID:41130] [RANK:0] BOS: 2 / <|endoftext|>
[2024-05-30 22:40:44,742] [DEBUG] [axolotl.load_tokenizer:282] [PID:41130] [RANK:0] PAD: 92298 / <fim_pad>
[2024-05-30 22:40:44,742] [DEBUG] [axolotl.load_tokenizer:283] [PID:41130] [RANK:0] UNK: 0 / <unk>
[2024-05-30 22:40:44,742] [INFO] [axolotl.load_tokenizer:294] [PID:41130] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-05-30 22:40:44,742] [DEBUG] [axolotl.train.train:85] [PID:41130] [RANK:0] loading model and peft_config...
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:05<00:00, 1.45s/it]
/opt/conda/envs/axo/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:520: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.95` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/opt/conda/envs/axo/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:520: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.95` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
warnings.warn(
[2024-05-30 22:40:52,523] [INFO] [axolotl.load_model:734] [PID:41130] [RANK:0] GPU memory usage after model load: 3.350GB (+0.066GB cache, +0.751GB misc)
[2024-05-30 22:40:52,546] [INFO] [axolotl.load_model:785] [PID:41130] [RANK:0] converting PEFT model w/ prepare_model_for_kbit_training
[2024-05-30 22:40:52,552] [INFO] [axolotl.load_model:794] [PID:41130] [RANK:0] converting modules to torch.bfloat16 for flash attention
[2024-05-30 22:40:52,556] [INFO] [axolotl.load_lora:951] [PID:41130] [RANK:0] found linear modules: ['q_proj', 'o_proj', 'k_proj', 'down_proj', 'v_proj', 'up_proj', 'gate_proj']
trainable params: 320,339,968 || all params: 7,570,624,512 || trainable%: 4.2314
[2024-05-30 22:40:56,245] [INFO] [axolotl.load_model:843] [PID:41130] [RANK:0] GPU memory usage after adapters: 3.797GB (+1.035GB cache, +0.751GB misc)
/opt/conda/envs/axo/lib/python3.10/site-packages/transformers/training_args.py:1474: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
warnings.warn(
[2024-05-30 22:40:56,277] [INFO] [axolotl.train.train:119] [PID:41130] [RANK:0] Pre-saving adapter config to ./cq
[2024-05-30 22:40:56,353] [INFO] [axolotl.train.train:156] [PID:41130] [RANK:0] Starting trainer...
[2024-05-30 22:40:56,600] [INFO] [axolotl.utils.samplers.multipack._len_est:185] [PID:41130] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 394
[2024-05-30 22:40:56,601] [INFO] [axolotl.utils.samplers.multipack._len_est:185] [PID:41130] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 394
0%| | 0/1 [00:00<?, ?it/s][2024-05-30 22:40:56,658] [INFO] [axolotl.utils.samplers.multipack._len_est:185] [PID:41130] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 394
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Traceback (most recent call last):
File "/opt/conda/envs/axo/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/envs/axo/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/artem/axolotl/src/axolotl/cli/train.py", line 70, in <module>
fire.Fire(do_cli)
File "/opt/conda/envs/axo/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/envs/axo/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/conda/envs/axo/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/artem/axolotl/src/axolotl/cli/train.py", line 38, in do_cli
return do_train(parsed_cfg, parsed_cli_args)
File "/home/artem/axolotl/src/axolotl/cli/train.py", line 66, in do_train
return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
File "/home/artem/axolotl/src/axolotl/train.py", line 170, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/opt/conda/envs/axo/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
return inner_training_loop(
File "/opt/conda/envs/axo/lib/python3.10/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/opt/conda/envs/axo/lib/python3.10/site-packages/transformers/trainer.py", line 3238, in training_step
loss = self.compute_loss(model, inputs)
File "/home/artem/axolotl/src/axolotl/core/trainer_builder.py", line 526, in compute_loss
return super().compute_loss(model, inputs, return_outputs=return_outputs)
File "/opt/conda/envs/axo/lib/python3.10/site-packages/transformers/trainer.py", line 3264, in compute_loss
outputs = model(**inputs)
File "/opt/conda/envs/axo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/axo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/axo/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "/opt/conda/envs/axo/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/opt/conda/envs/axo/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/opt/conda/envs/axo/lib/python3.10/site-packages/peft/peft_model.py", line 1430, in forward
return self.base_model(
File "/opt/conda/envs/axo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/axo/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/axo/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 179, in forward
return self.model.forward(*args, **kwargs)
File "/opt/conda/envs/axo/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
TypeError: Qwen2ForCausalLM.forward() got an unexpected keyword argument 'token_type_ids'
0%| | 0/1 [00:00<?, ?it/s]
```
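For reference, the column filtering that `transformers.Trainer` performs amounts to inspecting the model's `forward()` signature and dropping everything else. Here's a minimal sketch of that idea; it's illustrative only (the real logic lives in `Trainer._remove_unused_columns` and also keeps label columns), and the helper name is mine:

```python
import inspect

def drop_unexpected_columns(model, batch: dict) -> dict:
    """Keep only the keys that the model's forward() accepts.

    Simplified version of what transformers.Trainer does when
    remove_unused_columns=True; for Qwen2ForCausalLM this would silently
    drop token_type_ids instead of crashing inside forward().
    """
    accepted = set(inspect.signature(model.forward).parameters)
    return {k: v for k, v in batch.items() if k in accepted}
```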
I'm attaching an Axolotl config and data file that trigger the issue.
Anyway, I am not quite sure what should be patched: in theory, the tokenizer should agree with the model on which data columns to expect, but maybe the trainer should also handle the case where it doesn't 🤷.
If there's some way to fix the model so that the data generated by the tokenizer matches what's expected by the model, I'd be very thankful. axo_codeqwen.tar.gz
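In the meantime, one workaround sketch (not a built-in Axolotl option, just the underlying `transformers` tokenizer API) is to keep the tokenizer from emitting the column in the first place, or to strip it from an already-tokenized batch:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/CodeQwen1.5-7B")

# return_token_type_ids is a standard kwarg of the tokenizer's __call__;
# setting it to False keeps the offending key out of the encoding.
enc = tokenizer("def fib(n):", return_token_type_ids=False, return_tensors="pt")
assert "token_type_ids" not in enc

# Or drop the key from an already-tokenized batch before the forward pass:
# batch.pop("token_type_ids", None)
```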
Thank you for your notification of this bug.
I see you have already opened a pull request to fix it on Axolotl:
https://github.com/OpenAccess-AI-Collective/axolotl/pull/1656
Hi!
I tried to reach someone via the Discord but haven't been able to, so I figured I'd post here.
The CodeQwen 1.5 7B model on HF defines the model as `Qwen2ForCausalLM` (https://huggingface.co/Qwen/CodeQwen1.5-7B/blob/main/config.json) but the tokenizer as `PreTrainedTokenizerFast` (https://huggingface.co/Qwen/CodeQwen1.5-7B/blob/main/tokenizer_config.json#L14). This is probably wrong, since `PreTrainedTokenizerFast` is the tokenizer base class, and more importantly it defines and returns `token_type_ids` (https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py#L1562), but the model `Qwen2ForCausalLM` doesn't accept that output from the tokenizer. This causes both tokenization and finetuning failures when using Axolotl to finetune CodeQwen 1.5 7B in completion mode.
The correct choice is almost certainly to set the tokenizer to `Qwen2Tokenizer`, like in the regular, non-code Qwen1.5 models (see: https://huggingface.co/Qwen/Qwen1.5-4B/blob/main/tokenizer_config.json#L38). I tried manually editing `tokenizer_config.json` on a local CodeQwen 1.5 download and confirmed that it fixes my finetuning issues.
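For completeness, the local edit boils down to flipping the `tokenizer_class` field that `AutoTokenizer` reads; a sketch (assuming a local snapshot at `./CodeQwen1.5-7B`, and noting the maintainers' caution above that `Qwen2Tokenizer` may not be the right class for CodeQwen's vocabulary):

```python
import json

cfg_path = "./CodeQwen1.5-7B/tokenizer_config.json"
with open(cfg_path) as f:
    cfg = json.load(f)

# AutoTokenizer picks its tokenizer class from this field; the shipped
# value is "PreTrainedTokenizerFast".
cfg["tokenizer_class"] = "Qwen2Tokenizer"

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
```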