Closed: guanchangge closed this issue 4 months ago
Hi @guanchangge,
Thanks for your interest in our work. This most likely arises from a mismatch between the installed transformers and llm2vec library versions. Please check out the solution and discussion here.
Let me know if you have any more questions.
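For readers wondering how a version mismatch can surface as the TypeError reported in this issue, here is a minimal sketch in plain Python. The classes UpstreamModel and BiModel and their argument lists are hypothetical stand-ins, not the actual transformers or llm2vec code. The point is only that a subclass override written against an older method signature fails once the parent class starts passing more arguments.

```python
# Hypothetical stand-ins that only mimic the shape of the problem.
class UpstreamModel:
    """Plays the role of a base model from a newer library release."""

    def _update_causal_mask(self, attention_mask, input_tensor,
                            cache_position, past_key_values,
                            output_attentions):
        return "causal mask"

    def forward(self, attention_mask, input_tensor):
        # The newer forward() passes five arguments to the hook.
        return self._update_causal_mask(
            attention_mask, input_tensor, None, None, False
        )


class BiModel(UpstreamModel):
    """Plays the role of a subclass written against an older release."""

    # The override still accepts only three or four arguments besides self.
    def _update_causal_mask(self, attention_mask, input_tensor,
                            cache_position, past_seen_tokens=None):
        return "bidirectional mask"


def demo() -> str:
    """Call forward() on the outdated subclass and return the error text."""
    try:
        BiModel().forward(attention_mask=None, input_tensor=None)
    except TypeError as err:
        return str(err)
    return "no error"


message = demo()
print(message)
```

The printed message ends with "takes from 4 to 5 positional arguments but 6 were given", the same shape as the error in the traceback. Bringing the two libraries back to compatible versions makes the override and the caller agree again.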
Thanks for your answer. I used "pip install -e ." to install llm2vec from source, and it now works with transformers 4.41.0.
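When chasing a mismatch like this, it can help to confirm which versions are actually installed in the active environment. The following is a sketch using only the Python standard library. The package list is an assumption based on the libraries that appear in the traceback, and installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumed list of relevant packages; adjust to your environment.
for name, ver in installed_versions(
    ["transformers", "llm2vec", "peft", "torch"]
).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Running this inside the same conda environment used for training shows whether the llm2vec in use is the one installed from the local checkout or an older release.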
Hi,
When I run the script python experiments/run_mntp.py trainconfigs/mntp/MetaLlama3.json, the following error occurs.
Enter your token (input will not be visible):
Add token as git credential? (Y/n) Y
Token is valid (permission: write).
Cannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub.
Run the following command in your terminal in case you want to set the 'store' credential helper as default.
git config --global credential.helper store
Read https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage for more details.
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=output/mntp/Meta-Llama-3-8B-Instruct/runs/May29_19-17-13_sn4622119311,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_kwargs={},
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=OptimizerNames.ADAMW_TORCH,
optim_args=None,
optim_target_modules=None,
output_dir=output/mntp/Meta-Llama-3-8B-Instruct,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=32,
per_device_train_batch_size=32,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=output/mntp/Meta-Llama-3-8B-Instruct,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=200,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
split_batches=None,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
Overwrite dataset info from restored data version if exists.
05/29/2024 19:17:15 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
Loading Dataset info from /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3
05/29/2024 19:17:15 - INFO - datasets.info - Loading Dataset info from /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3
Found cached dataset wikitext (/home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3)
05/29/2024 19:17:15 - INFO - datasets.builder - Found cached dataset wikitext (/home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3)
Loading Dataset info from /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3
05/29/2024 19:17:15 - INFO - datasets.info - Loading Dataset info from /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3
/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/huggingface_hub/file_download.py:1132: FutureWarning:
Token has not been saved to git credential helper.
Your token has been saved to /home/changge/.cache/huggingface/token
Login successful
/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/transformers/training_args.py:1474: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
05/29/2024 19:17:13 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 4, distributed training: False, 16-bits training: False
05/29/2024 19:17:13 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=4,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
batch_eval_metrics=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_steps=100,
eval_strategy=IntervalStrategy.STEPS,
evaluation_strategy=steps,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=True,
gradient_checkpointing_kwargs={'use_reentrant': False},
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=
`resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
config.json: 100%|██████████| 654/654 [00:00<00:00, 4.98MB/s]
[INFO|configuration_utils.py:733] 2024-05-29 19:17:15,651 >> loading configuration file config.json from cache at /home/changge/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/e1945c40cd546c78e41f1151f4db032b271faeaa/config.json
[INFO|configuration_utils.py:796] 2024-05-29 19:17:15,652 >> Model config LlamaConfig {
  "_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.0",
  "use_cache": true,
  "vocab_size": 128256
}
tokenizer_config.json: 100%|██████████| 51.0k/51.0k [00:00<00:00, 5.15MB/s]
tokenizer.json: 100%|██████████| 9.09M/9.09M [00:00<00:00, 27.3MB/s]
special_tokens_map.json: 100%|██████████| 73.0/73.0 [00:00<00:00, 624kB/s]
[INFO|tokenization_utils_base.py:2108] 2024-05-29 19:17:16,306 >> loading file tokenizer.json from cache at /home/changge/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/e1945c40cd546c78e41f1151f4db032b271faeaa/tokenizer.json
[INFO|tokenization_utils_base.py:2108] 2024-05-29 19:17:16,306 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2108] 2024-05-29 19:17:16,306 >> loading file special_tokens_map.json from cache at /home/changge/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/e1945c40cd546c78e41f1151f4db032b271faeaa/special_tokens_map.json
[INFO|tokenization_utils_base.py:2108] 2024-05-29 19:17:16,306 >> loading file tokenizer_config.json from cache at /home/changge/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/e1945c40cd546c78e41f1151f4db032b271faeaa/tokenizer_config.json
[WARNING|logging.py:314] 2024-05-29 19:17:16,564 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
model.safetensors.index.json: 100%|██████████| 23.9k/23.9k [00:00<00:00, 115MB/s]
[INFO|modeling_utils.py:3474] 2024-05-29 19:17:16,710 >> loading weights file model.safetensors from cache at /home/changge/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/e1945c40cd546c78e41f1151f4db032b271faeaa/model.safetensors.index.json
model-00001-of-00004.safetensors: 100%|██████████| 4.98G/4.98G [00:46<00:00, 106MB/s]
model-00002-of-00004.safetensors: 100%|██████████| 5.00G/5.00G [00:47<00:00, 105MB/s]
model-00003-of-00004.safetensors: 100%|██████████| 4.92G/4.92G [00:45<00:00, 108MB/s]
model-00004-of-00004.safetensors: 100%|██████████| 1.17G/1.17G [00:10<00:00, 111MB/s]
Downloading shards: 100%|██████████| 4/4 [02:31<00:00, 37.75s/it]
[INFO|modeling_utils.py:1519] 2024-05-29 19:19:47,712 >> Instantiating LlamaBiForMNTP model under default dtype torch.bfloat16.
[WARNING|logging.py:329] 2024-05-29 19:19:47,715 >> You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[INFO|configuration_utils.py:962] 2024-05-29 19:19:47,717 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128009
}
Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00, 3.49it/s]
[INFO|modeling_utils.py:4280] 2024-05-29 19:19:48,957 >> All model checkpoint weights were used when initializing LlamaBiForMNTP.
[INFO|modeling_utils.py:4288] 2024-05-29 19:19:48,957 >> All the weights of LlamaBiForMNTP were initialized from the model checkpoint at meta-llama/Meta-Llama-3-8B-Instruct. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaBiForMNTP for predictions without further training.
generation_config.json: 100%|██████████| 187/187 [00:00<00:00, 2.59MB/s]
[INFO|configuration_utils.py:917] 2024-05-29 19:19:49,050 >> loading configuration file generation_config.json from cache at /home/changge/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/e1945c40cd546c78e41f1151f4db032b271faeaa/generation_config.json
[INFO|configuration_utils.py:962] 2024-05-29 19:19:49,050 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128009
  ],
  "max_length": 4096,
  "temperature": 0.6,
  "top_p": 0.9
}
Model's Lora trainable parameters:
trainable params: 41,943,040 || all params: 7,546,867,712 || trainable%: 0.5558
Running tokenizer on every text in dataset: 0%|          | 0/4358 [00:00<?, ? examples/s]
Caching processed dataset at /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3/cache-c66962c78cb5529c.arrow
05/29/2024 19:19:49 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3/cache-c66962c78cb5529c.arrow
Running tokenizer on every text in dataset: 100%|██████████| 4358/4358 [00:00<00:00, 23444.76 examples/s]
Running tokenizer on every text in dataset: 0%|          | 0/1801350 [00:00<?, ? examples/s]
Caching processed dataset at /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3/cache-655adfdce63258ce.arrow
05/29/2024 19:19:49 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3/cache-655adfdce63258ce.arrow
Running tokenizer on every text in dataset: 100%|██████████| 1801350/1801350 [01:23<00:00, 21494.56 examples/s]
Running tokenizer on every text in dataset: 0%|          | 0/3760 [00:00<?, ? examples/s]
Caching processed dataset at /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3/cache-06c641e9a21337eb.arrow
05/29/2024 19:21:13 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3/cache-06c641e9a21337eb.arrow
Running tokenizer on every text in dataset: 100%|██████████| 3760/3760 [00:00<00:00, 22418.88 examples/s]
Grouping texts in chunks of 512: 0%|          | 0/4358 [00:00<?, ? examples/s]
Caching processed dataset at /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3/cache-97b8af2be572f3da.arrow
05/29/2024 19:21:13 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3/cache-97b8af2be572f3da.arrow
Grouping texts in chunks of 512: 100%|██████████| 4358/4358 [00:00<00:00, 16710.13 examples/s]
Grouping texts in chunks of 512: 0%|          | 0/1801350 [00:00<?, ? examples/s]
Caching processed dataset at /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3/cache-91f1e93e8c2532e8.arrow
05/29/2024 19:21:14 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3/cache-91f1e93e8c2532e8.arrow
Grouping texts in chunks of 512: 100%|██████████| 1801350/1801350 [01:47<00:00, 16736.33 examples/s]
Grouping texts in chunks of 512: 0%|          | 0/3760 [00:00<?, ? examples/s]
Caching processed dataset at /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3/cache-c698bf63ec328b6b.arrow
05/29/2024 19:23:01 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/changge/.cache/huggingface/datasets/wikitext/wikitext-103-raw-v1/0.0.0/b08601e04326c79dfdd32d625aee71d232d685c3/cache-c698bf63ec328b6b.arrow
Grouping texts in chunks of 512: 100%|██████████| 3760/3760 [00:00<00:00, 16511.18 examples/s]
/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/transformers/utils/import_utils.py:533: FutureWarning:
`is_torch_tpu_available` is deprecated and will be removed in 4.41.0. Please use the `is_torch_xla_available` instead.
  warnings.warn(
[INFO|trainer.py:2078] 2024-05-29 19:23:03,549 >> Running training
[INFO|trainer.py:2079] 2024-05-29 19:23:03,549 >> Num examples = 237,180
[INFO|trainer.py:2080] 2024-05-29 19:23:03,549 >> Num Epochs = 3
[INFO|trainer.py:2081] 2024-05-29 19:23:03,549 >> Instantaneous batch size per device = 32
[INFO|trainer.py:2083] 2024-05-29 19:23:03,549 >> Training with DataParallel so batch size has been adjusted to: 128
[INFO|trainer.py:2084] 2024-05-29 19:23:03,549 >> Total train batch size (w. parallel, distributed & accumulation) = 128
[INFO|trainer.py:2085] 2024-05-29 19:23:03,549 >> Gradient Accumulation steps = 1
[INFO|trainer.py:2086] 2024-05-29 19:23:03,549 >> Total optimization steps = 5,559
[INFO|trainer.py:2087] 2024-05-29 19:23:03,552 >> Number of trainable parameters = 567,279,616
0%|          | 0/5559 [00:00<?, ?it/s]
[WARNING|logging.py:329] 2024-05-29 19:23:39,022 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
Traceback (most recent call last):
  File "/data/changge/project/llm2vec/experiments/run_mntp.py", line 982, in <module>
    main()
  File "/data/changge/project/llm2vec/experiments/run_mntp.py", line 930, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/transformers/trainer.py", line 3238, in training_step
    loss = self.compute_loss(model, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/transformers/trainer.py", line 3264, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
    outputs = self.parallel_apply(replicas, inputs, module_kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/torch/nn/parallel/parallel_apply.py", line 108, in parallel_apply
    output.reraise()
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/torch/_utils.py", line 705, in reraise
    raise exception
TypeError: Caught TypeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in _worker
    output = module(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py", line 1164, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/peft/peft_model.py", line 642, in forward
    return self.get_base_model()(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/changge/software/anaconda/envs/ll2vec/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py", line 940, in forward
    causal_mask = self._update_causal_mask(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: LlamaBiModel._update_causal_mask() takes from 4 to 5 positional arguments but 6 were given