McGill-NLP / llm2vec

Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
https://mcgill-nlp.github.io/llm2vec/
MIT License

Issue with mntp training for Llama 3.2 model #150

Open sandeep-krutrim opened 1 month ago

sandeep-krutrim commented 1 month ago

Hi,

I am trying to train Llama 3.2 models with LLM2Vec, but I am getting the following error:

```
ValueError: `rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {'factor': 32.0, 'high_freq_factor': 4.0, 'low_freq_factor': 1.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
```

My training config looks like this:

{ "model_name_or_path": "meta-llama/Llama-3.2-1B-Instruct", "dataset_name": "wikitext", "dataset_config_name": "wikitext-103-raw-v1", "per_device_train_batch_size": 32, "per_device_eval_batch_size": 32, "gradient_accumulation_steps": 1, "do_train": true, "do_eval": true, "max_seq_length": 512, "mask_token_type": "blank", "data_collator_type": "default", "mlm_probability": 0.2, "overwrite_output_dir": true, "output_dir": "output/mntp/Meta-Llama-3.2-1B-Instruct", "evaluation_strategy": "steps", "eval_steps": 100, "save_steps": 200, "stop_after_n_steps": 1000, "lora_r": 16, "gradient_checkpointing": true, "torch_dtype": "bfloat16", "attn_implementation": "flash_attention_2" }