InternLM / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
https://xtuner.readthedocs.io/zh-cn/latest/
Apache License 2.0

Loss becomes NaN during llava-llama3-8b fine-tuning #942

Open liboaccn opened 1 month ago

liboaccn commented 1 month ago
[Screenshot: 2024-10-06 14:22:48]

When fine-tuning llava-llama3-8b, the loss becomes NaN after a few steps. What could be causing this? I have seen similar reports in the GitHub issues, and the official reply was to change the lr. My current settings are:

from torch.optim import AdamW
from mmengine.optim import CosineAnnealingLR, LinearLR

# Scheduler & Optimizer
batch_size = 4  # per_device
accumulative_counts = 32 * 4
dataloader_num_workers = 32
max_epochs = 1
optim_type = AdamW
lr = 2e-6
# warmup_ratio is defined elsewhere in the config

param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]
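
For reference, besides lowering lr, a common mitigation for NaN loss in fp16 training is to clip gradients and use a dynamic loss scale in the optimizer wrapper. This is a minimal sketch following the AmpOptimWrapper pattern used in typical xtuner configs, not taken from this issue; max_norm=1 is an illustrative value.

from mmengine.optim import AmpOptimWrapper

# Sketch: clip gradients and use a dynamic loss scale to avoid fp16 overflow.
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(type=optim_type, lr=lr, betas=(0.9, 0.999), weight_decay=0),
    clip_grad=dict(max_norm=1, error_if_nonfinite=False),  # illustrative max_norm
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')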
liboaccn commented 1 month ago

Additional note: I have also switched the visual encoder from CLIP to SigLIP.


from transformers import (AutoModelForCausalLM, SiglipImageProcessor,
                          SiglipVisionModel)
from xtuner.model import LLaVAModel

image_processor = dict(
    type=SiglipImageProcessor.from_pretrained,
    pretrained_model_name_or_path=visual_encoder_name_or_path,
    trust_remote_code=True)

model = dict(
    type=LLaVAModel,
    freeze_llm=True,
    freeze_visual_encoder=True,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=llm_name_or_path,
        trust_remote_code=True),
    visual_encoder=dict(
        type=SiglipVisionModel.from_pretrained,
        pretrained_model_name_or_path=visual_encoder_name_or_path))
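
Not from the original thread, but a quick way to rule out the swapped-in SigLIP encoder as the source of the NaNs is to run a dummy image through it in the training dtype and check that its activations are finite. A minimal sketch assuming the same visual_encoder_name_or_path and fp16; the dummy image size is illustrative.

import numpy as np
import torch
from PIL import Image
from transformers import SiglipImageProcessor, SiglipVisionModel

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Sketch: check the SigLIP encoder for non-finite activations in fp16.
processor = SiglipImageProcessor.from_pretrained(visual_encoder_name_or_path)
encoder = SiglipVisionModel.from_pretrained(
    visual_encoder_name_or_path, torch_dtype=torch.float16).to(device).eval()

dummy = Image.fromarray(np.random.randint(0, 255, (384, 384, 3), dtype=np.uint8))
pixel_values = processor(images=dummy, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(device=device, dtype=torch.float16)

with torch.no_grad():
    out = encoder(pixel_values=pixel_values)

print('finite:', torch.isfinite(out.last_hidden_state).all().item())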