InternLM / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
https://xtuner.readthedocs.io/zh-cn/latest/
Apache License 2.0

Loss becomes NaN during llava-llama3-8b fine-tuning #942

Open liboaccn opened 1 month ago

liboaccn commented 1 month ago
[Screenshot: 2024-10-06 14:22:48]

When fine-tuning llava-llama3-8b, the loss becomes NaN after a few steps. What could be causing this? I have seen similar reports in the GitHub issues, and the official reply was to change the lr. My current settings are:

from torch.optim import AdamW
from mmengine.optim import CosineAnnealingLR, LinearLR

# Scheduler & Optimizer
batch_size = 4  # per_device
accumulative_counts = 32 * 4
dataloader_num_workers = 32
max_epochs = 1
optim_type = AdamW
lr = 2e-6
# warmup_ratio is defined elsewhere in the config

param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]
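
For reference, besides lowering lr, a common mitigation for NaN loss in fp16 training is to clip gradients and use a dynamic loss scale in the optimizer wrapper. This is a minimal sketch following the AmpOptimWrapper pattern used in typical xtuner configs, not taken from this issue; max_norm=1 is an illustrative value.

from mmengine.optim import AmpOptimWrapper

# Sketch: clip gradients and use a dynamic loss scale to avoid fp16 overflow.
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(type=optim_type, lr=lr, betas=(0.9, 0.999), weight_decay=0),
    clip_grad=dict(max_norm=1, error_if_nonfinite=False),  # illustrative max_norm
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')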
liboaccn commented 1 month ago

Additional note: I have also switched the visual encoder from CLIP to SigLIP.


from transformers import (AutoModelForCausalLM, SiglipImageProcessor,
                          SiglipVisionModel)
from xtuner.model import LLaVAModel

image_processor = dict(
    type=SiglipImageProcessor.from_pretrained,
    pretrained_model_name_or_path=visual_encoder_name_or_path,
    trust_remote_code=True)

model = dict(
    type=LLaVAModel,
    freeze_llm=True,
    freeze_visual_encoder=True,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=llm_name_or_path,
        trust_remote_code=True),
    visual_encoder=dict(
        type=SiglipVisionModel.from_pretrained,
        pretrained_model_name_or_path=visual_encoder_name_or_path))
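
Not from the original thread, but a quick way to rule out the swapped-in SigLIP encoder as the source of the NaNs is to run a dummy image through it in the training dtype and check that its activations are finite. A minimal sketch assuming the same visual_encoder_name_or_path and fp16; the dummy image size is illustrative.

import numpy as np
import torch
from PIL import Image
from transformers import SiglipImageProcessor, SiglipVisionModel

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Sketch: check the SigLIP encoder for non-finite activations in fp16.
processor = SiglipImageProcessor.from_pretrained(visual_encoder_name_or_path)
encoder = SiglipVisionModel.from_pretrained(
    visual_encoder_name_or_path, torch_dtype=torch.float16).to(device).eval()

dummy = Image.fromarray(np.random.randint(0, 255, (384, 384, 3), dtype=np.uint8))
pixel_values = processor(images=dummy, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(device=device, dtype=torch.float16)

with torch.no_grad():
    out = encoder(pixel_values=pixel_values)

print('finite:', torch.isfinite(out.last_hidden_state).all().item())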