WisdomShell / codeshell

A series of code large language models developed by PKU-KCL
http://se.pku.edu.cn/kcl

Loss stays constant at 11.15625 during fine-tuning #68

Closed: unfold8 closed this issue 6 months ago

unfold8 commented 8 months ago

When fine-tuning CodeShell, the loss never changes; it stays at 11.15625 for the entire run. Why is this happening?

Parameter settings:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, TaskType

# 4-bit NF4 quantization with double quantization; compute in bf16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    trust_remote_code=True,
    quantization_config=bnb_config,
)
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME,
    use_fast=True,
    trust_remote_code=True,
    model_max_length=512,
)
codeshell_lora_config = LoraConfig(
    # r=yaml_config["lora_config"]["r"],
    r=1,
    lora_alpha=yaml_config["lora_config"]["lora_alpha"],
    target_modules=["c_attn"],
    # lora_dropout=yaml_config["lora_config"]["lora_dropout"],
    lora_dropout=0.1,
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
)

Training output: [screenshot of the training log; the loss stays at 11.15625 at every step]

ruixie commented 7 months ago

Hi, fine-tuning in 4-bit quantization mode is not supported yet. Please fine-tune in bf16 instead.
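
For reference, here is a minimal sketch of what the suggested bf16 setup could look like. It reuses MODEL_NAME and the c_attn target module from the snippet above; the r=8 and lora_alpha=16 values are illustrative assumptions, not values from this thread. (As a side note, exp(11.15625) is roughly 70,000, on the order of the model's vocabulary size, which suggests the quantized model was predicting a near-uniform distribution and no gradients were flowing.)

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Load full-precision weights in bf16 instead of 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=8,                 # illustrative rank; r=1 gives the adapter almost no capacity
    lora_alpha=16,       # illustrative scaling factor
    target_modules=["c_attn"],
    lora_dropout=0.1,
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity-check that the LoRA params are trainable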