My instruction data: 69k examples
Base model: llama-13b
Fine-tuning parameters:
```python
MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 64
MAX_STEPS = None
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 2048  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size  # 2000
TARGET_MODULES = [
    "q_proj",
    "v_proj",
    "k_proj",
    "o_proj",
    "down_proj",
    "gate_proj",
    "up_proj",
]
```
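For context, a minimal sketch of how these hyperparameters would typically be wired into a PEFT LoRA setup; the actual training script is not included in this issue, so the checkpoint path and the surrounding calls are assumptions:

```python
# Assumed wiring of the hyperparameters above into peft/transformers;
# the real training script is not shown in this issue.
from peft import LoraConfig, get_peft_model
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("path/to/llama-13b")  # hypothetical local checkpoint

lora_config = LoraConfig(
    r=LORA_R,                       # 8
    lora_alpha=LORA_ALPHA,          # 16
    lora_dropout=LORA_DROPOUT,      # 0.05
    target_modules=TARGET_MODULES,  # the attention + MLP projections listed above
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```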
Startup takes nearly 5 minutes, and during inference it always hangs at the final output; the final result only appears after a while.
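One note on the "hangs at the final output" symptom: a plain `model.generate()` call returns nothing until the entire sequence has been generated, so all of the text appears at once at the end. Below is a hedged sketch of token-by-token streaming with transformers' `TextIteratorStreamer`, assuming a `tokenizer`/`model` pair is already loaded (the issue does not show the inference code):

```python
# Sketch of streaming generation so tokens are printed as they are produced,
# assuming `tokenizer`, `model`, and `prompt` already exist; names are illustrative.
from threading import Thread
from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Run generate() in a background thread; the streamer yields text on this one.
thread = Thread(target=model.generate,
                kwargs=dict(**inputs, streamer=streamer, max_new_tokens=512))
thread.start()
for text in streamer:  # yields decoded chunks as generation proceeds
    print(text, end="", flush=True)
thread.join()
```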