Open Bing-a-ling7 opened 1 week ago
Hi, thanks for the interest in our work. Can you provide the detailed setup, i.e., system and installed packages, to reproduce the problem?
Thank you. I fixed it. The problem was caused by the transformers version; I changed it to 4.44.2.
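(For anyone hitting the same error: assuming a pip-based environment, pinning the version should be enough.)
pip install transformers==4.44.2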
And there is another problem. When I run the command
MODEL=facebook/opt-1.3b TASK=RTE EPOCH=5 MODE=random_masking LR=1e-2 MASKING_PROB=0.9999 LOCAL_HOST=0 SEED=0 bash run.sh
I get the following log:
2024-11-18 17:38:06,929 - INFO - true masking prob: 0.9993464558919272
2024-11-18 17:38:15,456 - INFO - Train set 0 has 1000 training samples, 500 dev samples, and 277 eval samples
2024-11-18 17:38:15,457 - INFO - Tokenizing training samples...
2024-11-18 17:38:18,603 - INFO - Done with 3.15s
2024-11-18 17:38:19,102 - WARNING - Detected kernel version 4.19.24, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
2024-11-18 17:38:19,310 - INFO - There are 0 training samples and 277 validation samples
It seems like there is no training happening at all. Could you explain why?
And there is another issue: why does masking out 99.9999% of the weight matrix with your method result in memory usage almost the same as that of full fine-tuning (FFT)? I would appreciate your response!
Thank you!
Hi, thanks for the questions.
Regarding the "0 training samples" message: can you try a different dataset and see whether the message shows up again? You can also check the training log and the test accuracy to see whether training actually happens. A possible explanation is that this log line refers to the test data rather than the training data; I see a similar line when evaluating the model.
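If it helps, a quick way to rule out a data-loading problem is to print the raw split sizes. This is only a sanity check, and it assumes the Hugging Face datasets library with the GLUE RTE config (run.sh subsamples the splits, so the counts in your log can legitimately be smaller):

from datasets import load_dataset

# Print the raw GLUE RTE split sizes for comparison with the log above.
rte = load_dataset("glue", "rte")
for split_name, split in rte.items():
    print(split_name, len(split))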
Regarding the memory usage: in my experiments, the memory cost of Random Masking is far lower than that of full fine-tuning. Can you provide the detailed setup and a memory log so that I can reproduce it? Also, can you try different datasets and different models to see whether this happens for all setups? Thanks~
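For the memory log, a minimal sketch like the following (assuming a single-GPU PyTorch run; wrap it around one training epoch) records the peak allocated GPU memory:

import torch

torch.cuda.reset_peak_memory_stats()
# ... run one training epoch here ...
peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"peak allocated GPU memory: {peak_gib:.2f} GiB")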
Thank you for your response. I fixed it by changing my model to LLaMA and upgrading my transformers version to the latest release.
I have another question. In your work, you train the masked model and evaluate it immediately after training, but I want to save the model after each epoch. When I load from the checkpoint, it shows:
Some weights of the model checkpoint at /mnt/workspace/code/MyExp/output/finetune/best were not used when initializing LlamaForCausalLM: ['model.layers.10.mlp.down_proj.base_Linear.weight', 'model.layers.10.mlp.down_proj.col_indices', 'model.layers.10.mlp.down_proj.row_indices', 'model.layers.10.mlp.down_proj.row_offsets', 'model.layers.10.mlp.down_proj.tunable_weights', 'model.layers.10.mlp.gate_proj.base_Linear.weight', 'model.layers.10.mlp.gate_proj.col_indices', 'model.layers.10.mlp.gate_proj.row_indices', 'model.layers.10.mlp.gate_proj.row_offsets', 'model.layers.10.mlp.gate_proj.tunable_weights', 'model.layers.10.mlp.up_proj.base_Linear.weight', 'model.layers.10.mlp.up_proj.col_indices', 'model.layers.10.mlp.up_proj.row_indices', 'model.layers.10.mlp.up_proj.row_offsets', 'model.layers.10.mlp.up_proj.tunable_weights', 'model.layers.10.self_attn.k_proj.base_Linear.weight', 'model.layers.10.self_attn.k_proj.col_indices', 'model.layers.10.self_attn.k_proj.row_indices', 'model.layers.10.self_attn.k_proj.row_offsets', 'model.layers.10.self_attn.k_proj.tunable_weights', 'model.layers.10.self_attn.o_proj.base_Linear.weight', 'model.layers.10.self_attn.o_proj.col_indices', 'model.layers.10.self_attn.o_proj.row_indices', 'model.layers.10.self_attn.o_proj.row_offsets', 'model.layers.10.self_attn.o_proj.tunable_weights', 'model.layers.10.self_attn.q_proj.base_Linear.weight', 'model.layers.10.self_attn.q_proj.col_indices', 'model.layers.10.self_attn.q_proj.row_indices', 'model.layers.10.self_attn.q_proj.row_offsets', 'model.layers.10.self_attn.q_proj.tunable_weights', 'model.layers.10.self_attn.v_proj.base_Linear.weight', 'model.layers.10.self_attn.v_proj.col_indices', 'model.layers.10.self_attn.v_proj.row_indices', 'model.layers.10.self_attn.v_proj.row_offsets', 'model.layers.10.self_attn.v_proj.tunable_weights', 'model.layers.11.mlp.down_proj.base_Linear.weight', 'model.layers.11.mlp.down_proj.col_indices', 'model.layers.11.mlp.down_proj.row_indices', 'model.layers.11.mlp.down_proj.row_offsets', 'model.layers.11.mlp.down_proj.tunable_weights', 'model.layers.11.mlp.gate_proj.base_Linear.weight', 'model.layers.11.mlp.gate_proj.col_indices', 'model.layers.11.mlp.gate_proj.row_indices', 'model.layers.11.mlp.gate_proj.row_offsets', 'model.layers.11.mlp.gate_proj.tunable_weights', 'model.layers.11.mlp.up_proj.base_Linear.weight', 'model.layers.11.mlp.up_proj.col_indices', 'model.layers.11.mlp.up_proj.row_indices', 'model.layers.11.mlp.up_proj.row_offsets', 'model.layers.11.mlp.up_proj.tunable_weights', 'model.layers.11.self_attn.k_proj.base_Linear.weight', 'model.layers.11.self_attn.k_proj.col_indices', 'model.layers.11.self_attn.k_proj.row_indices', 'model.layers.11.self_attn.k_proj.row_offsets', 'model.layers.11.self_attn.k_proj.tunable_weights', 'model.layers.11.self_attn.o_proj.base_Linear.weight', 'model.layers.11.self_attn.o_proj.col_indices', 'model.layers.11.self_attn.o_proj.row_indices', 'model.layers.11.self_attn.o_proj.row_offsets', 'model.layers.11.self_attn.o_proj.tunable_weights',
...]
- This IS expected if you are initializing LlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
It seems like the architecture of the masked model is not being saved. My saving code is here:
# Unwrap the model from the accelerator and save the gathered state dict (and tokenizer) from the main process
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(save_path, is_main_process=accelerator.is_main_process, save_function=accelerator.save, state_dict=accelerator.get_state_dict(model))
tokenizer.save_pretrained(save_path)
Can you help me with this?
Hi, how do you load the saved model? Also, do you use a customized training pipeline?
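The warning suggests that LlamaForCausalLM.from_pretrained simply does not know about the extra masked-module parameters (base_Linear.weight, tunable_weights, row/col indices). One possible pattern is to rebuild the masked model first and only then load the saved state dict onto it. The sketch below is illustrative only: apply_random_masking stands in for whatever function your pipeline uses to wrap the linear layers, and the model id and weight file name are placeholders that depend on your setup.

import torch
from transformers import AutoModelForCausalLM

save_path = "/mnt/workspace/code/MyExp/output/finetune/best"

# Rebuild the base model and re-apply the masking wrapper so that the module
# names in the checkpoint (*.base_Linear.weight, *.tunable_weights, *.col_indices)
# exist on the model before loading.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = apply_random_masking(model, masking_prob=0.9999)  # hypothetical helper; use your own wrapper

# Load the saved weights directly onto the wrapped model instead of relying on
# from_pretrained (the file name depends on the transformers version:
# model.safetensors or pytorch_model.bin).
state_dict = torch.load(f"{save_path}/pytorch_model.bin", map_location="cpu")
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)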
Thanks for your excellent work! I encountered a bug when I ran
MODEL=facebook/opt-1.3b TASK=RTE EPOCH=5 MODE=random_masking LR=1e-2 MASKING_PROB=0.9999 LOCAL_HOST=0 SEED=0 bash run.sh
Could you help me with this problem?