Closed chenruipu closed 3 weeks ago
I tried fine-tuning on another platform that has no CUDA or GPU. However, I got another error:
dnabert2-cpu/lib/python3.8/site-packages/torch/cuda/__init__.py:619: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
File "/data5/chenruipu/software/DNABERT_2-main/finetune/train.py", line 314, in <module>
train()
File "/data5/chenruipu/software/DNABERT_2-main/finetune/train.py", line 227, in train
model_args, data_args, training_args = parser.parse_args_into_dataclasses()
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/transformers/hf_argparser.py", line 346, in parse_args_into_dataclasses
obj = dtype(**inputs)
File "<string>", line 117, in __init__
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/transformers/training_args.py", line 1337, in __post_init__
raise ValueError(
ValueError: FP16 Mixed precision training with AMP or APEX (`--fp16`) and FP16 half precision evaluation (`--fp16_full_eval`) can only be used on CUDA devices.
If you want to fine-tune the model on a CPU, please remove the --fp16 flag, which only applies to GPUs. We have never tested model fine-tuning on CPUs, so please share more here if you run into other types of errors.
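A minimal sketch of the advice above: only pass --fp16 when a CUDA device is actually present. The helper names (`cuda_is_available`, `build_command`) are made up for illustration and are not part of train.py; the command is shortened to the flags relevant here.

```python
from typing import List


def cuda_is_available() -> bool:
    """Return True only if PyTorch is installed and sees a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def build_command(use_cuda: bool) -> List[str]:
    """Build a shortened train.py command; --fp16 is added only for CUDA runs.

    Hypothetical helper: the remaining train.py flags are omitted for brevity.
    """
    cmd = [
        "python", "train.py",
        "--model_name_or_path", "zhihan1996/DNABERT-2-117M",
    ]
    if use_cuda:
        # Mixed precision is rejected by TrainingArguments on non-CUDA devices.
        cmd.append("--fp16")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command(cuda_is_available())))
```

On a machine without a GPU, the printed command contains no --fp16, which avoids the ValueError shown in the traceback.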
warnings.warn("Can't initialize NVML")
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
WARNING:root:Perform single sequence classification...
<__main__.SupervisedDataset object at 0x7fc2e0271d00>
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
Some weights of the model checkpoint at /data5/chenruipu/data/wangchao/model/DNABERT-2-117M_model were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at /data5/chenruipu/data/wangchao/model/DNABERT-2-117M_model and are newly initialized: ['classifier.weight', 'classifier.bias', 'bert.pooler.dense.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
***** Running training *****
Num examples = 46,499
Num Epochs = 5
Instantaneous batch size per device = 4
Total train batch size (w. parallel, distributed & accumulation) = 4
Gradient Accumulation steps = 1
Total optimization steps = 58,125
Number of trainable parameters = 117,069,313
0%| | 0/58125 [00:00<?, ?it/s]/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/bert_layers.py:433: UserWarning: Increasing alibi size from 512 to 1501
warnings.warn(
Traceback (most recent call last):
File "/data5/chenruipu/software/DNABERT_2-main/finetune/train.py", line 314, in <module>
train()
File "/data5/chenruipu/software/DNABERT_2-main/finetune/train.py", line 296, in train
trainer.train()
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/transformers/trainer.py", line 2735, in training_step
loss = self.compute_loss(model, inputs)
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/transformers/trainer.py", line 2767, in compute_loss
outputs = model(**inputs)
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/bert_layers.py", line 859, in forward
outputs = self.bert(
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/bert_layers.py", line 609, in forward
encoder_outputs = self.encoder(
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/bert_layers.py", line 447, in forward
hidden_states = layer_module(hidden_states,
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/bert_layers.py", line 328, in forward
attention_output = self.attention(hidden_states, cu_seqlens, seqlen,
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/bert_layers.py", line 241, in forward
self_output = self.self(input_tensor, cu_seqlens, max_s, indices,
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/bert_layers.py", line 182, in forward
attention = flash_attn_qkvpacked_func(qkv, bias)
File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/flash_attn_triton.py", line 1021, in forward
o, lse, ctx.softmax_scale = _flash_attn_forward(
File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/flash_attn_triton.py", line 781, in _flash_attn_forward
assert q.is_cuda and k.is_cuda and v.is_cuda
AssertionError
0%|
Then I got more complex errors like this one.
Can you try pip uninstall triton?
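The AssertionError in the traceback is raised inside flash_attn_triton.py, which requires CUDA tensors. Below is a small sketch for checking whether triton is still importable in the environment before launching a CPU run. The assumption that the model code takes a different attention path once triton is absent is implied by the suggestion above; it is not confirmed here. `triton_is_installed` is a hypothetical helper name.

```python
import importlib.util


def triton_is_installed() -> bool:
    """Return True if the `triton` package can be found in this environment."""
    return importlib.util.find_spec("triton") is not None


if __name__ == "__main__":
    if triton_is_installed():
        print("triton is installed; the Triton flash-attention path may be used.")
    else:
        print("triton is not installed.")
```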
I also encountered a CUDA out of memory error while fine-tuning with 3xA100. I first tried fine-tuning on sample_data, which worked fine. Then I switched to my own data, which raised the CUDA out of memory error (I had also uninstalled triton).
Here is the code for fine-tuning:
cd finetune
export DATA_PATH=../data # e.g., ./sample_data
export MAX_LENGTH=128 # Please set the number as 0.25 * your sequence length.
# e.g., set it as 250 if your DNA sequences have 1000 nucleotide bases
# This is because the tokenizer will reduce the sequence length by about 5 times
export LR=3e-5
# Training use DataParallel
python train.py \
--model_name_or_path zhihan1996/DNABERT-2-117M \
--data_path ${DATA_PATH} \
--kmer -1 \
--run_name DNABERT2_${DATA_PATH} \
--model_max_length ${MAX_LENGTH} \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 16 \
--gradient_accumulation_steps 1 \
--learning_rate ${LR} \
--num_train_epochs 5 \
--fp16 \
--save_steps 200 \
--output_dir output/dnabert2 \
--evaluation_strategy steps \
--eval_steps 200 \
--warmup_steps 50 \
--logging_steps 100 \
--overwrite_output_dir True \
--log_level info \
--find_unused_parameters False
Here is the error I got:
File "train.py", line 303, in <module>
train()
File "train.py", line 285, in train
trainer.train()
File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 1664, in train
return inner_training_loop(
File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2019, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2300, in _maybe_log_save_evaluate
metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 3029, in evaluate
output = eval_loop(
File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 3235, in evaluation_loop
preds_host = logits if preds_host is None else nested_concat(preds_host, logits, padding_index=-100)
File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer_pt_utils.py", line 114, in nested_concat
return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer_pt_utils.py", line 114, in <genexpr>
return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer_pt_utils.py", line 116, in nested_concat
return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer_pt_utils.py", line 75, in torch_pad_and_concatenate
return torch.cat((tensor1, tensor2), dim=0)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.09 GiB. GPU
0%| | 200/758330 [06:32<413:24:47, 1.96s/it]
Is that because the dataset used for fine-tuning is too large (I have 3 million sequences for fine-tuning)?
The size of the dataset should not impact memory usage. Can you try launching the experiment with distributed data parallel? You can do this by replacing python with torchrun --nproc_per_node=3 in your script.
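As a sketch of that substitution: torchrun's option for the number of worker processes per node is spelled --nproc_per_node, and 3 matches the three A100s mentioned earlier. `to_torchrun` is a hypothetical helper, and the command is shortened to a few of the flags from the script above.

```python
import shlex
from typing import List


def to_torchrun(python_cmd: List[str], gpus_per_node: int) -> List[str]:
    """Swap the leading `python` for `torchrun --nproc_per_node=N`."""
    if not python_cmd or python_cmd[0] != "python":
        raise ValueError("expected a command that starts with 'python'")
    if gpus_per_node < 1:
        raise ValueError("gpus_per_node must be at least 1")
    return ["torchrun", f"--nproc_per_node={gpus_per_node}"] + python_cmd[1:]


# Shortened version of the fine-tuning command; the other flags stay unchanged.
original = [
    "python", "train.py",
    "--model_name_or_path", "zhihan1996/DNABERT-2-117M",
    "--fp16",
]
launch = to_torchrun(original, gpus_per_node=3)
print(" ".join(shlex.quote(part) for part in launch))
```

Everything after train.py is passed through untouched, so the rest of the script does not need to change.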
Thanks for your response; my fine-tuning task now runs correctly. However, it seems to use only one CPU core, so it would take far too long (about 250 hours) to finish. Is it possible to fine-tune with multiple CPUs?
Sorry, I have no experience with multi-CPU training. You may need to investigate this yourself. Good luck!
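One possible starting point for that investigation, offered as an untested sketch: compare the number of logical cores with the number of intra-op threads PyTorch reports. Whether raising the thread count (for example with torch.set_num_threads) actually speeds up this fine-tuning run has not been verified in this thread. `cpu_thread_report` is a hypothetical helper name.

```python
import os
from typing import Optional, Tuple


def cpu_thread_report() -> Tuple[int, Optional[int]]:
    """Return (logical core count, PyTorch intra-op thread count or None)."""
    logical_cores = os.cpu_count() or 1
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the core count can be reported.
        return logical_cores, None
    return logical_cores, torch.get_num_threads()


cores, torch_threads = cpu_thread_report()
print(f"logical cores: {cores}, torch intra-op threads: {torch_threads}")
```

If the second number is much lower than the first, calling torch.set_num_threads with a larger value before training starts may be worth trying.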
I wrote a shell script like this to fine-tune on a CPU, but I got an error like this:
I don't think I ever tried to use a GPU or CUDA, so how can I solve this problem?