OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

update finetune bug #224

Closed qyc-98 closed 1 month ago

joaomsimoes commented 1 month ago

I tried the latest commit and I still get the following error:

../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [4870,0,0], thread: [60,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [4870,0,0], thread: [61,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [4870,0,0], thread: [62,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [4870,0,0], thread: [63,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
Traceback (most recent call last):
  File "/workspace/finetune.py", line 333, in <module>
    train()
  File "/workspace/finetune.py", line 323, in train
    trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1912, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2248, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/workspace/trainer.py", line 206, in training_step
    loss = self.compute_loss(model, inputs)
  File "/workspace/trainer.py", line 30, in compute_loss
    outputs = self.model.base_model(data = inputs, use_cache=False)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py", line 179, in forward
    return self.model.forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/openbmb/MiniCPM-Llama3-V-2_5/21b10cdb728c15a5aa7c616732f049927aab1af3/modeling_minicpmv.py", line 169, in forward
    return self.llm(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1162, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 932, in forward
    cache_position = torch.arange(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

This was also mentioned in the closed issues https://github.com/OpenBMB/MiniCPM-V/issues/223 and https://github.com/OpenBMB/MiniCPM-V/issues/169.
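
Note on the trace above: the ScatterGatherKernel "index out of bounds" assert generally means some index tensor reaching an embedding/gather/scatter op is outside its valid range, and because CUDA reports the failure asynchronously, the Python frame shown (`torch.arange` in `modeling_llama.py`) is usually not the real culprit. A minimal, hypothetical debugging sketch (the names and the token-id check are illustrative assumptions, not part of the MiniCPM-V finetune code) that makes the failure synchronous and validates a batch before the forward pass:

```python
import os

# Assumption: setting this before the first CUDA call makes kernel errors
# synchronous, so the reported stack frame points at the real failing op.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch


def check_token_ids(batch: dict, vocab_size: int) -> None:
    """Hypothetical check: input_ids/labels must stay in [0, vocab_size)."""
    for key in ("input_ids", "labels"):
        if key not in batch:
            continue
        ids = batch[key]
        valid = ids[ids != -100]  # -100 is the usual ignore index for labels
        if valid.numel() and (valid.min() < 0 or valid.max() >= vocab_size):
            raise ValueError(
                f"{key} has ids outside [0, {vocab_size}): "
                f"min={int(valid.min())}, max={int(valid.max())}"
            )

# Illustrative usage inside a custom training_step/compute_loss:
# check_token_ids(inputs, model.config.vocab_size)
```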

todaydeath commented 1 month ago

I'm using the latest code and the problem is still there: ./aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [3991,0,0], thread: [88,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed
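
Seeing the same assert with the latest code suggests the out-of-range index may come from the data or tokenization rather than the training script itself. One hedged way to narrow it down (not a documented workaround from this repo) is to run the failing batch on CPU, where this class of error surfaces as a readable Python exception instead of a device-side assert:

```python
import torch

# Hypothetical illustration: an id outside the embedding table triggers the
# same kind of out-of-bounds failure, but on CPU PyTorch raises a clear error.
emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)
bad_ids = torch.tensor([[1, 2, 10]])  # 10 is out of range for a 10-entry table

try:
    emb(bad_ids)
except IndexError as err:
    print("CPU reports the bad index directly:", err)
```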