Closed. udrs closed this issue 5 months ago.
Hi, we will update the code to solve the problem. You can change the code yourself like this: https://github.com/qyc-98/MiniCPM-V/blob/main/finetune/finetune.py#L273
I ran into the same problem; it is still not resolved after applying the change above.
Thank you for your quick reply.
I modified the code following your guidance and ran into a new issue (we set "scale_resolution" to 256, and our image size is 256):
/home/ubuntu/miniconda3/envs/MiniCPM-V/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/home/ubuntu/miniconda3/envs/MiniCPM-V/lib/python3.10/site-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [556,0,0], thread: [64,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
[the same assertion repeats for threads [65,0,0] through [95,0,0] of block [556,0,0]]
Traceback (most recent call last):
  File "/home/ubuntu/MiniCPM-V/finetune/finetune.py", line 334, in <module>
    [...] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
0%| | 0/10000 [00:00<?, ?it/s]
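(For anyone debugging this stage: the device-side assert above means an index handed to a CUDA scatter/gather kernel falls outside the target dimension, and it surfaces far from the offending call because CUDA launches are asynchronous; running with CUDA_LAUNCH_BLOCKING=1 helps localize it. Below is a minimal sketch of a host-side bounds check one could temporarily place in front of the failing scatter. The names cur_vllm_emb and image_indices follow the traceback at the bottom of this issue, and the hinted cause in the error message is only a guess, not something confirmed in this thread.)

import torch

def check_scatter_indices(cur_vllm_emb: torch.Tensor, image_indices: torch.Tensor) -> None:
    """Fail early on the host if the scatter indices fall outside dim 0 of the target tensor."""
    # Same index construction as in the traceback further down this issue.
    index = image_indices.view(-1, 1).repeat(1, cur_vllm_emb.shape[-1])
    if index.numel() == 0:
        return
    lo, hi = int(index.min()), int(index.max())
    if lo < 0 or hi >= cur_vllm_emb.shape[0]:
        raise ValueError(
            f"scatter index range [{lo}, {hi}] is outside the target length "
            f"{cur_vllm_emb.shape[0]}; the image placeholder positions may have been "
            "truncated away (for example by a small model_max_length)"
        )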
Are there any problems in the JSON file below?
[
    {
        "id": "0",
        "image": "/home/ubuntu/MiniCPM-V/finetune/haha/xidian.jpg",
        "conversations": [
            {
                "role": "user",
                "content": "
Issue solved, thank you for your help.
We switched to an A100.
I'm using the latest code and the problem is still there:
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [3991,0,0], thread: [88,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
Hi @udrs, do you mean you fixed it by switching to an A100 GPU?
I'm using the latest code and the problem is still there: ../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [3991,0,0], thread: [88,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.

I got the same problem, even after updating to the new code.
I also have this issue with the latest code. Is there a way to solve it?
Same problem. Did you guys figure it out?
是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?
该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?
当前行为 | Current Behavior
Thank you for your great work.
My configuration is:

MODEL="openbmb/MiniCPM-Llama3-V-2_5" # or openbmb/MiniCPM-V-2
# ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
# See the section for finetuning in README for more information.
DATA="/home/ubuntu/MiniCPM-V/finetune/haha/xidian.json"
EVAL_DATA="/home/ubuntu/MiniCPM-V/finetune/haha/xidian.json"
LLM_TYPE="llama3" # if use openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"
torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --llm_type $LLM_TYPE \
    --data_path $DATA \
    --eval_data_path $EVAL_DATA \
    --remove_unused_columns false \
    --label_names "labels" \
    --prediction_loss_only false \
    --bf16 false \
    --bf16_full_eval false \
    --fp16 true \
    --fp16_full_eval true \
    --do_train \
    --do_eval \
    --tune_vision false \
    --tune_llm false \
    --use_lora true \
    --lora_target_modules "llm..*layers.\d+.self_attn.(q_proj|k_proj)" \
    --model_max_length 128 \
    --max_slice_nums 2 \
    --scale_resolution 128 \
    --max_steps 10000 \
    --eval_steps 1000 \
    --output_dir output/output_minicpmv2_lora \
    --logging_dir output/output_minicpmv2_lora \
    --logging_strategy "steps" \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "steps" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 10 \
    --learning_rate 1e-6 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --gradient_checkpointing true \
    --deepspeed ds_config_zero2.json \
    --report_to "tensorboard" # wandb
Below is the bug log:

(MiniCPM-V) ubuntu@10-60-22-207:~/MiniCPM-V/finetune$ vim finetune_lora.sh
(MiniCPM-V) ubuntu@10-60-22-207:~/MiniCPM-V/finetune$ bash finetune_lora.sh
[2024-06-06 03:47:52,725] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
[WARNING] using untested triton version (2.1.0), only 1.0.0 is known to be compatible
[2024-06-06 03:47:53,467] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-06-06 03:47:53,467] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
/home/ubuntu/miniconda3/envs/MiniCPM-V/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
  warnings.warn(
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 7/7 [00:03<00:00, 2.24it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Currently using LoRA for fine-tuning the MiniCPM-V model.
{'Total': 8564355312, 'Trainable': 116301824}
llm_type=llama3
Loading data...
max_steps is given, it will override any value given in num_train_epochs
Using /home/ubuntu/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py310_cu121/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_adam...
Time to load fused_adam op: 0.05588507652282715 seconds
0%| | 0/10000 [00:00<?, ?it/s]
/home/ubuntu/miniconda3/envs/MiniCPM-V/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/home/ubuntu/miniconda3/envs/MiniCPM-V/lib/python3.10/site-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
Traceback (most recent call last):
  File "/home/ubuntu/MiniCPM-V/finetune/finetune.py", line 333, in <module>
    train()
  File "/home/ubuntu/MiniCPM-V/finetune/finetune.py", line 323, in train
    trainer.train()
  File "/home/ubuntu/miniconda3/envs/MiniCPM-V/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/home/ubuntu/miniconda3/envs/MiniCPM-V/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/ubuntu/MiniCPM-V/finetune/trainer.py", line 203, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/ubuntu/MiniCPM-V/finetune/trainer.py", line 28, in compute_loss
    outputs = self.model.base_model(data = inputs, use_cache=False)
  File "/home/ubuntu/miniconda3/envs/MiniCPM-V/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/MiniCPM-V/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/MiniCPM-V/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 179, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/openbmb/MiniCPM-Llama3-V-2_5/b9f5fa87759ba195bb866de9ab50510a5fe91bad/modeling_minicpmv.py", line 164, in forward
    vllm_embedding, vision_hidden_states = self.get_vllm_embedding(data)
  File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/openbmb/MiniCPM-Llama3-V-2_5/b9f5fa87759ba195bb866de9ab50510a5fe91bad/modeling_minicpmv.py", line 156, in get_vllm_embedding
    cur_vllm_emb.scatter_(0, image_indices.view(-1, 1).repeat(1, cur_vllm_emb.shape[-1]),
RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.
0%| | 0/10000 [00:00<?, ?it/s]
[2024-06-06 03:48:09,728] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 3339) of binary: /home/ubuntu/miniconda3/envs/MiniCPM-V/bin/python
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/MiniCPM-V/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/miniconda3/envs/MiniCPM-V/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/MiniCPM-V/lib/python3.10/site-packages/torch/distributed/run.py", line 806, in main
    run(args)
  File "/home/ubuntu/miniconda3/envs/MiniCPM-V/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/home/ubuntu/miniconda3/envs/MiniCPM-V/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ubuntu/miniconda3/envs/MiniCPM-V/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
finetune.py FAILED
Failures:
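(Side note for readers who land on this RuntimeError: it is raised whenever an in-place operation, here scatter_, is applied to a tensor that is a view of a leaf parameter with requires_grad=True. The sketch below reproduces the failure in isolation and shows the out-of-place alternative; it is only an illustration of the mechanism under that assumption, not the actual patch linked in the maintainer's reply above, and all names are illustrative.)

import torch

# A leaf parameter and a view of it, loosely mirroring the embedding slice
# that the traceback points at.
emb_table = torch.nn.Parameter(torch.randn(10, 4))  # leaf tensor, requires_grad=True
cur_vllm_emb = emb_table[:6]                         # a view of that leaf
image_indices = torch.tensor([1, 2, 3])
vision_hidden = torch.randn(3, 4)

index = image_indices.view(-1, 1).repeat(1, cur_vllm_emb.shape[-1])

# In-place scatter on the view raises:
#   RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.
# cur_vllm_emb.scatter_(0, index, vision_hidden)

# The out-of-place variant builds a new tensor and keeps autograd happy.
patched = cur_vllm_emb.scatter(0, index, vision_hidden)
print(patched.shape)  # torch.Size([6, 4])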