ZCMax / LLaVA-3D

A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World

error running demo #11

Closed RohanChacko closed 6 days ago

RohanChacko commented 6 days ago

Hi, I am trying to run the demo on two RTX 3090 GPUs. I get the error below when running run_llava_3d.py:

Traceback (most recent call last):
  File "./LLaVA-3D/llava/eval/run_llava_3d.py", line 208, in <module>
    eval_model(args)
  File "./LLaVA-3D/llava/eval/run_llava_3d.py", line 163, in eval_model
    output_ids = model.generate(
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "./LLaVA-3D/llava/model/language_model/llava_llama.py", line 139, in generate
    self.prepare_inputs_labels_for_multimodal(
  File "./LLaVA-3D/llava/model/llava_arch.py", line 283, in prepare_inputs_labels_for_multimodal
    video_features_minibatch, batch_offset = self.encode_rgbd_videos(
  File "./LLaVA-3D/llava/model/llava_arch.py", line 217, in encode_rgbd_videos
    image_features = self.get_model().get_vision_tower()(images.flatten(0, 1))
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "./LLaVA-3D/llava/model/multimodal_encoder/clip_encoder.py", line 66, in forward
    image_forward_outs = self.vision_tower(
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 917, in forward
    return self.vision_model(
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 844, in forward
    encoder_outputs = self.encoder(
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 630, in forward
    layer_outputs = encoder_layer(
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 371, in forward
    hidden_states = self.layer_norm1(hidden_states)
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 196, in forward
    return F.layer_norm(
  File "./envs/llava-3d/lib/python3.10/site-packages/torch/nn/functional.py", line 2543, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

The error is raised by the call image_features = self.get_model().get_vision_tower()(images.flatten(0, 1)) in llava/model/llava_arch.py.
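For anyone debugging a similar device mismatch, one rough way to see how a model has been split across GPUs is to group its module-to-device mapping by device. The sketch below is only an illustration: `group_by_device` is a hypothetical helper, and `example_map` is made-up data in the shape of the `hf_device_map` attribute that transformers sets on models loaded with a `device_map`. Whether the LLaVA-3D model object exposes that attribute in this setup is an assumption.

```python
from collections import defaultdict


def group_by_device(device_map):
    """Group module names by the device they were assigned to.

    `device_map` is a plain dict of module name -> device (int or str),
    the same shape as a transformers `hf_device_map`.
    """
    groups = defaultdict(list)
    for module_name, device in device_map.items():
        groups[str(device)].append(module_name)
    return dict(groups)


# Hypothetical mapping, NOT taken from the real model in this issue.
example_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.31": 1,
    "model.vision_tower": 1,
}

groups = group_by_device(example_map)
# More than one key means the model is sharded across several devices.
spans_multiple_devices = len(groups) > 1

for device, names in groups.items():
    print(f"device {device}: {names}")
```

With a real model one would pass `model.hf_device_map` instead of `example_map`, if that attribute exists.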

ZCMax commented 6 days ago

Hello! You can specify which GPU inference.sh runs on, for example with CUDA_VISIBLE_DEVICES=0:

CUDA_VISIBLE_DEVICES=0 python llava/eval/run_llava_3d.py \
        --model-path ChaimZhu/LLaVA-3D-7B \
        --video-path playground/data/LLaVA-3D-Pretrain/scannet/scene0382_01 \
        --query "The related object is located at [-0.085,1.598,1.310,0.159,2.089,1.627,-2.097,0.026,0.1000]. What state is this object?"
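If it is more convenient to restrict the GPU from inside Python than on the command line, the same environment variable can be set in the process itself, as long as this happens before CUDA is initialised (in practice, before importing torch). This is a minimal sketch, not part of the repository; `pin_single_gpu` is a hypothetical helper name.

```python
import os


def pin_single_gpu(index: int = 0) -> str:
    """Expose only one GPU to this process.

    Must be called before torch (or anything else that initialises CUDA)
    is imported, otherwise the setting has no effect.
    """
    value = str(index)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


visible = pin_single_gpu(0)

# Import torch and the model code only after this point, so that CUDA
# sees the restricted device list.
```

The selected GPU then appears to the process as cuda:0, whatever its physical index.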