The bug has been fixed. Based on ML-Mamba-main/mlmamba/models/backbones/vision/dinosiglip_vit.py, I changed the line `return torch.cat([dino_patches[0], siglip_patches[0]], dim=2)` to `return torch.cat([dino_patches, siglip_patches], dim=2)`.
I apologize for the delay in responding. Your modification is indeed correct. I have also added comments in the code to reflect this change. The issue was caused by differences in library versions. Thank you for your understanding.
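To make the version dependence concrete, here is a minimal, hedged sketch (not the repository's exact code): some featurizer versions return a list or tuple of per-layer patch features, while others return a single `(batch, num_patches, dim)` tensor, so indexing with `[0]` is only valid in the former case.

```python
# Hedged sketch, not the actual dinosiglip_vit.py code: illustrates why the [0]
# indexing depends on the library version and how the feature concatenation works.
import torch

def fuse_patches(dino_patches, siglip_patches):
    # Some library versions return a list/tuple of per-layer features,
    # others return a single (batch, num_patches, dim) tensor.
    if isinstance(dino_patches, (list, tuple)):
        dino_patches = dino_patches[0]
    if isinstance(siglip_patches, (list, tuple)):
        siglip_patches = siglip_patches[0]
    # Concatenate along the feature dimension: (batch, num_patches, dino_dim + siglip_dim).
    return torch.cat([dino_patches, siglip_patches], dim=2)

# Dummy shapes for illustration only; the real patch counts and feature sizes
# depend on the DINOv2 and SigLIP backbones actually used.
dino = torch.randn(1, 256, 1024)
siglip = torch.randn(1, 256, 1152)
print(fuse_patches(dino, siglip).shape)  # torch.Size([1, 256, 2176])
```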
One more question: I saw that there are some additional datasets in the project, such as [LVIS-Instruct-4V] and [LRV-Instruct], but I do not find them mentioned in the paper. Did you use these datasets in the fine-tuning stage?
By the way, the code in scripts/pretrain.py:

dist.init_process_group(backend='nccl')

would be better changed to the following, so it does not fail when the process group has already been initialized:

if not dist.is_initialized():
    dist.init_process_group(backend='nccl')
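Here is a hedged sketch of that suggestion in context (not the repository's actual scripts/pretrain.py): `torch.distributed.init_process_group` raises an error if the default process group already exists, e.g. when a surrounding training framework has set it up, so the guard makes the setup idempotent.

```python
# Hedged sketch, not the project's actual pretrain.py. Assumes the script is
# launched with torchrun (or another launcher that sets RANK/WORLD_SIZE/MASTER_ADDR).
import torch.distributed as dist

def setup_distributed() -> tuple[int, int]:
    # Calling init_process_group twice raises a RuntimeError, so only initialize
    # when no process group exists yet (e.g. a framework may have created one).
    if not dist.is_initialized():
        dist.init_process_group(backend="nccl")
    return dist.get_rank(), dist.get_world_size()

if __name__ == "__main__":
    rank, world_size = setup_distributed()
    print(f"rank {rank} of {world_size} initialized")
```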
I only used LLaVA v1.5 for fine-tuning in my ML-Mamba project. Although the source code supports pre-training with LVIS-Instruct-4V and LRV-Instruct datasets, I did not utilize these datasets during the fine-tuning stage.
It seems that there is an issue with the visual module. I haven't made any changes to the code during this process. Could this bug be caused by the model update?