-
🎉 Fine-tuning (VQA/OCR/Grounding/Video) of the Qwen2-VL-Chat series models is now supported; please check the documentation below for details:
# English
https://github.com/modelscope/ms-swift/blob/m…
-
Is there a plan to incorporate image embeddings along with OCR and metadata-based retrieval? Utilizing the CLIP model from Candle to generate image embeddings could provide clearer context and improve…
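A minimal sketch of what embedding-plus-metadata retrieval could look like, assuming image embeddings have already been computed (e.g., by a CLIP model such as the one in Candle). The `retrieve` function, the document fields, and the embedding vectors are all hypothetical, shown only to illustrate combining a metadata filter with cosine-similarity ranking:

```python
import numpy as np

def retrieve(query_vec, docs, metadata_filter=None, top_k=3):
    """Rank documents by cosine similarity of their image embeddings,
    optionally restricted to documents matching a metadata predicate.
    (Hypothetical sketch: embeddings are assumed precomputed.)"""
    # AND a metadata predicate (e.g., from OCR text or file metadata)
    # with embedding similarity.
    candidates = [d for d in docs
                  if metadata_filter is None or metadata_filter(d["meta"])]
    q = query_vec / np.linalg.norm(query_vec)
    scored = []
    for d in candidates:
        v = d["embedding"] / np.linalg.norm(d["embedding"])
        scored.append((float(q @ v), d["id"]))
    scored.sort(reverse=True)  # highest cosine similarity first
    return [doc_id for _, doc_id in scored[:top_k]]
```

In practice the metadata predicate would run against an index rather than a Python list, but the combination — filter first, then rank by embedding similarity — stays the same.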
-
There are multiple mentions of a multi-modal sequence-parallel system for inference that can be seamlessly integrated with HF transformers. However, I am not able to follow this through the codebase …
-
On the advanced search page, we render facets for the user to refine their search. These facet fields, unlike those on basic search, allow multi-select. When facets are limited (which they …
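The usual semantics for multi-select facets can be sketched as follows: values chosen within one facet field are OR'ed together, while different facet fields are AND'ed. The `apply_facets` helper and the record shape are hypothetical, not part of the search system described above:

```python
def apply_facets(records, selections):
    """Filter records given facet selections.

    `selections` maps a facet field name to the set of values the user
    picked: values within one field are OR'ed (record matches any of
    them), and separate fields are AND'ed (record must match every field).
    (Hypothetical sketch of standard multi-select facet semantics.)"""
    def matches(rec):
        return all(rec.get(field) in values
                   for field, values in selections.items())
    return [r for r in records if matches(r)]
```

A real implementation would push this into the search engine's filter query rather than post-filter in application code, but the OR-within / AND-across rule is the part users notice.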
-
- Extension version: v0.99.2024101604 (pre-release)
- VSCode Version: 1.94.2
- OS: Windows 10
- Repository Clone Configuration (single repository/fork of an upstream repository): Multi-repo, multi…
-
GPT models are now multi-modal, so it would be nice if the CAD file had a spot for a camera that could be connected. The same goes for the microphone.
-
Providing an interface to query the raw audio samples is a very useful feature for multi-modal research.
The topic has been discussed before, with feature support added here: https://github.com/Fa…
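As a minimal sketch of what "querying the raw audio samples" can mean, the Python standard library alone can expose PCM samples from a WAV file. The `read_samples` function is illustrative and unrelated to the (elided) linked project; for multi-channel audio the returned list is still interleaved and would need further de-interleaving:

```python
import struct
import wave

def read_samples(path_or_file):
    """Return the raw PCM samples of a WAV file as a flat list of ints.
    Channels remain interleaved. (Illustrative stdlib-only sketch;
    8-bit WAV PCM is unsigned, wider widths are signed little-endian.)"""
    with wave.open(path_or_file, "rb") as wf:
        width = wf.getsampwidth()
        raw = wf.readframes(wf.getnframes())
    fmt = {1: "B", 2: "h", 4: "i"}[width]   # sample width -> struct code
    count = len(raw) // width
    return list(struct.unpack("<%d%s" % (count, fmt), raw))
```

For research use one would typically return a NumPy array (and resample/normalize), but the point is the same: the API hands back samples, not an opaque decoded stream.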
-
### Discussion
The Llava repository itself already provides excellent fine-tuning scripts.
The ms-swift multi-modal LLM fine-tuning framework integrates Llava inference and fine-tuning, and includes a best-practices guide: https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/llava%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.…
-
### What is the feature?
### Description
The current implementation of `BaseModel` in mmengine assumes a single `inputs` parameter of type `torch.Tensor` in the `forward` method:
```python
def…
```
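One way the requested generalization could look, sketched here without torch and outside the real mmengine API: let `forward` accept either a single input or a dict mapping modality names to inputs, and normalize to the dict form internally. The class, method names, and the `"img"` default modality are all hypothetical:

```python
from typing import Any, Dict, Union

class MultiModalModel:
    """Sketch of relaxing a single-tensor `inputs` contract so `forward`
    also accepts a dict of per-modality inputs. (Hypothetical; not the
    actual mmengine `BaseModel` interface.)"""

    def forward(self, inputs: Union[Any, Dict[str, Any]], mode: str = "tensor"):
        # Normalize: wrap a bare (single-modality) input as the default
        # "img" modality so downstream code handles one shape only.
        if not isinstance(inputs, dict):
            inputs = {"img": inputs}
        return {name: self.extract(name, x) for name, x in inputs.items()}

    def extract(self, modality: str, x: Any):
        # Placeholder for a per-modality feature extractor / backbone.
        return ("feat", modality, x)
```

Keeping the bare-input path backwards compatible means existing single-modality callers keep working while multi-modal callers pass `{"img": ..., "text": ...}`.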