-
To save GPU memory, I want to load the multilingual model in 4-bit mode; the code is as follows.
```python
import torch
from transformers import AutoTokenizer
from mplug_owl.modeling_mplug_owl impo…
```
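For context on what 4-bit loading buys you: libraries such as bitsandbytes (used by `load_in_4bit=True` in transformers) store each weight in 4 bits plus a shared scale, roughly an 8x reduction versus fp32. Below is a minimal, dependency-free sketch of simple absmax int4 quantization — an illustration of the idea only, not the NF4 scheme bitsandbytes actually uses:

```python
def quantize_absmax_int4(weights):
    """Quantize floats to signed 4-bit ints in [-7, 7] with one per-tensor
    scale, packing two 4-bit values into each byte."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    packed = bytearray()
    for i in range(0, len(q), 2):
        lo = q[i] & 0xF
        hi = (q[i + 1] & 0xF) if i + 1 < len(q) else 0
        packed.append(lo | (hi << 4))
    return bytes(packed), scale

def dequantize_absmax_int4(packed, scale, n):
    def sign4(v):  # reinterpret a 4-bit pattern as a signed value
        return v - 16 if v >= 8 else v
    out = []
    for b in packed:
        out.append(sign4(b & 0xF) * scale)
        out.append(sign4(b >> 4) * scale)
    return out[:n]

weights = [0.12, -0.5, 0.33, 0.01, -0.27, 0.44]
packed, scale = quantize_absmax_int4(weights)
restored = dequantize_absmax_int4(packed, scale, len(weights))
# six fp32 weights (24 bytes) now fit in 3 bytes, at the cost of rounding error
print(len(packed), [round(r, 3) for r in restored])
```

Real 4-bit loaders quantize per block (e.g. 64 weights per scale) rather than per tensor, which keeps the rounding error much smaller.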
-
### System Info
- CPU: x86_64
- CPU mem: 64GB
- GPU: V100 SXM2 16GB and Tesla T4 15GB (the issue occurs on both)
- Libraries:
  - TensorRT-LLM commit: https://github.com/NVIDIA/TensorRT-LLM/tree/3c46c2794e7f6df48…
-
### Model description
Align Before Fuse (ALBEF) is a vision-language (VL) model that showed competitive results in numerous VL tasks such as image-text retrieval, visual question answering, visual …
-
The runtime environment is an A100.
The installed PyTorch version is 1.12 and the CUDA version is 11.2.
The command being run is python demo/demo_lazy.py --config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_…
-
Hello, I am trying to find the training code, but it seems only inference code is provided.
Could you please point me to the training code?
-
Hi,
When you compute the FLOPs in Table 6 for baseline models such as ViLBERT, do you also include the FLOPs of the feature-extraction models?
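For context, the distinction matters because backbone feature extraction can cost as much as the transformer itself. A rough back-of-the-envelope sketch using the standard matmul count (2·m·k·n FLOPs); all shapes below are illustrative, not ViLBERT's actual configuration:

```python
def linear_flops(rows, in_dim, out_dim):
    # One matrix multiply: 2 * m * k * n multiply-accumulate FLOPs
    return 2 * rows * in_dim * out_dim

def conv2d_flops(out_h, out_w, in_ch, out_ch, k):
    # Each output element costs 2 * in_ch * k * k FLOPs
    return 2 * out_h * out_w * in_ch * out_ch * k * k

# Illustrative numbers only: a 768-dim feed-forward block over 100 tokens
tokens, d, d_ff = 100, 768, 3072
ffn = linear_flops(tokens, d, d_ff) + linear_flops(tokens, d_ff, d)

# versus a single 3x3 conv layer of a CNN backbone at 112x112, 64->64 channels
backbone_layer = conv2d_flops(112, 112, 64, 64, 3)

print(f"FFN block: {ffn / 1e9:.2f} GFLOPs")
print(f"One backbone conv layer: {backbone_layer / 1e9:.2f} GFLOPs")
```

Even this single hypothetical conv layer is on the same order as a transformer feed-forward block, so including or excluding the feature extractor can change reported totals substantially.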
-
Hi!
Let's bring the documentation to all the Spanish-speaking community 🌐
Who would want to translate? Please follow the 🤗 [TRANSLATING guide](https://github.com/huggingface/transformers/blob/m…
-
## Title: From Pixels to Prose: Advancing Multimodal Language Models for Remote Sensing
## Link: https://arxiv.org/abs/2411.05826
## Abstract:
Remote sensing has evolved from simple image acquisition into complex systems capable of integrating and processing visual and textual data. This review surveys the development of multimodal language models (MLLMs) in remote sensing…
-
I would like to request support for converting the BLIP-2 model to ONNX.
I have tried converting the model using the torch.onnx.export method, but there are issues because the input to the forward me…
-
### Model description
# Escaping the Big Data Paradigm with Compact Transformers
Abstract:
> With the rise of Transformers as the standard for language processing, and their advancements in …