-
Hello, I noticed that the input data used during training and in the [demo.py](https://github.com/TIGER-AI-Lab/VLM2Vec/blob/main/demo.py) consists of raw text and image tokens, without employing the c…
-
When building like this:
```
jetson-containers build llama-vision
```
```
-- L4T_VERSION=36.4.0
-- JETPACK_VERSION=6.1
-- CUDA_VERSION=12.6
-- PYTHON_VERSION=3.10
-- LSB_RELEASE=22.04 (ja…
```
-
Title: VLM endpoint to support chatting/analyzing images.
Papers
https://arxiv.org/pdf/2407.06581
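
A rough sketch of what such an endpoint could look like from the client side, assuming an OpenAI-compatible chat completions API; the base URL, model name, and image path below are placeholders, not values from the linked paper:

```python
# Hypothetical request against an OpenAI-compatible VLM endpoint.
# base_url, model name, and image path are illustrative placeholders.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="placeholder-vlm",  # whichever VLM the endpoint serves
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this image?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```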
-
- While the LaTeX env is not fully set up, we'll write our thoughts here for now
----
- `ViperGPT` is a framework that leverages pre-trained vision-language models (`GLIP` for image object ground…
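
A toy illustration of the ViperGPT idea, where an LLM writes a short Python program against a small visual API and that program is then executed on the image. The `ImagePatch` class here is a stand-in we invented for this sketch; in the real framework those calls are backed by models such as `GLIP` (grounding) and a VQA model.

```python
# Toy sketch: execute an LLM-generated program against a mocked visual API.
from dataclasses import dataclass, field

@dataclass
class ImagePatch:
    label: str = "full image"
    detections: dict = field(default_factory=dict)  # label -> list of patches

    def find(self, object_name: str) -> list["ImagePatch"]:
        # Real implementation: run an open-vocabulary detector (e.g. GLIP).
        return self.detections.get(object_name, [])

# Program text as it might be produced by the code-generating LLM.
generated_code = """
def execute_command(image):
    muffins = image.find("muffin")
    return len(muffins)
"""

# Execute the generated program on a mocked-up image.
namespace = {}
exec(generated_code, namespace)
image = ImagePatch(detections={"muffin": [ImagePatch("muffin"), ImagePatch("muffin")]})
print(namespace["execute_command"](image))  # -> 2
```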
-
Hello,
I am a PhD student interested in generating a large-scale human-object interaction dataset for training a VLM. We have read your paper and found it really helpful for our goal. However, the released…
-
Both are modern, performant models and would be very useful for internal use due to their licenses.
https://huggingface.co/tiiuae/falcon-11B-vlm
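
If support lands, loading it would presumably follow the usual transformers pattern for LLaVA-NeXT-style models. A minimal sketch is below; the exact processor/model classes and prompt format should be checked against the model card rather than taken from here:

```python
# Sketch of loading tiiuae/falcon-11B-vlm with Hugging Face transformers.
# Classes and prompt format are assumptions; the model card is authoritative.
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "tiiuae/falcon-11B-vlm"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")
prompt = "User: <image>\nDescribe this image.\nFalcon:"  # assumed prompt shape
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```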
-
From GMD manuscript:
> The existing VLM modules assume a constant-rate trend into the future. While perhaps the best assumption that can be undertaken at a global scale, more refined approaches mig…
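
As a worked toy example of what that constant-rate assumption amounts to: the projected VLM contribution at a site is just the estimated rate extrapolated linearly in time, with the rate uncertainty also growing linearly. The numbers below are illustrative, not taken from FACTS.

```python
# Constant-rate VLM extrapolation sketch; rates and sign convention illustrative.
import numpy as np

rate_mm_per_yr = -1.8          # estimated VLM rate at a tide gauge
rate_sigma = 0.4               # 1-sigma uncertainty on that rate
t0 = 2005
years = np.arange(2020, 2101, 10)

vlm_contrib_mm = rate_mm_per_yr * (years - t0)   # central projection
vlm_sigma_mm = rate_sigma * (years - t0)         # uncertainty grows linearly

for y, v, s in zip(years, vlm_contrib_mm, vlm_sigma_mm):
    print(f"{y}: {v:7.1f} ± {s:.1f} mm")
```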
-
I have observed an artificial correlation between the sterodynamic and vertical land motion components in the FACTS output, which is visible in scatter plots. See an example for the tide gauge locatio…
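
A minimal way to quantify what the scatter plots suggest is to correlate the Monte Carlo samples of the two components at one tide gauge and target year. The file paths, variable name, and dimension names below are assumptions about the FACTS output layout and would need to be adjusted to the actual files:

```python
# Sketch: sample correlation between sterodynamic and VLM components.
# Paths, variable and dimension names are assumed, not verified.
import numpy as np
import xarray as xr

stero = xr.open_dataset("sterodynamics_localsl.nc")["sea_level_change"]
vlm = xr.open_dataset("verticallandmotion_localsl.nc")["sea_level_change"]

# Pick one location and one target year, keeping the sample dimension.
s = stero.isel(locations=0).sel(years=2100).values
v = vlm.isel(locations=0).sel(years=2100).values

r = np.corrcoef(s, v)[0, 1]
print(f"sample correlation (expected ~0 for independent components): {r:.3f}")
```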
-
I am trying vlm_ptq by following the README in the vlm_ptq folder, and when I run the command `scripts/huggingface_example.sh --type llava --model llava-1.5-7b-hf --quant fp8 --tp 8`, the following error m…
-
Hi,
I am thinking about using this data for finetuning a VLM to perform autonomous driving. However, I don't think I would need any images other than the front camera, due to system requirement res…
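
For what it's worth, a rough sketch of how one might keep only the front-camera frames before fine-tuning; the annotation file name, record structure, and `"CAM_FRONT"` field value are placeholders for whatever schema the released data actually uses:

```python
# Placeholder sketch: filter a driving dataset down to front-camera frames.
import json

with open("annotations.json") as f:          # assumed annotation file
    records = json.load(f)

front_only = [r for r in records if r.get("camera") == "CAM_FRONT"]
print(f"kept {len(front_only)} / {len(records)} frames")

with open("annotations_front_only.json", "w") as f:
    json.dump(front_only, f)
```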