-
Dear Author,
Thank you for your inspiring work. I noticed a problem that might be worth a recheck or your clarification.
In your implementation, you used the prompt " Please answer this question…
-
In TV shows, viewers tend to pay more attention to faces than to the background.
If we calculate VMAF over a whole frame, the VMAF score may be raised by the background even though the faces were compressed terribl…
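
To make the concern above concrete, here is a toy sketch in Python. It is not how VMAF is actually computed: it assumes, purely for illustration, that a frame-level score behaves like an area-weighted average of per-region scores. The function name `area_weighted_score` and all numbers are hypothetical.

```python
def area_weighted_score(regions):
    """Pool per-region quality scores, weighting each by its share of the frame.

    `regions` is a list of (area_fraction, score) pairs whose fractions sum to 1.
    This is an illustrative assumption, not the real VMAF pooling.
    """
    total_area = sum(area for area, _ in regions)
    if abs(total_area - 1.0) > 1e-9:
        raise ValueError("area fractions must sum to 1")
    return sum(area * score for area, score in regions)


# Hypothetical frame: faces cover 10% of the area and are badly compressed,
# while the background covers 90% and is almost untouched.
frame = [(0.10, 20.0), (0.90, 95.0)]
overall = area_weighted_score(frame)
print(overall)
```

Under these assumed numbers the pooled score comes out at roughly 87.5, even though the face region alone scores 20, which is the masking effect described above.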
-
Hello, thank you for your outstanding work!
You used a `PMC-VQA-Subset` with 2469 VQA pairs. Could you please open-source the subset?
Thank you very much!
-
**Description:**
The latest builds of freerdp for F39 appear to have ffmpeg enabled. When RDP connections are attempted with gnome-connections, remmina or xfreerdp, the screen appears blank. The err…
-
```
The configure line used in current ubuntu and debian packages of ffmpeg-damnvid
makes the binary unredistributable:
./configure --enable-memalign-hack --enable-libxvid --enable-libx264
--enable…
```
-
Thanks for your great work! @FarinaMatteo
Could you provide the off-the-shelf pruned_weights and the fine-tuned weights for VQA?
-
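### Feature request
I would like to use the multimodal version for some video-related tasks. Is there any support for this?
### Motivation
Would it be feasible with GLM-4V-9B to extract a few frames from a video and then concatenate them with text? If so, what should I watch out for (maximum number of images, inference input format, etc.)?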
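
For illustration, the frame-extraction step mentioned above could be sketched in Python as uniform sampling of frame indices. This only picks which frames to take. It says nothing about how many images GLM-4V-9B accepts or what its input format is, and the helper name `sample_frame_indices` is hypothetical.

```python
def sample_frame_indices(num_frames, num_samples):
    """Return up to `num_samples` evenly spaced frame indices in [0, num_frames)."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    # Never ask for more frames than the video contains.
    num_samples = min(num_samples, num_frames)
    # Split the video into equal-length segments and take each segment's midpoint.
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


# Hypothetical video with 100 frames, from which 4 frames are kept.
indices = sample_frame_indices(num_frames=100, num_samples=4)
print(indices)  # [12, 37, 62, 87]
```

The selected frames would then have to be decoded by a video library and passed to the model together with the text prompt, in whatever form the model supports.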
### Your contribution
Thank you!
-
### Describe the issue
Issue: When I run inference on several images in a row, the result for the first image is good, but the results for the following images are poor. I use the Python script from https:/…
-
## ❓ Questions and Help
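Hello, I used bottom up attention (source: https://github.com/airsplay/py-bottom-up-attention; my understanding is that it is a Faster R-CNN pretrained on the VG dataset) to run over the entire coco2014 dataset and obtained the gt-box and the label for each box, with each box represented by a 2048-dimensional vector as its visual feature,…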
-
Hi, Thanks for your great work.
I am trying to reproduce your results and would like to ask for more details.
The released dataset has 3 stages, as does the description in Section 3.2 of the paper.
Howeve…