-
Hi, thanks for your contribution. I have one question about the training details.
In **Figure 2 (left)** of the paper, both the visual encoder and the text encoder are **frozen** during training, meanin…
yzrs updated
6 months ago
-
No matter how many times or how many different ways I try to install it, I cannot get it working with the server software.
-
Any tips on how to integrate it with the file previews in the program ranger? I like that your program renders better than the default img2text.
-
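QR codes have a simple structure and use only two colors, black and white, so they can be converted entirely to plain ASCII and scanned directly from the terminal.
Suggestion 1. img2txt.py
https://github.com/hit9/img2txt
This approach is pure Python and is the most convenient.
img2txt.py --ansi xxxx.png
Drawback: the code that limits the output size has a bug, so the resulting QR code is very large, but the quality is very good.
Suggestion 2. img2text, which depends on ca…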
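To illustrate the idea of rendering a two-color QR code as plain text, here is a minimal sketch. It assumes the QR code is already available as a matrix of modules (1 = dark, 0 = light); `matrix_to_ascii` and its parameters are hypothetical names, not part of img2txt.py. Whether the result scans also depends on the terminal's color scheme, so the dark and light strings may need to be swapped.

```python
# Hypothetical helper: render a black/white module matrix as terminal text.
# A truthy cell is a dark module. Each module is drawn two characters wide
# so the output looks roughly square in a typical terminal font.
def matrix_to_ascii(matrix, dark="##", light="  ", border=1):
    width = len(matrix[0]) + 2 * border
    blank_row = light * width
    # Light border rows above the code (a scanner usually needs a quiet zone).
    lines = [blank_row] * border
    for row in matrix:
        cells = "".join(dark if cell else light for cell in row)
        lines.append(light * border + cells + light * border)
    # Light border rows below the code.
    lines.extend([blank_row] * border)
    return "\n".join(lines)


if __name__ == "__main__":
    # A tiny 3x3 pattern standing in for a real QR module matrix.
    demo = [
        [1, 0, 1],
        [0, 1, 0],
        [1, 0, 1],
    ]
    print(matrix_to_ascii(demo))
```

Because the output size is one fixed-width cell per module, it is determined by the matrix alone rather than by the pixel size of a source image.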
-
#### Runtime Environment
- Operating system and version: Antergos Linux. (4.19.8-arch1-1)
- Terminal emulator and version: Termite/urxvt/st
- Python version: 3.7.1
- Ranger version/commit: range…
-
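### ⚠️ Search for similar issues
- [X] I have searched the issues and discussions and found no similar issue
### Summary
Adds support for the bot to read images. It works by using BLIP-2 (https://arxiv.org/pdf/2301.12597.pdf) to obtain an image caption, then using OCR to extract any text that may appear in the image, and concatenating the two before passing them to the bot.
Everything runs locally. It uses bl…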
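The concatenation step described in the summary could look roughly like the sketch below. It assumes the caption and the OCR lines have already been produced by the respective models; `build_image_prompt` and the prompt layout are hypothetical and are not taken from this pull request.

```python
def build_image_prompt(caption, ocr_lines, user_message):
    """Join an image caption and OCR text into one text prompt.

    The labels and layout here are illustrative assumptions only.
    """
    parts = [f"Image caption: {caption.strip()}"]
    # Drop empty OCR results so the prompt does not contain blank entries.
    text = [line.strip() for line in ocr_lines if line.strip()]
    if text:
        parts.append("Text found in image: " + " | ".join(text))
    parts.append(f"User: {user_message.strip()}")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_image_prompt("a street sign at night", ["STOP", ""], "What does it say?"))
```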
-
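Cross-modal models almost always focus on img2text or text2img performance, which reflects how strong the modality alignment is. But after cross-modal alignment pretraining, how does the single-modality retrieval ability compare with other feature extraction models pretrained on ImageNet, such as the ResNet family? I ran a simple experiment myself: I took the image tower of a cross-modal pretrained model such as ViT-B-16 as a feature extractor, built a small image vector retrieval database, and compared it with vgg16. The results were only about the same as vgg16...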
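For reference, a comparison like this usually comes down to ranking stored feature vectors by cosine similarity to a query vector. The sketch below assumes the features have already been extracted by whichever model is under test; `l2_normalize`, `top_k`, and the toy vectors are hypothetical and only show the retrieval step, not the experiment described above.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query, index, k=3):
    """Return the k (name, cosine similarity) pairs most similar to the query."""
    q = l2_normalize(query)
    scored = []
    for name, vec in index.items():
        v = l2_normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, v)), name))
    # Highest similarity first.
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


if __name__ == "__main__":
    # Toy 2-D "features"; real image features have hundreds of dimensions.
    index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(top_k([1.0, 0.1], index, k=2))
```

Running the same query set through two indexes, one per feature extractor, and comparing the ranked results is one simple way to put the two models side by side.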
-
I have two questions here: i) it looks like get_max_radius(bwimg,[(x1,y1),(x2,y2)]) is missing from your findArrows.py. ii) you are currently using ocr_model = ocr_predictor(det_arch='db_resnet50', re…
-
I have one 16 GB NVIDIA RTX A4000 GPU. When I run the command "torchrun --nproc-per-node 1 train.py --cfg-path train_configs/minigpt4_stage2_finetune.yaml", I get a "failed to create process" error. Can an…
-
HuggingGPT: [Paper](https://arxiv.org/abs/2303.17580), [Code](https://github.com/microsoft/JARVIS)
Estimated time: 09/03/2023
[Slides (Karan)]( https://docs.google.com/presentation/d/1H2KAepo72ZqA…