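BriVL (Bridging Vision and Language Model) is the first general-purpose, large-scale pre-trained image-text multimodal model for Chinese. BriVL performs strongly on image-text retrieval, outperforming other common multimodal pre-trained models of the same period (for example UNITER and CLIP).
BriVL paper: WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training
Example use cases: image-to-text retrieval, text-to-image retrieval, image annotation, zero-shot image classification, and input features for other downstream multimodal tasks.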
Model | Language | Parameters | File |
---|---|---|---|
BriVL-1.0 | Chinese | 1 billion | BriVL-1.0-5500w.tar |
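
The retrieval use cases listed above all reduce to ranking items by the similarity of their embeddings. The following is a minimal sketch of that ranking step using cosine similarity in NumPy. It assumes image and text features have already been extracted into arrays; the function names `l2_normalize` and `retrieve_top_k` are hypothetical and not part of the BriVL code, and the random vectors are stand-ins for real features.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def retrieve_top_k(query_emb: np.ndarray, gallery_emb: np.ndarray, k: int = 3) -> np.ndarray:
    """Return, for each query row, the indices of the k most similar gallery rows."""
    sims = l2_normalize(query_emb) @ l2_normalize(gallery_emb).T
    # argsort is ascending, so negate to rank from most to least similar.
    return np.argsort(-sims, axis=1)[:, :k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in features (assumption): real ones would come from the model's encoders.
    image_emb = rng.normal(size=(5, 8))
    # Simulated caption features that lie close to their matching image.
    text_emb = image_emb + 0.01 * rng.normal(size=(5, 8))
    print(retrieve_top_k(text_emb, image_emb, k=1).ravel())
```

Text-to-image and image-to-text retrieval differ only in which set is passed as the query and which as the gallery. Zero-shot classification can be treated the same way, with the class-name text embeddings as the gallery.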
# Requirements
lmdb==0.99
timm==0.4.12
easydict==1.9
pandas==1.2.4
jsonlines==2.0.0
tqdm==4.60.0
torchvision==0.9.1
numpy==1.20.2
torch==1.8.1
transformers==4.5.1
msgpack_numpy==0.4.7.1
msgpack_python==0.5.6
Pillow==8.3.1
PyYAML==5.4.1
These dependencies are listed in requirements.txt and can be installed with the following command:
pip install -r requirements.txt
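
After installation, it can be useful to confirm that the environment matches the pinned versions. The sketch below is one way to do that with the standard library; it assumes Python 3.8 or later (for `importlib.metadata`) and a requirements.txt that contains only `name==version` lines, as above. The helper names `parse_pins` and `find_mismatches` are hypothetical and not part of the repository.

```python
from importlib import metadata


def parse_pins(lines):
    """Turn 'name==version' lines into a {name: version} dict, skipping blanks and comments."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (wanted, installed_or_None)} for packages that differ from the pin."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    try:
        with open("requirements.txt", encoding="utf-8") as fh:
            pins = parse_pins(fh)
    except FileNotFoundError:
        pins = {}
        print("requirements.txt not found in the current directory")
    for name, (wanted, installed) in find_mismatches(pins).items():
        print(f"{name}: pinned {wanted}, installed {installed or 'missing'}")
```

The script prints nothing when every pinned package is installed at the expected version.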
Get BBox-extractor, BriVL's bounding box extraction tool, from here.
cd evaluation/
bash test_xyb.sh
Three example image-text pairs are already included:
./data/imgs # images go here
./data/jsonls # image-text pair descriptions go here
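
Before adding your own data, it may help to inspect the bundled examples. The sketch below lists the files under `./data/imgs` and parses every file under `./data/jsonls` as JSON Lines (one JSON object per line). The record schema is not described in this section, so the code assumes nothing about field names and only reports what it finds; `read_jsonl` and `summarize` are hypothetical helpers, not part of the repository.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse one JSON value per non-empty line of the file."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize(data_root="./data"):
    """Return (sorted image file names, all records found under jsonls/)."""
    root = Path(data_root)
    img_dir, jsonl_dir = root / "imgs", root / "jsonls"
    images = sorted(p.name for p in img_dir.iterdir() if p.is_file()) if img_dir.is_dir() else []
    records = []
    if jsonl_dir.is_dir():
        # Assumption: every regular file in jsonls/ is a JSON Lines file.
        for path in sorted(p for p in jsonl_dir.iterdir() if p.is_file()):
            records.extend(read_jsonl(path))
    return images, records


if __name__ == "__main__":
    images, records = summarize()
    print(f"{len(images)} image file(s), {len(records)} record(s)")
    if records and isinstance(records[0], dict):
        # Show the field names of the first record without assuming a schema.
        print("fields in first record:", sorted(records[0]))
```

Running it from the directory that contains `data/` shows the field names used by the example descriptions, which new entries should presumably mirror.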
@article{DBLP:journals/corr/abs-2103-06561,
author = {Yuqi Huo and
Manli Zhang and
Guangzhen Liu and
Haoyu Lu and
Yizhao Gao and
Guoxing Yang and
Jingyuan Wen and
Heng Zhang and
Baogui Xu and
Weihao Zheng and
Zongzheng Xi and
Yueqian Yang and
Anwen Hu and
Jinming Zhao and
Ruichen Li and
Yida Zhao and
Liang Zhang and
Yuqing Song and
Xin Hong and
Wanqing Cui and
Dan Yang Hou and
Yingyan Li and
Junyi Li and
Peiyu Liu and
Zheng Gong and
Chuhao Jin and
Yuchong Sun and
Shizhe Chen and
Zhiwu Lu and
Zhicheng Dou and
Qin Jin and
Yanyan Lan and
Wayne Xin Zhao and
Ruihua Song and
Ji{-}Rong Wen},
title = {WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training},
journal = {CoRR},
volume = {abs/2103.06561},
year = {2021},
url = {https://arxiv.org/abs/2103.06561},
archivePrefix = {arXiv},
eprint = {2103.06561},
timestamp = {Tue, 03 Aug 2021 12:35:30 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2103-06561.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}