AgentMaker / ru-dalle-paddle

Generate images from texts. In Russian. In PaddlePaddle
Apache License 2.0

Other playable models-Text2Image #1

Open Wulx2050 opened 2 years ago

Wulx2050 commented 2 years ago

playable models

  1. dalle-mini & craiyon https://github.com/borisdayma/dalle-mini

  2. CogView2 https://github.com/THUDM/CogView2

  3. To be added


No pretrained models

  1. imagen https://github.com/lucidrains/imagen-pytorch

  2. Wenxin ERNIE-ViLG https://wenxin.baidu.com/wenxin/modelbasedetail/ernie_vilg/

  3. To be added

HighCWu commented 2 years ago

If we have enough time, we will try to migrate them. However, I hope Baidu will officially release an open-source text-to-image model for PaddlePaddle. I also know of a popular model trained by Tsinghua University, although it is a PyTorch version as well. CogView2: https://github.com/THUDM/CogView2

Wulx2050 commented 2 years ago


I just looked into it: Wenxin ERNIE-ViLG's text-to-image generation was evaluated on MS-COCO, a public open-domain dataset. The evaluation metric is FID (lower is better). In both the zero-shot and fine-tuned settings, Wenxin ERNIE-ViLG achieved the best results, far ahead of models such as OpenAI's DALL-E. They provide an entry point for trying the ERNIE-ViLG API, so maybe you could contact the author team and ask them for the pretrained model?
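For readers unfamiliar with FID: it is the Fréchet distance between two Gaussians fitted to feature vectors of real and generated images (in practice, Inception-v3 pooling features). The snippet below is only a minimal sketch of that distance in NumPy. The function name `frechet_distance` and the random stand-in features are assumptions made for illustration; it is not the evaluation code used for ERNIE-ViLG or any other model mentioned here.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, feature_dim), feature_dim >= 2.
    Computes ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 * (C_a C_b)^(1/2)).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((C_a C_b)^(1/2)) equals the sum of square roots of the eigenvalues
    # of C_a C_b, which are real and non-negative for PSD covariances.
    # Small negative or imaginary parts are numerical noise and are discarded.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    sqrt_trace = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * sqrt_trace)


if __name__ == "__main__":
    # Stand-in features; a real FID run would use Inception-v3 activations.
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))
    fake = rng.normal(loc=0.5, size=(500, 8))
    print("distance to itself:", frechet_distance(real, real))
    print("distance to shifted set:", frechet_distance(real, fake))
```

A lower value means the two feature distributions are closer, which is why lower FID is read as better image quality.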

Wenxin ERNIE-ViLG: https://wenxin.baidu.com/wenxin/modelbasedetail/ernie_vilg/ Paper: https://arxiv.org/pdf/2112.15283.pdf

Wulx2050 commented 2 years ago

Another project with code and models

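ERNIE-SAT is pretrained on Chinese and English datasets using joint speech-text training. This lets the model learn the alignment between speech and text, so it generates more accurate spectrograms and higher-quality synthesized audio.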

https://wenxin.baidu.com/wenxin/modelbasedetail/ernie_sat/