Other playable models-Text2Image

Wulx2050 commented 2 years ago

playable models

dalle-mini & craiyon https://github.com/borisdayma/dalle-mini
CogView2 https://github.com/THUDM/CogView2
To be added

No pretrained models

imagen https://github.com/lucidrains/imagen-pytorch
Wenxin ERNIE-ViLG https://wenxin.baidu.com/wenxin/modelbasedetail/ernie_vilg/
To be added

HighCWu commented 2 years ago

If we have enough time, we will try to migrate them. However, I hope Baidu will officially release an open-source text-to-image model on PaddlePaddle. I also know of a popular model trained by Tsinghua University, although it is also a PyTorch implementation. CogView2: https://github.com/THUDM/CogView2

Wulx2050 commented 2 years ago

I just looked it up: the text-to-image ability of Wenxin ERNIE-ViLG was evaluated on MS-COCO, a public open-domain dataset. The evaluation metric is FID (lower is better). In both the zero-shot and fine-tuned settings, Wenxin ERNIE-ViLG achieved the best results, far ahead of models such as DALL-E released by OpenAI. They provide an entry point for trying the ERNIE-ViLG API, so maybe you could contact the author team and ask them for the pretrained model?
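
For readers unfamiliar with FID, here is a minimal sketch of why a lower value means generated images are statistically closer to real ones. It is not the evaluation code used for ERNIE-ViLG. Real FID is computed on Inception-v3 features with full covariance matrices and a matrix square root; this sketch assumes diagonal covariances so that it runs with the standard library only, and the function name `diagonal_fid` is hypothetical.

```python
import math
from statistics import fmean, pvariance


def diagonal_fid(real_feats, gen_feats):
    """Frechet distance between two feature sets.

    Simplifying assumption: each feature dimension is treated as an
    independent Gaussian (diagonal covariance). Standard FID uses the
    full covariance matrices instead.
    """
    dims = len(real_feats[0])
    total = 0.0
    for d in range(dims):
        real_col = [row[d] for row in real_feats]
        gen_col = [row[d] for row in gen_feats]
        mu_r, mu_g = fmean(real_col), fmean(gen_col)
        var_r, var_g = pvariance(real_col), pvariance(gen_col)
        # Squared mean gap plus the variance mismatch term.
        total += (mu_r - mu_g) ** 2 + var_r + var_g - 2.0 * math.sqrt(var_r * var_g)
    return total


# Toy feature vectors standing in for image embeddings.
real = [[0.0, 1.0], [2.0, 3.0]]
shifted = [[1.0, 2.0], [3.0, 4.0]]

print(diagonal_fid(real, real))     # identical statistics give a distance of zero
print(diagonal_fid(real, shifted))  # a shifted mean increases the distance
```

The distance is zero only when the means and variances of the two feature sets match, which is why a lower score is read as a better result.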

Wenxin ERNIE-ViLG https://wenxin.baidu.com/wenxin/modelbasedetail/ernie_vilg/ paper: https://arxiv.org/pdf/2112.15283.pdf

Wulx2050 commented 2 years ago

Another project with code and models

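ERNIE-SAT. Category: Wenxin cross-modal large model. Applications: speech editing, speech generation, voice cloning, and speech-to-speech translation with voice cloning.

ERNIE-SAT is pretrained on Chinese and English datasets using joint speech-text training. This lets the model learn the alignment between speech and text, so the generated spectrograms are more accurate and the synthesized speech is of higher quality.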

https://wenxin.baidu.com/wenxin/modelbasedetail/ernie_sat/

AgentMaker / ru-dalle-paddle

Other playable models-Text2Image #1