chenhaoxing / DiffUTE

This repository is the code of our paper "DiffUTE: Universal Text Editing Diffusion Model" (NeurIPS'2023).
Apache License 2.0

Text region cannot be recovered and shows up as gray #6

Closed bjtuln closed 10 months ago

bjtuln commented 10 months ago

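I'm very interested in your work and am reproducing the code from the paper. I fine-tune with SD1.5 as the baseline, but the text region always comes out gray. Did you run into this kind of problem in your experiments? (image: https://github.com/chenhaoxing/DiffUTE/assets/6772059/3e4dd9c2-3d91-4028-bccc-8f72de1d257d) There are a few points I'm unsure about: (1) Besides the MSE loss, did you add any other loss functions? (2) What is the output dimension of the TrOCR encoder?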

chenhaoxing commented 10 months ago

Hi. That loss is used to train the VAE. When training the full model, only the original diffusion model loss is used. For TrOCR, see https://huggingface.co/microsoft/trocr-base-printed
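For reference, here is a minimal sketch of the noise-prediction MSE objective that Stable Diffusion style models are usually trained with, assuming this is what "the original diffusion model loss" refers to. NumPy arrays stand in for the latents, the function names and the `alpha_bar_t` value are illustrative, and the predictions are placeholders rather than UNet outputs.

```python
import numpy as np

def add_noise(latents, noise, alpha_bar_t):
    """Forward diffusion step: mix clean latents with Gaussian noise."""
    return np.sqrt(alpha_bar_t) * latents + np.sqrt(1.0 - alpha_bar_t) * noise

def diffusion_loss(predicted_noise, noise):
    """Mean squared error between the predicted and the true noise."""
    return float(np.mean((predicted_noise - noise) ** 2))

rng = np.random.default_rng(0)
latents = rng.standard_normal((1, 4, 64, 64))  # assumed latent shape
noise = rng.standard_normal(latents.shape)
noisy_latents = add_noise(latents, noise, alpha_bar_t=0.5)

# Placeholder predictions; a real run would call the UNet on noisy_latents.
perfect_loss = diffusion_loss(noise, noise)
zero_guess_loss = diffusion_loss(np.zeros_like(noise), noise)
print(perfect_loss, zero_guess_loss)
```

A prediction that matches the sampled noise exactly gives a loss of zero; any other prediction gives a positive value.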

bjtuln commented 10 months ago

Why does the masked text region always stay as gray noise?

chenhaoxing commented 10 months ago

I have not run into this situation. You could try training for longer and see.

chenhaoxing commented 10 months ago

Hello, I have just cleaned up and open-sourced the code for our V1 version. You can use it as a reference.

prefixRAINSTARsuffix commented 9 months ago

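@bjtuln Hello, could I ask you about some details of your reproduction? After concatenating [latent_model_input, mask, masked_image_latents], does the result need to go through a convolution layer to adjust the dimensions?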

bjtuln commented 8 months ago

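No dimension adjustment is needed. If the pretrained model you load is stable-diffusion-inpainting, it has 12 channels by default. If you load SD1.5, you can change it to 12 channels with:

    in_channels = 12
    unet.register_to_config(in_channels=in_channels)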
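A minimal sketch of the channel bookkeeping discussed here, with NumPy arrays standing in for the PyTorch tensors so that it runs without downloading a model. The shapes are assumptions (4-channel latents, a 1-channel mask, 64x64 latent resolution, 320 output channels for the first convolution), and `widen_conv_in` is a hypothetical helper. With these shapes the concatenation has 9 channels; the 12 quoted above would correspond to a different layout, so use whatever count your own concatenated tensor has. In diffusers, `register_to_config` only updates the stored configuration, so when starting from SD1.5 the first convolution (`unet.conv_in`) also has to accept the wider input; zero-initializing the new channels is one common way to do that.

```python
import numpy as np

def widen_conv_in(weight: np.ndarray, new_in_channels: int) -> np.ndarray:
    """Return a conv weight that accepts more input channels.

    weight has shape (out_channels, in_channels, kH, kW). The original
    input channels are copied and the added ones start at zero, so the
    widened layer initially ignores the extra inputs.
    """
    out_ch, in_ch, kh, kw = weight.shape
    if new_in_channels < in_ch:
        raise ValueError("new_in_channels must be >= the existing in_channels")
    widened = np.zeros((out_ch, new_in_channels, kh, kw), dtype=weight.dtype)
    widened[:, :in_ch] = weight
    return widened

rng = np.random.default_rng(0)

# Assumed shapes for a 512x512 image and an 8x-downsampling VAE.
latent_model_input = rng.standard_normal((1, 4, 64, 64))
mask = np.zeros((1, 1, 64, 64))
mask[:, :, 20:30, 10:50] = 1.0  # region to be edited
masked_image_latents = rng.standard_normal((1, 4, 64, 64))

# Concatenate along the channel axis, in the order given in the question.
unet_input = np.concatenate(
    [latent_model_input, mask, masked_image_latents], axis=1
)

# Stand-in for the pretrained first-layer weight of a 4-channel UNet.
old_weight = rng.standard_normal((320, 4, 3, 3))
new_weight = widen_conv_in(old_weight, unet_input.shape[1])

print(unet_input.shape, new_weight.shape)
```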

tydia commented 8 months ago

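@bjtuln Hello, did you manage to reproduce DiffUTE? I trained an SD2-based VAE and DiffUTE on 570,000 samples. For the VAE I trained one epoch per crop size, and text reconstruction improved noticeably. After training DiffUTE for 5 epochs, however, short printed text comes out fine, while longer sentences and text on complex backgrounds come out poorly and look like gibberish. Have you run into a similar problem?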

chenhaoxing commented 8 months ago

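Hello, we recently tried replacing the black part of the masked image with a resized glyph image, which significantly improves long-text editing. You could give it a try.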
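One possible reading of this suggestion, sketched with NumPy only: resize the glyph image to the bounding box of the mask and paste it there in place of the black fill. The thread does not confirm that this is how the resize is done, so the function names, the nearest-neighbour resize, the bounding-box alignment, and the 512x512 image size are all assumptions made for illustration.

```python
import numpy as np

def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) image."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols]

def fill_mask_with_glyph(image, mask, glyph):
    """Paste a resized glyph image into the bounding box of the mask.

    image: (H, W, 3) uint8 array.
    mask:  (H, W) bool array, True where text should be edited.
    glyph: (h, w, 3) uint8 rendering of the target text.
    For a rectangular mask the bounding box equals the mask itself.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return image.copy()  # nothing to edit
    top, bottom = ys.min(), ys.max() + 1
    left, right = xs.min(), xs.max() + 1
    out = image.copy()
    out[top:bottom, left:right] = resize_nearest(
        glyph, bottom - top, right - left
    )
    return out

# Toy data: a gray image, a rectangular text mask, and a white glyph
# image with a black bar standing in for rendered text.
image = np.full((512, 512, 3), 128, dtype=np.uint8)
mask = np.zeros((512, 512), dtype=bool)
mask[200:264, 100:400] = True
glyph = np.full((32, 150, 3), 255, dtype=np.uint8)
glyph[8:24, 10:140] = 0

masked_image = fill_mask_with_glyph(image, mask, glyph)
print(masked_image.shape)
```

Pixels outside the mask are left untouched, so the result can be used wherever the black-filled masked image was used before.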


tydia commented 8 months ago

Thank you!

eafn commented 8 months ago

Hi! Could you explain how this Resize is done? Is it resizing the glyph image to the same size as the mask region in the mask image, aligning the text position with the mask region, or is it not necessary to correspond, allowing the glyph to fill the entire 512*512 image?