Closed: liangwq closed this issue 3 months ago
I think you have to use a benchmark to eval these two models.
This is because I used the same prompt to generate the results. The two models should remain consistent when following the same instructions, with the new model improving certain capabilities; otherwise the model lacks continuity.
I don't understand the consistency claim. Even in the paper, the same prompt doesn't produce the same result.
The images generated by this version seem not to adhere to instructions as well as the previous alpha version did, and the quality of the generated images also appears to be inferior to that of the previous version. Why is this the case? Wasn't this version of the model expanded in terms of parameters or trained with an extended dataset based on the previous version?
I think you should read the paper first. It differs in the embedding length, the captioner method, etc.
Yes, you can see the difference between the results shown in the paper and the actual test results. The results in the paper look acceptable, but how large is the gap under the same prompt in actual tests?
It would be better to use the same initial noise for the two models. Besides, the sigma model is finetuned from the alpha model on a changed dataset; that is likely the main reason the two models differ.
Will the DMD model be released today?
1024 ckpt today. DMD then.
thx.
Do you mean that if I keep the seed consistent, the similarity between the two results will be higher? And how should I keep the noise consistent? Currently the sigma model automatically chooses the image layout and size through text recognition, and that size cannot be controlled manually. What I mean is: how did you achieve consistent prompt results in your paper? Did you validate on a wider variety of prompt styles, or did you randomly select some prompts as golden samples for testing before release?
During our tests we used the diffusers version (coming soon), which may be slightly different from the inference code in this repo. But we chose the same noise as input here for testing: https://github.com/PixArt-alpha/PixArt-sigma/blob/592d4650ed5ad5f4efbc24376181cd519a9fa5b2/scripts/interface.py#L115
Besides, if you are familiar with diffusion model training, the results may simply differ as training progresses across epochs. Or do you know of any method that helps a model keep similar content but reach higher quality during training? I would like to give it a try.
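The same-initial-noise comparison mentioned above can be sketched as follows. This is a minimal illustration, not the repo's actual code: `make_initial_noise` and the latent shape are hypothetical names chosen for the example. In practice with diffusers you would typically pass a seeded `torch.Generator` (via the `generator` argument) or a precomputed `latents` tensor to both pipelines so they denoise from identical starting noise.

```python
import numpy as np

def make_initial_noise(seed, shape=(1, 4, 128, 128)):
    """Return deterministic Gaussian latents for a given seed.

    Using the same seed for both models guarantees bit-identical
    initial noise, so differences in the outputs come from the
    model weights rather than the starting latents.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal(size=shape).astype(np.float32)

# Same seed -> identical starting point for the alpha and sigma models.
noise_for_alpha = make_initial_noise(42)
noise_for_sigma = make_initial_noise(42)
assert np.array_equal(noise_for_alpha, noise_for_sigma)
```

With diffusers-style pipelines the equivalent is usually `pipe(prompt, generator=torch.Generator("cuda").manual_seed(42))` for each model, which removes the sampling noise as a source of divergence.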
Okay, thank you. Regarding maintaining consistency across updated versions of the model, I do have some thoughts. I've given it some consideration, but it's just an idea; even if it turns out not to be advisable, you might still want to see whether it's feasible:
Generally speaking, the reasons for inconsistency seem to be threefold:
1. Since we want training to remain stable across iterations, with later versions better than earlier ones, we need the conditional latent to stay as consistent as possible between versions, or at least consistent in a macro sense. If the goal is to optimize description detail and fine-grained alignment, we could change the wording and detail-description methods to teach the model to express details (for example, ensuring the text is compressed into as consistent a latent-space distribution as possible).
2. For text-image pairs whose overall result does not meet expectations, the new version of the model could correct them to the proper representation.
3. In other words, iteration of subsequent models should amount to SFT and RLHF alignment on the baseline model. If secondary pre-training is really necessary, it should focus only on learning from images that were poorly represented.
Would you please DM me in the Discord community so we can discuss this further? I'm curious about the idea.
Can you give me your Discord ID? How can I contact you?