genmoai / mochi

The best OSS video generation models
Apache License 2.0

About implicit prompt refiner or negative prompt in playground. #65

Open DZY-irene opened 2 weeks ago

DZY-irene commented 2 weeks ago

Thank you so much for your contributions to the Text2Video open-source community! I used the same short prompt with the Mochi model through both the CLI demo and the playground, but I noticed a slight difference in the video quality. Could you let me know if the playground includes an implicit prompt refiner or any negative prompts?

Here are my inputs and outputs:

prompt: a fantasy landscape

from cli:

https://github.com/user-attachments/assets/cc86e04a-42ff-48ab-bea6-8ee802cee6f9

from playground:

https://github.com/user-attachments/assets/52282e79-ed04-48cb-8b25-cd00069441ab

prompt: alley

from cli:

https://github.com/user-attachments/assets/9565bfe5-c891-439d-b291-2bd22617049d

from playground:

https://github.com/user-attachments/assets/24807861-8354-46c7-9804-8902891f1a52

I wonder about the methods you use to enhance video quality. Looking forward to your response!

DZY-irene commented 2 weeks ago

I noticed that the currently open-sourced model outputs video at a resolution of 480x848, while the videos generated in the playground have a resolution of 960x1696. Does this mean that the quality difference I've noticed is due to different models, or is it simply because I didn't use a longer prompt or a negative prompt?

jpgallegoar commented 2 weeks ago

They are definitely using a different prompt from what the user says (probably sending it to an LLM to enhance it) and then upscaling the generated video from 480p to whatever they give you. Which model are you using: fp8 or fp16?
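
For reference, the two resolutions quoted in this thread can be compared directly. The sketch below is only an arithmetic check on those numbers (the variable names are made up for illustration). It shows that the playground output is the same multiple of the open-source output along both dimensions, which is consistent with an upscaling step but does not confirm one.

```python
from fractions import Fraction

# Resolutions as quoted in the thread; the order of the two dimensions
# is taken as written and is not reinterpreted here.
cli_res = (480, 848)          # open-source model output
playground_res = (960, 1696)  # playground output

# Per-dimension scale factor between the two outputs.
scale_a = Fraction(playground_res[0], cli_res[0])
scale_b = Fraction(playground_res[1], cli_res[1])

# The aspect ratio is preserved only if both dimensions scale equally.
same_aspect = scale_a == scale_b

print(f"scale factors: {scale_a} and {scale_b}, same aspect ratio: {same_aspect}")
```

An equal integer factor on both dimensions is what a plain upscaler would produce, but a separately trained higher-resolution model could yield the same numbers, so this alone cannot settle which one the playground uses.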