PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model), and we hope the open-source community will contribute to it.
MIT License

Tile inflation code #331

Open Jaeger416 opened 1 month ago

Jaeger416 commented 1 month ago

Hi, it seems that you haven't released the code for initializing from SD's 2D VAE. Do you plan to release that part?

LinB203 commented 1 month ago

See here; the function has been re-implemented there.

Jaeger416 commented 1 month ago

Thx, but as mentioned in the report, CausalVAE should retain the original encoding ability and be able to reconstruct images or videos without training. However, the temporal upsample block leads to meaningless video output.
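
The claim that an inflated CausalVAE can reconstruct without training rests on each inflated layer reproducing its 2D counterpart frame by frame. Below is a minimal PyTorch sketch of one such scheme, assuming "tail" initialization (the 2D kernel is copied into the last temporal tap and earlier taps are zeroed) and causal padding by repeating the first frame. `inflate_conv2d_tail` and `causal_forward` are hypothetical helpers, not the repository's API, and the sketch assumes groups=1, no dilation, and integer padding.

```python
import torch
import torch.nn as nn


def inflate_conv2d_tail(conv2d: nn.Conv2d, time_kernel: int = 3) -> nn.Conv3d:
    """Hypothetical helper: inflate a Conv2d into a Conv3d with "tail" init.

    The 2D kernel is copied into the last temporal tap and every earlier tap is
    zeroed, so each output frame depends only on the matching input frame.
    Assumes groups=1, no dilation, and integer (not string) padding.
    """
    k_h, k_w = conv2d.kernel_size
    conv3d = nn.Conv3d(
        conv2d.in_channels,
        conv2d.out_channels,
        kernel_size=(time_kernel, k_h, k_w),
        stride=(1, *conv2d.stride),
        padding=(0, *conv2d.padding),  # time axis is padded causally in causal_forward
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        conv3d.weight.zero_()
        conv3d.weight[:, :, -1] = conv2d.weight  # last tap sees the current frame
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d


def causal_forward(conv3d: nn.Conv3d, x: torch.Tensor) -> torch.Tensor:
    """Hypothetical causal wrapper: pad only the past by repeating the first frame."""
    t_k = conv3d.kernel_size[0]
    past = x[:, :, :1].repeat(1, 1, t_k - 1, 1, 1)
    return conv3d(torch.cat([past, x], dim=2))


torch.manual_seed(0)
conv2d = nn.Conv2d(4, 8, kernel_size=3, padding=1)
conv3d = inflate_conv2d_tail(conv2d)
video = torch.randn(1, 4, 5, 16, 16)  # (batch, channels, frames, height, width)

with torch.no_grad():
    out3d = causal_forward(conv3d, video)
    # Reference: apply the original 2D conv to each frame independently.
    out2d = torch.stack([conv2d(video[:, :, t]) for t in range(video.shape[2])], dim=2)

print(out3d.shape)  # torch.Size([1, 8, 5, 16, 16])
print(torch.allclose(out3d, out2d, atol=1e-5))  # True
```

Because the earlier taps are zero, the choice of causal padding does not affect the result in this sketch. Temporal up/downsampling blocks have no 2D counterpart to copy, so under this kind of initialization they are one place where reconstruction can drift from the 2D VAE.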

LinB203 commented 1 month ago

You can try running inference; it is still capable of outputting video.

Jaeger416 commented 1 month ago

I tried, but the output is strange. For example, in the TemporalDownSampleRes2x block the output still changes even when the convolution layer is zero-initialized:

    return alpha * self.avg_pool(x) + (1 - alpha) * self.conv(x)
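
Zeroing the convolution does not make such a block a pass-through. Assuming the block computes alpha = sigmoid(mix_factor) (an assumption about the repository code; mix_factor = 2.0 is chosen here for illustration), a zero conv leaves alpha * avg_pool(x): the average of adjacent frame pairs scaled by about 0.88. The class below is a hypothetical stand-in, not the repository's TemporalDownSampleRes2x; its temporal kernel of 2 is chosen only so both branches yield the same number of frames.

```python
import torch
import torch.nn as nn


class TemporalDownsampleMix2x(nn.Module):
    """Hypothetical stand-in for a TemporalDownSampleRes2x-style block.

    Assumes alpha = sigmoid(mix_factor), a stride-2 temporal average pool, and
    a stride-2 temporal conv; check these against the actual repository code.
    """

    def __init__(self, channels: int, mix_factor: float = 2.0):
        super().__init__()
        self.avg_pool = nn.AvgPool3d(kernel_size=(2, 1, 1), stride=(2, 1, 1))
        self.conv = nn.Conv3d(channels, channels, kernel_size=(2, 3, 3),
                              stride=(2, 1, 1), padding=(0, 1, 1))
        self.mix_factor = nn.Parameter(torch.tensor([mix_factor]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.mix_factor)
        return alpha * self.avg_pool(x) + (1 - alpha) * self.conv(x)


torch.manual_seed(0)
block = TemporalDownsampleMix2x(channels=4)
with torch.no_grad():
    # Zero-initialize the conv branch, as described in the comment above.
    block.conv.weight.zero_()
    block.conv.bias.zero_()

x = torch.randn(1, 4, 8, 16, 16)  # (batch, channels, frames, height, width)
y = block(x)
pooled = block.avg_pool(x)
alpha = torch.sigmoid(block.mix_factor).item()

print(y.shape)  # torch.Size([1, 4, 4, 16, 16])
print(round(alpha, 4))  # 0.8808
print(torch.allclose(y, alpha * pooled))  # True: only the scaled pooled branch remains
print(torch.allclose(y, pooled))  # False: magnitudes are shrunk by alpha
```

Under these assumptions, the pooled branch passes through unscaled only if alpha is driven toward 1 (for example with a large mix_factor). Even then, averaging adjacent frames discards information that a decoder initialized from a 2D VAE has not learned to recover.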

LinB203 commented 1 month ago

I tried but the output is strange. For example the TemporalDownSampleRes2x block, the output would change even zero initializes the convlution layer return alpha * self.avg_pool(x) + (1 - alpha) * self.conv(x)

Could you post a video? I'd like to see how strange it is.