Open loretoparisi opened 1 week ago
Because this model does not exist, we implemented it in text form, with 16 channels. In the future, we will support input with 32 channels.
The CogVideoX transformer in diffusers expects latents in the shape [B, F, C, H, W]. The latents parameter is already supported and I've tested it to work.
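For anyone hitting the same dimensionality error, here is a minimal sketch of the shape bookkeeping. It is not the pipeline's own code. It assumes the VAE encoder returns latents as [B, C, F, H, W], and that the VAE compresses time by 4 and space by 8 with 16 latent channels. The helper names `expected_latents_shape` and `bcfhw_to_bfchw` are hypothetical, and the default factors should be checked against the loaded model's config.

```python
# Sketch only: shape bookkeeping for the `latents` argument.
# Assumptions (verify against your model): 16 latent channels,
# temporal compression 4, spatial compression 8.

def expected_latents_shape(batch, num_frames, height, width,
                           latent_channels=16,
                           temporal_factor=4,
                           spatial_factor=8):
    """Return the [B, F, C, H, W] shape the transformer is described as expecting."""
    latent_frames = (num_frames - 1) // temporal_factor + 1
    return (batch,
            latent_frames,
            latent_channels,
            height // spatial_factor,
            width // spatial_factor)


def bcfhw_to_bfchw(shape):
    """Reorder a [B, C, F, H, W] shape tuple into [B, F, C, H, W].

    On a real torch tensor the equivalent reordering would be
    latents.permute(0, 2, 1, 3, 4).
    """
    b, c, f, h, w = shape
    return (b, f, c, h, w)


if __name__ == "__main__":
    # Hypothetical example: a 49-frame, 480x720 clip.
    target = expected_latents_shape(1, 49, 480, 720)
    encoder_output_shape = (1, 16, 13, 60, 90)  # assumed [B, C, F, H, W]
    print(target)
    print(bcfhw_to_bfchw(encoder_output_shape) == target)
```

If the reordered encoder output shape does not match the expected shape, the mismatch is most likely in the frame count or resolution passed to the encoder rather than in the pipeline itself.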
Feature request
I'm trying to add latents to the pipeline from encoded frames via the VAE encoder example, but I'm facing a dimensionality error.
Motivation
Add support for the latents parameter in the CogVideoXPipeline pipeline.
Your contribution
Tested VAE Image encoding/decoding https://github.com/THUDM/CogVideo/issues/249