Open nicolas-dufour opened 1 year ago
The latest commit on Hugging Face is for v3 (slightly different from the paper).
If you download through the Hugging Face API, you'll need to specify a revision. For example, to download the v1 stage A (VQGAN) checkpoint, use:
checkpoint_path=hf_hub_download(repo_id="dome272/wuerstchen", filename="vqgan_f4_v1_500k.pt", revision="c9a8af033966c756941168f2a537595a15e0c1a8")
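If you need several pinned files, the same call can be wrapped so the revision always travels with the filename. This is only a minimal sketch: `PinnedFile`, `STAGE_A_V1` and `download` are hypothetical names made up for illustration, and the only real API used is `hf_hub_download` from `huggingface_hub` with the repo, filename and commit hash from the line above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PinnedFile:
    """Hypothetical helper: a Hub file pinned to one exact commit."""
    repo_id: str
    filename: str
    revision: str  # commit hash, branch name, or tag


# Values copied from the comment above (v1 stage A / VQGAN checkpoint).
STAGE_A_V1 = PinnedFile(
    repo_id="dome272/wuerstchen",
    filename="vqgan_f4_v1_500k.pt",
    revision="c9a8af033966c756941168f2a537595a15e0c1a8",
)


def download(pinned: PinnedFile) -> str:
    """Download the pinned file and return its local cache path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(**asdict(pinned))


# Usage (needs network access and `pip install huggingface_hub`):
# checkpoint_path = download(STAGE_A_V1)
```

Pinning to the commit hash rather than a branch name means the download keeps returning the v1 file even after newer commits land on the repo.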
I believe the full v3 will be released soon. It should be the better choice, since it uses a different stage B conditioning that fixes the variable-resolution issue mentioned in the paper, so you might want to switch to the new version once it's released.
Oh thanks! Any info on what changed compared to the paper?
Based on what I heard in the EleutherAI diffusion reading group, the following changes have been made:
Stage A: VQGAN is still used. However, quantization is removed, making it a pseudo-VAE.
Stage B: Uses an LDM (U-Net) instead of Paella. Instead of cross-attention, concatenation is used to inject the conditioning.
Other changes: Training on different aspect ratios is used.
As a result, v3 doesn't have the issue with decoding at different resolutions (mentioned in Section 5, Discussion, of the paper).
However, I'd strongly recommend waiting for the release notes. My notes might be inaccurate or outdated. It's possible I misunderstood certain things or missed important changes.
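To make "concat instead of cross-attention" concrete, here is a rough, generic sketch of conditioning by channel concatenation. It is not taken from the Würstchen code: the function name, the tensor shapes and the nearest-neighbour resize are all assumptions chosen for illustration, and NumPy stands in for a real deep learning framework.

```python
import numpy as np


def concat_condition(latent: np.ndarray, cond: np.ndarray) -> np.ndarray:
    """Resize `cond` to the latent's spatial size and concatenate on channels.

    Both arrays are laid out as (batch, channels, height, width). The resize is
    a nearest-neighbour upsample by an integer factor (an illustrative choice).
    """
    b, _, h, w = latent.shape
    cb, _, ch, cw = cond.shape
    if cb != b:
        raise ValueError("batch sizes differ")
    if h % ch or w % cw:
        raise ValueError("latent size must be an integer multiple of cond size")
    # Repeat each conditioning pixel along height, then along width.
    cond_up = cond.repeat(h // ch, axis=2).repeat(w // cw, axis=3)
    # The denoiser would then see latent and conditioning as one input tensor.
    return np.concatenate([latent, cond_up], axis=1)


# Hypothetical shapes: a 4-channel latent and a 16-channel conditioning map.
latent = np.zeros((1, 4, 8, 12))
cond = np.ones((1, 16, 2, 3))
out = concat_condition(latent, cond)
print(out.shape)  # (1, 20, 8, 12)
```

Because the conditioning is resized to whatever spatial size the latent has, nothing in this scheme is tied to a fixed resolution, which fits the point above about decoding at different resolutions.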
Hi, the only checkpoint available on Hugging Face is stage C. Where can we find the other stages?
Thanks