apapiu / transformer_latent_diffusion

Text to Image Latent Diffusion using a Transformer core
MIT License

Data source? #20

Open cloneofsimo opened 4 months ago

cloneofsimo commented 4 months ago

Hi there! I'm trying to make minRF, and there was a pointer here. I was wondering what dataset you used for this. Thanks!

apapiu commented 4 months ago

Hey! Copying my answer from a previous issue: "The data - this is a big one - the full GRIT data might contain a lot of low-quality images and/or prompts. Most of the data I used was either synthetic or filtered by CLIP aesthetic score. Try mj_latents.npy and mj_text_emb.npy from https://huggingface.co/apapiu/small_ldt/tree/main - this is higher-quality synthetic data, about 600k examples if I remember correctly." Alternatively, you can use the data processing code in this repo to download any dataset with image/caption pairs from Hugging Face.
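
For anyone wanting to try those files, here is a minimal sketch of how they could be pulled down and inspected. It assumes they are plain NumPy `.npy` arrays hosted in the `apapiu/small_ldt` repo as linked above; the comments about their contents reflect the description in this thread, not verified shapes.

```python
# Minimal sketch: download the precomputed latents / text embeddings and inspect them.
# Assumes the files are standard NumPy arrays; adjust as needed for your setup.
import numpy as np
from huggingface_hub import hf_hub_download

repo_id = "apapiu/small_ldt"

latents_path = hf_hub_download(repo_id=repo_id, filename="mj_latents.npy")
text_emb_path = hf_hub_download(repo_id=repo_id, filename="mj_text_emb.npy")

latents = np.load(latents_path)    # image latents (per the thread, synthetic MJ-style data)
text_emb = np.load(text_emb_path)  # corresponding text embeddings for the captions

print(latents.shape, text_emb.shape)
```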