Questions on training data

THUDM / Inf-DiT

Official implementation of Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer

Apache License 2.0


Questions on training data #8

Closed: adithyaiyer1999 closed this issue 1 month ago

adithyaiyer1999 commented 1 month ago

Hi!

Thanks for your great work. I have two questions about the dataset you trained on.

1. Will you be releasing the exact post-filtering data the final model was trained on?
2. Could you give me a rough estimate of how many images were in your final training dataset?

Thanks again! Adi

yzy-thu commented 1 month ago

The training dataset contains approximately 145M images with resolution higher than 1024. Of these, 140M come from laion-highresolution and can be accessed directly from Hugging Face. We have no plans to release the remaining part (roughly 5M images), but I believe it is not important for training.
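For readers assembling a similar high-resolution subset, here is a minimal sketch of a resolution filter. It assumes that "resolution higher than 1024" means the shorter image side exceeds 1024 pixels, and that each metadata record carries hypothetical `width` and `height` fields. Neither detail is confirmed in this thread, and this is not the authors' filtering pipeline.

```python
from typing import Iterable, Iterator, Mapping

# Assumed threshold: the thread says "resolution higher than 1024" without
# naming a dimension, so this sketch requires the SHORTER side to exceed it.
MIN_SIDE = 1024


def filter_high_resolution(
    records: Iterable[Mapping[str, int]], min_side: int = MIN_SIDE
) -> Iterator[Mapping[str, int]]:
    """Yield records whose shorter side is strictly greater than min_side.

    The "width"/"height" keys are hypothetical field names chosen for
    illustration, not taken from any specific dataset schema.
    """
    for rec in records:
        if min(rec["width"], rec["height"]) > min_side:
            yield rec


if __name__ == "__main__":
    sample = [
        {"width": 2048, "height": 1536},  # kept: shorter side 1536 > 1024
        {"width": 1024, "height": 2048},  # dropped: shorter side is exactly 1024
        {"width": 800, "height": 600},    # dropped: too small
    ]
    kept = list(filter_high_resolution(sample))
    print(len(kept))  # 1
```

If "higher than 1024" instead refers to the longer side or to total pixel count, only the comparison inside the loop needs to change.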

adithyaiyer-morphic commented 1 month ago

Thanks! Makes sense.