chuanyangjin / fast-DiT

Fast Diffusion Models with Transformers

inefficient data loader #12

Open eminorhan opened 6 months ago

eminorhan commented 6 months ago

I just wanted to point out that the data loader in this implementation seems to be far less efficient than it could be. Right now, the code writes each encoded image into a separate .npy file, and during training it loads each file in a batch separately. That's a lot of inefficient file I/O. You could instead save all pre-extracted features in a single array/tensor and load that one file into RAM (or even into GPU RAM) once before training starts. All of ImageNet takes up only ~5 GB of memory if you store it in uint8 this way, e.g.: https://huggingface.co/datasets/cloneofsimo/imagenet.int8.
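For illustration, here is a minimal sketch of what this could look like (not the repo's actual code; the file paths, latent shape, and the affine quantization constants `SCALE`/`SHIFT` are assumptions):

```python
import glob
import numpy as np

# Pack the per-sample .npy feature files into one array that fits in RAM.
files = sorted(glob.glob("features/*.npy"))           # per-sample files from feature extraction
sample = np.load(files[0])                            # e.g. a (4, 32, 32) VAE latent
packed = np.empty((len(files), *sample.shape), dtype=np.uint8)

SCALE, SHIFT = 0.05, 128.0                            # hypothetical affine quantization to uint8
for i, f in enumerate(files):
    latent = np.load(f).astype(np.float32)
    packed[i] = np.clip(latent / SCALE + SHIFT, 0, 255).astype(np.uint8)

np.save("features_packed.npy", packed)                # one file instead of ~1.3M files
```

A dataset over the packed array then does no file I/O at all after the initial load:

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class InMemoryLatents(Dataset):
    """Reads the packed uint8 array once; __getitem__ is pure RAM access."""
    def __init__(self, path="features_packed.npy"):
        self.data = torch.from_numpy(np.load(path))   # single read, stays resident in RAM
    def __len__(self):
        return len(self.data)
    def __getitem__(self, idx):
        return self.data[idx]                         # dequantize later, on device
```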

wangyanhui666 commented 1 month ago

Have you tested whether using this kind of uint8 data leads to any performance degradation?

eminorhan commented 1 month ago

No, I haven't tried it myself yet, but given that the compression appears to be near-lossless, and given the qualitative reconstruction results, I wouldn't expect any noticeable performance degradation. Note that uint8 would only be used for storing the inputs; the rest of the model would still run in bf16/fp32, so it shouldn't cause any training stability issues.
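As a sketch, dequantization at batch time might look like this (hypothetical; `SCALE`/`SHIFT` must match whatever affine quantization was used when packing):

```python
import torch

SCALE, SHIFT = 0.05, 128.0   # must match the constants used to quantize the latents

def dequantize(batch_u8: torch.Tensor) -> torch.Tensor:
    """uint8 is only the storage format; the model still computes in bf16/fp32."""
    x = batch_u8.to("cuda", non_blocking=True)    # moves 1 byte/element over the bus, not 4
    return (x.to(torch.float32) - SHIFT) * SCALE  # back to float latents for the DiT
```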