eminorhan opened 6 months ago
Have you checked whether using this kind of uint8 data leads to performance degradation?
No, I haven't tried it myself yet, but given that the compression seems to be near-lossless and given the qualitative reconstruction results, I would not expect any noticeable performance degradation. Note that `uint8` would only be used for the input; the rest of the model would still use `bf16`/`fp32`, so it should not lead to any training stability issues.
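
For illustration, the cast at the input boundary might look something like this (a minimal PyTorch sketch; the tensor shape and scaling are made up):

```python
import torch

# Hypothetical: pre-extracted latents stored compactly as uint8.
latents_u8 = torch.randint(0, 256, (4, 32, 32, 32), dtype=torch.uint8)

# Cast to bf16 only at the model boundary; everything downstream stays bf16/fp32.
x = latents_u8.to(torch.bfloat16)
x = x / 127.5 - 1.0  # e.g., undo the uint8 quantization back to roughly [-1, 1]
```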
I just wanted to point out that the data loader in this implementation seems to be a lot less efficient than it could be. Right now, the code writes each encoded image into a separate `.npy` file and, during training, loads each file in a batch individually. That's a lot of inefficient file I/O. You could instead save all pre-extracted features in a single array/tensor and load that one file into RAM (or even into GPU RAM) once before starting training. The entire ImageNet takes up only about 5 GB of memory if you store it in `uint8` this way, e.g.: https://huggingface.co/datasets/cloneofsimo/imagenet.int8.
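
For concreteness, a minimal sketch of the single-file approach (PyTorch; the file name and shapes here are hypothetical):

```python
import numpy as np
import torch

# Hypothetical: one .npy file holding all pre-extracted uint8 features, shape (N, ...).
features = torch.from_numpy(np.load("features_uint8.npy"))  # ~5 GB, fits in RAM
# features = features.cuda()  # optionally keep the whole dataset in GPU memory

def get_batch(idx: torch.Tensor) -> torch.Tensor:
    # Pure tensor indexing; no per-sample file I/O during training.
    return features[idx].to(torch.bfloat16)

batch = get_batch(torch.randint(0, len(features), (256,)))
```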