Open lorenzoh opened 2 years ago
I'm glad to know that you're interested in this package 😆 Curious to ask: won't the JPEG compression artifacts make training the network harder? I thought we'd need a lossless compression format, e.g., HDF/PNG/QOI, to build a more robust pipeline.
> for `CT`s that are based on `UInt8`s anyway (like `RGB{N0f8}`), maybe even a view will do.
I also feel this is doable; I left a TODO here for this option but didn't figure out how when I did the initial implementation. https://github.com/johnnychen94/JpegTurbo.jl/blob/33d53e772cb3c11c0f49082d6493b06bec6cbaea/src/decode.jl#L131-L135
> Am I missing something when it comes to the transposing?
I guess it's mainly because Julia uses column-major order and libjpeg-turbo uses row-major order. Thus when you preallocate `out`, the size actually has to be `(width, height)`. See also the permute part at the end:
https://github.com/johnnychen94/JpegTurbo.jl/blob/33d53e772cb3c11c0f49082d6493b06bec6cbaea/src/decode.jl#L100-L104
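To illustrate the layout issue, here is a minimal sketch in plain Julia (the decoder call itself is elided; the dimensions are made up for the example): because libjpeg-turbo fills scanlines contiguously, the preallocated buffer must be sized `(width, height)` from Julia's column-major point of view, and a permutation recovers the usual image layout.

```julia
# Sketch: preallocating output for a row-major decoder.
# libjpeg-turbo writes `height` scanlines of `width` pixels contiguously,
# which in column-major Julia means the buffer needs size (width, height).
height, width = 240, 320
out = Matrix{UInt8}(undef, width, height)   # note the swapped dims

# ... libjpeg-turbo would write its scanlines into `out` here ...

# A lazy permutation exposes the data in the conventional (height, width)
# layout without copying:
img = PermutedDimsArray(out, (2, 1))
@assert size(img) == (height, width)
```

`PermutedDimsArray` avoids the copy that an eager `permutedims` would make, at the cost of non-contiguous access along the permuted dimension.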
> Does this approach make sense?
Yes, the `JpegTurbo.Buffer` idea sounds good to me. But I won't be available to do it in the coming semester; it might be one or two months before I can handle this. If you want to put up a PR, I'd be very glad to review and merge.
> Curious to ask: won't the JPEG compression artifacts make training the network harder?
Many image datasets come in .jpg, and that's enough information, especially if they are stored at larger sizes, which can still be read quickly with JpegTurbo.jl using `preferred_size`. The most destructive thing is applying multiple resizes/affine transformations to the same image, since the image quality is reduced every time. So it can actually help not to have to presize the dataset.
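For example, a sketch of decoding at a reduced size via the `preferred_size` keyword mentioned above (the file path is a placeholder, and libjpeg-turbo only supports a fixed set of DCT scaling factors, so the result may not match the request exactly):

```julia
using JpegTurbo

# Decode a large stored image at (roughly) the size the pipeline needs,
# which is much cheaper than decoding at full size and resizing afterwards.
img = jpeg_decode("img.jpg"; preferred_size = (224, 224))
```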
> I also feel this is doable; I left a TODO here for this option but didn't figure out how when I did the initial implementation.
I'll see if I can figure this out, but if one is crazy about performance, I guess `JpegTurbo.Buffer` will be the way to go anyway.
> If you want to put up a PR I'd be very glad to review and merge.
That was my plan! Just wanted to check if there are any stumbling blocks I missed.
In my quest for ever-faster image data pipelines for training neural nets, I've been playing around with the source code to figure out how to reduce allocations when decoding images. I'm writing here to see if my assumptions about how one could go about this are correct and to clear up some questions.
It seems there are two allocations made:

- In `jpeg_decode`, a matrix `out` is created as the Julia representation of the image data, which is returned to the caller
- In `_jpeg_decode!`, a `UInt8` vector `buf` is created which JpegTurbo writes to

Copying some code from `jpeg_decode`, I've managed to make a method that takes in an `out` of the correct size and type and uses that instead of allocating it. There are some segfaults when the transposing is not handled correctly or the size and type of the buffer aren't correct, but I assume these can be fixed. In any case, removing this allocation cuts memory usage in half. I assume that something similar could be done for the `buf` allocation; or for `CT`s that are based on `UInt8`s anyway (like `RGB{N0f8}`), maybe even a view will do.

As for the API to use buffered data loading, I was thinking it may be safest to have a `Buffer` struct that holds an `out` and a `buf`. Since one often wants to reuse this buffer to load images of differing sizes, these buffers could be grown to the largest encountered image size, and images smaller than the current `out` buffer could be returned as views. This could be used something like:

Does this approach make sense? Is there a simpler way? Am I missing something when it comes to the transposing?
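The usage example elided above ("This could be used something like:") might have looked roughly like this. This is a hypothetical sketch of the proposed API: neither `Buffer` nor a `jpeg_decode!` method taking one exists in JpegTurbo.jl yet, and the file names are placeholders.

```julia
# Hypothetical sketch of the proposed buffered-decoding API.
using JpegTurbo

buffer = JpegTurbo.Buffer()   # would hold a reusable `out` and `buf`

for file in ("a.jpg", "b.jpg", "c.jpg")
    # Decodes into `buffer`, growing it whenever an image is larger than
    # any seen so far; smaller images would come back as views into `out`.
    img = jpeg_decode!(buffer, file)
    # ... feed `img` into the data pipeline ...
end
```

One caveat of the grow-and-view design: each returned `img` aliases the shared buffer, so it must be consumed (or copied) before the next iteration overwrites it.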