FluxML / FastAI.jl

Repository of best practices for deep learning in Julia, inspired by fastai
https://fluxml.ai/FastAI.jl
MIT License

Use JpegTurbo.jl to load .jpg images #216

Closed lorenzoh closed 2 years ago

lorenzoh commented 2 years ago

JpegTurbo.jl can speed up image loading quite a bit, and the previous ImageIO.jl versions had severe performance regressions for 1.7, which should be fixed if JpegTurbo.jl is used.

lorenzoh commented 2 years ago

@johnnychen94 Will JpegTurbo.jl be used automatically if people use up-to-date FileIO.jl versions, or do I need to change something explicitly? I would like users to avoid the performance regression that the old jpg loader showed on Julia 1.7.

johnnychen94 commented 2 years ago

TL;DR: add ImageIO to Project.toml; there is no need to write "using ImageIO" in src/FastAI.jl. https://github.com/FluxML/FastAI.jl/pull/219 is sufficient for this purpose.


Let me try to explain the FileIO strategy a bit. FileIO maintains a registry of IO backends. For instance, four backends support the JPEG format:

# Among the JPEG backends, JpegTurbo has the highest priority, QuartzImageIO is only available on macOS
add_format(
    format"JPEG",
    UInt8[0xff,0xd8,0xff],
    [".jpeg", ".jpg", ".JPG"],
    [idJpegTurbo],
    [idImageIO],
    [idQuartzImageIO, OSX],
    [idImageMagick]
) # 0xe1

Thus, when you call load("file.jpg"), FileIO first tries to load the JpegTurbo package (using JpegTurbo) and call its load function. If that fails, it tries the next backend in the list, and so on until one succeeds.
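
As a rough illustration of that fallback order, here is a toy model in plain Julia. It is not FileIO's actual implementation: the Backend struct, the JPEG_BACKENDS list and load_with_fallback are made-up names for this sketch, and the "backends" are stand-in closures rather than real packages.

```julia
# Toy model of a priority-ordered backend registry (NOT FileIO's real code).
# Every name below is hypothetical and exists only for this sketch.
struct Backend
    name::String
    load::Function   # path -> decoded data; throws if the backend is unusable
end

# Stand-ins for real backends, highest priority first.
# The first one simulates a package that is not installed.
const JPEG_BACKENDS = [
    Backend("JpegTurbo", path -> error("JpegTurbo is not installed")),
    Backend("ImageIO", path -> "decoded $path"),
    Backend("ImageMagick", path -> "decoded $path"),
]

# Try each backend in order and return the first successful result,
# together with the name of the backend that produced it.
function load_with_fallback(path, backends)
    errors = String[]
    for backend in backends
        try
            return backend.name, backend.load(path)
        catch err
            # Remember why this backend failed, then move on to the next one.
            push!(errors, "$(backend.name): $(sprint(showerror, err))")
        end
    end
    error("no backend could load $path:\n" * join(errors, "\n"))
end

name, data = load_with_fallback("file.jpg", JPEG_BACKENDS)
println("loaded by ", name)
```

In this sketch the simulated JpegTurbo backend fails, so the call falls through to the next entry and reports ImageIO as the backend that loaded the file.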

ImageIO contains no real code of its own; it borrows its functionality from JpegTurbo.jl and other backends. We (JuliaImages) aim to make ImageIO the default IO backend, and we want to give our users one simple answer to all questions of this kind:

Add ImageIO "somewhere" and use FileIO's load/save. By adding ImageIO to FastAI's Project.toml file, you ensure that every user of FastAI has ImageIO installed "somewhere".
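
For a standalone project, the same recipe might look like the sketch below. It assumes the current directory is the project you want to modify and that "file.jpg" is a placeholder for an existing image; Pkg.add needs network access the first time it runs.

```julia
using Pkg

# Assumption: the current directory holds the project to modify.
Pkg.activate(".")

# Records both packages under [deps] in Project.toml.
# ImageIO only needs to be installed; it is never loaded explicitly.
Pkg.add(["FileIO", "ImageIO"])

# User code only talks to FileIO, which picks a backend on its own.
using FileIO
img = load("file.jpg")   # placeholder path
```

Note that there is no "using ImageIO" line: having the package in the environment is enough for FileIO to find it.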

"the previous ImageIO.jl versions had severe performance regressions for 1.7"

This is not quite accurate. Before I wrote JpegTurbo.jl, ImageMagick was the only IO backend for the JPEG format, so if you saw load("file.jpg") become significantly slower, that was because ImageMagick had become slower. This is now fixed in Julia 1.8.

lorenzoh commented 2 years ago

Thanks for the fix and the explanation!