IanButterworth opened 3 years ago
That's awesome.
Just this morning I discovered via profiling that the biggest contributor to ImageMagick's slowness for small files is extracting the pixel depth, which is a single ccall. Crazy. There might of course be a way around that, but I'm not in a rush; having an implementation that works and that we can change is so important, and the benchmarks above are really exciting!
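For anyone who wants to reproduce that kind of finding, a minimal profiling sketch might look like the following (assuming ImageMagick.jl, FileIO.jl, ImageCore.jl, BenchmarkTools.jl, and Profile; the tiny test file and its path are illustrative, not from this thread):

```julia
using FileIO, ImageCore, BenchmarkTools, Profile

# Write a tiny grayscale PNG to a temporary path (illustrative only).
path = joinpath(mktempdir(), "tiny.png")
save(path, rand(Gray{N0f8}, 10, 10))

# End-to-end load time for the small file.
@btime load($path)

# Profile repeated loads to find the dominant frame
# (e.g. a single ccall extracting the pixel depth).
Profile.clear()
@profile for _ in 1:1000
    load(path)
end
Profile.print(mincount = 50)
```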
For the smallest images, https://github.com/JuliaIO/FileIO.jl/pull/295 will improve matters even further.
It's noteworthy that for TIFF, 10x10 and 100x100 are almost the same speed. That might merit some investigation, eventually.
With my recent JpegTurbo.jl development, I noticed that benchmarking only with randomly generated images can be quite misleading; many image compression tricks work only when neighboring blocks and patches share structure. Thus I would suggest adding more test images of the same size and plotting the median result over those samples when we regenerate the graphs.
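To make the point concrete, here is a small sketch (assuming PNGFiles.jl, TestImages.jl, and ImageCore.jl; the file names are illustrative) comparing how well a random image and a natural image compress losslessly:

```julia
using PNGFiles, TestImages, ImageCore

# Random pixels carry no spatial redundancy, so lossless PNG
# compression barely shrinks them; a natural image fares much better.
random_img  = rand(RGB{N0f8}, 512, 512)
natural_img = testimage("mandrill")   # a 512×512 natural test image

dir = mktempdir()
PNGFiles.save(joinpath(dir, "random.png"), random_img)
PNGFiles.save(joinpath(dir, "natural.png"), natural_img)

raw_bytes = 512 * 512 * 3             # uncompressed RGB{N0f8} size
@show filesize(joinpath(dir, "random.png")) / raw_bytes   # close to 1.0
@show filesize(joinpath(dir, "natural.png")) / raw_bytes  # noticeably smaller
```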
Absolutely. Note that PNGFiles.jl now has automated CI benchmarking set up https://github.com/JuliaIO/PNGFiles.jl/pull/52
e.g., see the report there asserting that there was no performance change in this PR: https://github.com/JuliaIO/PNGFiles.jl/pull/51#issuecomment-1027653354
But that is currently using random images, and @drvi already suggested they should be replaced with test images.
Perhaps we should set the same thing up for ImageIO, with TestImages vs. each backend.
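One possible shape for such a setup (a sketch only; assumes FileIO.jl, BenchmarkTools.jl, TestImages.jl, and two illustrative formats) would be:

```julia
using FileIO, BenchmarkTools, TestImages, Statistics

# Save one real test image in each format, then time `load`
# through whichever backend FileIO dispatches to.
img = testimage("mandrill")
dir = mktempdir()
suite = BenchmarkGroup()
for ext in ("png", "tif")
    path = joinpath(dir, "mandrill.$ext")
    save(path, img)
    suite[ext] = @benchmarkable load($path)
end

results = run(suite; verbose = true)
for (ext, trial) in results
    println(ext, ": median ", median(trial.times) / 1e6, " ms")
end
```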
We just need to add a few high-resolution test images to TestImages.jl... Among the widely used test image datasets, I know there's DIV2K, but it's licensed for academic purposes only. Do you have any suggestions on where we can find such test images?
NASA?
It's been a goal of mine for a while to add automated CI benchmarking to TiffImages (ref https://github.com/tlnagy/TiffImages.jl/issues/53), but I'm super busy for the foreseeable future. Does it make more sense for the benchmarks to live at the individual package level or here in ImageIO? Maybe both?
But I agree with @johnnychen94 that it makes sense to use real images in addition to randomly-generated ones.
Performance benchmarks are used for two purposes: 1) comparing against similar packages, which are usually written in other languages, and 2) regression testing.
For benchmark CI such as https://github.com/JuliaIO/PNGFiles.jl/pull/52, it is used to track if PRs/releases are slowing things down.
For benchmark scripts like this issue, https://github.com/johnnychen94/JpegTurbo.jl/issues/15, and the one @timholy created in https://github.com/JuliaImages/image_benchmarks, they're used for advertising purposes: to convince people that we're doing great stuff. Also, to prepare for JuliaImages 1.0, we definitely need such benchmarks.
Does it make more sense for the benchmarks to live at the individual package level or here in ImageIO? Maybe both?
Unless we move all packages into one gigantic monorepo, benchmark CIs for regression testing should still live alongside the source code.
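Concretely, the convention that PkgBenchmark.jl/BenchmarkCI.jl expect is a benchmark/benchmarks.jl file in the package repository defining a top-level SUITE. A minimal sketch for an IO package (contents hypothetical, sizes illustrative):

```julia
# benchmark/benchmarks.jl
using BenchmarkTools, FileIO, ImageCore

const SUITE = BenchmarkGroup()

# Illustrative cases only: time `load` on a couple of image sizes.
for n in (10, 100)
    img = rand(RGB{N0f8}, n, n)
    path = joinpath(mktempdir(), "img_$(n).png")
    save(path, img)
    SUITE["load"]["$(n)x$(n)"] = @benchmarkable load($path)
end
```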
On the other hand, I prefer to have the "benchmark against other frameworks" code stay in one repo, as @timholy already did. I haven't yet committed to https://github.com/JuliaImages/image_benchmarks because the code there is not very extensible/flexible, in the sense that it's not always easy to switch certain cases on/off. Thus, if we keep adding more benchmark cases there, we'll soon reach a point where it takes too long to get the results of interest. This is quite similar to the DemoCards I made for https://juliaimages.org/stable/examples/: we can easily create an ad-hoc version of benchmark/demo scripts that works at first, but it's always a pain to convince/guide others to contribute benchmark/demo cases using an ad-hoc, undocumented framework.
Some discussion on this can be found in https://github.com/JuliaImages/Images.jl/discussions/947, and I also have a very early draft experiment in https://github.com/johnnychen94/Workflows.jl/pull/1, but I certainly don't have enough time to finish it... Maybe we can propose this as this year's GSoC project by updating https://julialang.org/jsoc/gsoc/images/?
I'm supportive of changes to the architecture of image_benchmarks. That said, in the long run I expect image_benchmarks will share the fate of Julia's own "microbenchmarks" (repo: https://github.com/JuliaLang/Microbenchmarks): people want them, lots of folks with different favorite image-processing suites will request that we compare their favorite framework, but nobody wants to maintain them. Building many different languages' suites on a single machine is a major pain in the neck, and I have delayed doing this precisely because it's no fun. But it's important for long-term growth in our current phase. (I don't really expect to keep them going for 10 years, though; realistically I might imagine maintaining them for a couple of years.)
Consequently, anything that you want to live "forever" and be primarily focused on within-Julia performance, I would put elsewhere. I'm happy to rename that repo if that would help, e.g., cross-suite-benchmarks or something.
Some benchmarks for FileIO save and load functions with the different image IO backends. Log x axis, because ImageMagick can be a lot slower. All defaults, no kwargs.
This is with FileIO https://github.com/JuliaIO/FileIO.jl/pull/290 and:
- ImageMagick v0.7.6
- QuartzImageIO v0.7.3
- ImageIO v0.5.1
- TiffImages v0.2.2
- PNGFiles v0.3.6
Benchmark code
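(The full script is behind the "Benchmark code" block above; the following is only a rough sketch of its overall shape, with illustrative sizes and a single format, assuming FileIO.jl, BenchmarkTools.jl, and ImageCore.jl.)

```julia
using FileIO, BenchmarkTools, ImageCore

# Rough sketch: time FileIO `save` and `load` at several image
# sizes, all defaults, no kwargs.
for n in (10, 100, 1000)
    img  = rand(RGB{N0f8}, n, n)
    path = joinpath(mktempdir(), "bench_$(n).png")
    t_save = @belapsed save($path, $img)
    t_load = @belapsed load($path)
    println("$(n)×$(n): save $(round(t_save * 1e3; digits = 2)) ms, ",
            "load $(round(t_load * 1e3; digits = 2)) ms")
end
```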
cc. @tlnagy @timholy @drvi