IanButterworth opened 3 years ago
That's awesome.
Just this morning I discovered via profiling that the biggest contributor to ImageMagick's slowness for small files is extracting the pixel depth, which is a single ccall. Crazy. There might of course be a way around that, but I'm not in a rush; having an implementation that works and that we can change is so important, and the benchmarks above are really exciting!
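For anyone who wants to reproduce that kind of finding, a minimal profiling sketch might look like the following (assuming ImageMagick.jl, FileIO.jl, ImageCore.jl, BenchmarkTools.jl, and Profile; the tiny test file and its path are illustrative, not from this thread):

```julia
using FileIO, ImageCore, BenchmarkTools, Profile

# Write a tiny grayscale PNG to a temporary path (illustrative only).
path = joinpath(mktempdir(), "tiny.png")
save(path, rand(Gray{N0f8}, 10, 10))

# End-to-end load time for the small file.
@btime load($path)

# Profile repeated loads to find the dominant frame
# (e.g. a single ccall extracting the pixel depth).
Profile.clear()
@profile for _ in 1:1000
    load(path)
end
Profile.print(mincount = 50)
```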
For the smallest images, https://github.com/JuliaIO/FileIO.jl/pull/295 will improve matters even further.
It's noteworthy that for TIFF, 10x10 and 100x100 are almost the same speed. That might merit some investigation, eventually.
With my recent JpegTurbo.jl development, I noticed that benchmarking only with randomly generated images can be quite misleading; many image compression tricks work only when neighboring blocks and patches share structure. Thus I would suggest adding more test images of the same size and plotting the median result over those samples when we regenerate the graphs.
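To make the point concrete, here is a small sketch (assuming PNGFiles.jl, TestImages.jl, and ImageCore.jl; the file names are illustrative) comparing how well a random image and a natural image compress losslessly:

```julia
using PNGFiles, TestImages, ImageCore

# Random pixels carry no spatial redundancy, so lossless PNG
# compression barely shrinks them; a natural image fares much better.
random_img  = rand(RGB{N0f8}, 512, 512)
natural_img = testimage("mandrill")   # a 512×512 natural test image

dir = mktempdir()
PNGFiles.save(joinpath(dir, "random.png"), random_img)
PNGFiles.save(joinpath(dir, "natural.png"), natural_img)

raw_bytes = 512 * 512 * 3             # uncompressed RGB{N0f8} size
@show filesize(joinpath(dir, "random.png")) / raw_bytes   # close to 1.0
@show filesize(joinpath(dir, "natural.png")) / raw_bytes  # noticeably smaller
```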
Absolutely. Note that PNGFiles.jl now has automated CI benchmarking set up https://github.com/JuliaIO/PNGFiles.jl/pull/52
e.g., see the report there asserting that there was no performance change in this PR: https://github.com/JuliaIO/PNGFiles.jl/pull/51#issuecomment-1027653354
But that is currently using random images, and @drvi already suggested they should be replaced with test images.
Perhaps we should set the same thing up for ImageIO, with TestImages vs. each backend.
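One possible shape for such a setup (a sketch only; assumes FileIO.jl, BenchmarkTools.jl, TestImages.jl, and two illustrative formats) would be:

```julia
using FileIO, BenchmarkTools, TestImages, Statistics

# Save one real test image in each format, then time `load`
# through whichever backend FileIO dispatches to.
img = testimage("mandrill")
dir = mktempdir()
suite = BenchmarkGroup()
for ext in ("png", "tif")
    path = joinpath(dir, "mandrill.$ext")
    save(path, img)
    suite[ext] = @benchmarkable load($path)
end

results = run(suite; verbose = true)
for (ext, trial) in results
    println(ext, ": median ", median(trial.times) / 1e6, " ms")
end
```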
We just need to add a few high-resolution test images to TestImages.jl... Among the widely used test image datasets, I know there's DIV2K, but it's licensed for academic purposes only. Do you have any suggestions on where we can find such test images?
NASA?
It's been a goal of mine for a while to add automated CI benchmarking to TiffImages (ref https://github.com/tlnagy/TiffImages.jl/issues/53), but I'm super busy for the foreseeable future. Does it make more sense for the benchmarks to live at the individual package level or here in ImageIO? Maybe both?
But I agree with @johnnychen94 that it makes sense to use real images in addition to randomly-generated ones.
Performance benchmarks are used for two purposes: 1) comparing against similar packages, which are usually written in other languages, and 2) regression testing.
For benchmark CI such as https://github.com/JuliaIO/PNGFiles.jl/pull/52, it is used to track if PRs/releases are slowing things down.
For benchmark scripts like this issue, https://github.com/johnnychen94/JpegTurbo.jl/issues/15, and the one @timholy created in https://github.com/JuliaImages/image_benchmarks, they're used for advertising purposes: to convince people that we're doing great stuff. Also, to prepare for JuliaImages 1.0, we definitely need such benchmarks.
Does it make more sense for the benchmarks to live at the individual package level or here in ImageIO? Maybe both?
Unless we move all packages into one gigantic monorepo, benchmark CIs for regression testing should still live alongside the source code.
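Concretely, the convention that PkgBenchmark.jl/BenchmarkCI.jl expect is a benchmark/benchmarks.jl file in the package repository defining a top-level SUITE. A minimal sketch for an IO package (contents hypothetical, sizes illustrative):

```julia
# benchmark/benchmarks.jl
using BenchmarkTools, FileIO, ImageCore

const SUITE = BenchmarkGroup()

# Illustrative cases only: time `load` on a couple of image sizes.
for n in (10, 100)
    img = rand(RGB{N0f8}, n, n)
    path = joinpath(mktempdir(), "img_$(n).png")
    save(path, img)
    SUITE["load"]["$(n)x$(n)"] = @benchmarkable load($path)
end
```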
On the other hand, I prefer to have the "benchmark against other frameworks" code stay in one repo, as @timholy already did. I haven't yet committed to https://github.com/JuliaImages/image_benchmarks because the code there is not very extensible/flexible, in the sense that it's not always easy to switch certain cases on/off. Thus, if we keep adding more benchmark cases there, we'll soon reach a point where it takes too long to get the results of interest. This is quite similar to the DemoCards I made for https://juliaimages.org/stable/examples/: we can easily create an ad-hoc version of benchmark/demo scripts that works at first, but it's always a pain to convince/guide others to contribute benchmark/demo cases using an ad-hoc, undocumented framework.
Some discussion on this can be found in https://github.com/JuliaImages/Images.jl/discussions/947, and I also have a very early draft experiment in https://github.com/johnnychen94/Workflows.jl/pull/1, but I certainly don't have enough time to finish it... Maybe we can propose this as this year's GSoC project by updating https://julialang.org/jsoc/gsoc/images/?
I'm supportive of changes to the architecture of image_benchmarks. That said, in the long run I expect image_benchmarks will share the fate of Julia's own "microbenchmarks" (repo: https://github.com/JuliaLang/Microbenchmarks): people want them, lots of folks with different favorite image-processing suites will request that we compare their favorite framework, but nobody wants to maintain them. Building many different languages' suites on a single machine is a major pain in the neck, and I have delayed doing this precisely because it's no fun. But it's important for long-term growth in our current phase. (I don't really expect to keep them going for 10 years, though; realistically I might imagine maintaining them for a couple of years.)
Consequently, anything that you want to live "forever" and be primarily focused on within-Julia performance, I would put elsewhere. I'm happy to rename that repo if that would help, e.g., cross-suite-benchmarks or something.
Some benchmarks for FileIO save and load functions with the different image IO backends. Log x axis, because ImageMagick can be a lot slower. All defaults, no kwargs.
This is with FileIO https://github.com/JuliaIO/FileIO.jl/pull/290 and:
- ImageMagick v0.7.6
- QuartzImageIO v0.7.3
- ImageIO v0.5.1
- TiffImages v0.2.2
- PNGFiles v0.3.6
Benchmark code
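(The full script is behind the "Benchmark code" block above; the following is only a rough sketch of its overall shape, with illustrative sizes and a single format, assuming FileIO.jl, BenchmarkTools.jl, and ImageCore.jl.)

```julia
using FileIO, BenchmarkTools, ImageCore

# Rough sketch: time FileIO `save` and `load` at several image
# sizes, all defaults, no kwargs.
for n in (10, 100, 1000)
    img  = rand(RGB{N0f8}, n, n)
    path = joinpath(mktempdir(), "bench_$(n).png")
    t_save = @belapsed save($path, $img)
    t_load = @belapsed load($path)
    println("$(n)×$(n): save $(round(t_save * 1e3; digits = 2)) ms, ",
            "load $(round(t_load * 1e3; digits = 2)) ms")
end
```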
cc. @tlnagy @timholy @drvi