JuliaImages / ImageCore.jl

Julia types for representing images

the wraparound behavior of using N0f8 seems unwanted? #63

Closed johnnychen94 closed 6 years ago

johnnychen94 commented 6 years ago

Suppose we have a new denoising algorithm imgdenoise_new(img) and want to compare its performance against the old one. A subtraction is often used to get the difference:

img_new = imgdenoise_new(img)
img_old = imgdenoise_old(img)

img_diff = img_new - img_old

Or suppose I want to extract the image noise, which should be very small:

using ImageCore  # re-exports N0f8 from FixedPointNumbers

ori_img = fill(N0f8(0.5), 2, 2)
noisy_img = N0f8.(ori_img + randn(2, 2) / 100)  # add a little Gaussian noise
img_noise = noisy_img - ori_img                 # wraps wherever the noise is negative

Since FixedPointNumbers uses the same wraparound behavior for overflow as Julia's native integers, img_diff and img_noise are not what we actually want; some tiny differences become very large. Saturating (clamping to the valid range) behavior seems more reasonable for image-processing tasks.
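
To make the wraparound concrete, here is a minimal sketch with hand-picked values standing in for the random noise (the exact values above depend on the seed):

julia> using ImageCore  # re-exports N0f8 from FixedPointNumbers

julia> noisy, ori = N0f8(0.496), N0f8(0.5);  # noise made this pixel slightly darker

julia> noisy - ori  # mathematically ≈ -0.008 after quantization, but N0f8 wraps below 0
0.996N0f8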

Evizero commented 6 years ago

That is expected behaviour. N0f8 is basically a reinterpretation of a single byte onto the range [0, 1]. It can't represent any numbers outside that range.

If you want more flexibility, use floating-point numbers. They can also represent numbers outside the color domain:

julia> c = RGB{N0f8}(0.1, 0.2, 0.3)
RGB{N0f8}(0.102,0.2,0.298)

julia> c - c - c
RGB{N0f8}(0.902,0.804,0.706)

julia> c2 = RGB{Float32}(0.1, 0.2, 0.3)
RGB{Float32}(0.1f0,0.2f0,0.3f0)

julia> c2 - c2 - c2
RGB{Float32}(-0.1f0,-0.2f0,-0.3f0)
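
To make the byte-reinterpretation point concrete, here is a small sketch (reinterpret exposes the mapping between bytes and N0f8 values):

julia> reinterpret(N0f8, 0xff)  # the byte 255 maps to 1.0
1.0N0f8

julia> reinterpret(N0f8, 0x80)  # 128/255 ≈ 0.502
0.502N0f8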
Evizero commented 6 years ago

Concerning truncation: N0f8 is not an image-processing-specific type. It's a fixed-point number defined in FixedPointNumbers. I suspect there are good reasons for the wraparound behaviour, probably to do with performance.

julia> N0f8(0.1) - N0f8(0.2)
0.906N0f8
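
For the curious, this is just native UInt8 arithmetic on the underlying byte (a sketch; one-argument reinterpret returns the raw byte):

julia> reinterpret(N0f8(0.1)), reinterpret(N0f8(0.2))
(0x1a, 0x33)

julia> 0x1a - 0x33  # UInt8 subtraction wraps modulo 256
0xe7

julia> reinterpret(N0f8, 0xe7)  # 231/255 ≈ 0.906, matching the result above
0.906N0f8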

Either way, the right place to propose/discuss a change to that behaviour is https://github.com/JuliaMath/FixedPointNumbers.jl

timholy commented 6 years ago

@johnnychen94 if you're used to Matlab then you probably expect saturating arithmetic. https://docs.julialang.org/en/stable/manual/faq/index.html#Why-does-Julia-use-native-machine-integer-arithmetic?-1
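
For comparison, Matlab-style saturating subtraction can be emulated by promoting to floating point and clamping (a sketch, not an existing API):

julia> clamp(Float32(N0f8(0.1)) - Float32(N0f8(0.2)), 0, 1)  # saturates to 0 instead of wrapping
0.0f0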

timholy commented 6 years ago

Wouldn't be crazy to define a FixedPoint overflow-checking type, however; it would be quite handy for debugging purposes.
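
A minimal sketch of what checked subtraction could look like (checked_sub is a hypothetical helper; a real checked type would hook this into - itself):

using FixedPointNumbers

# Hypothetical: throw on overflow instead of wrapping around.
function checked_sub(x::N0f8, y::N0f8)
    x >= y || throw(OverflowError("$x - $y would wrap around"))
    return x - y  # now safe: the result stays in [0, 1]
end

checked_sub(N0f8(0.2), N0f8(0.1))  # 0.098N0f8
checked_sub(N0f8(0.1), N0f8(0.2))  # throws OverflowError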

johnnychen94 commented 6 years ago

@Evizero Many thanks. So the solution for this is to write img = Gray{Float32}.(load("img_path")) for every image I load.

I created this issue here because the default loading behavior is kind of annoying, and I'm not sure whether this is an image-processing-specific problem or not.

I mean, I personally prefer using float types instead of FixedPointNumbers to avoid the subtle bugs caused by this wraparound behavior, and I don't care much about the precision trade-offs brought by N0f8.

timholy commented 6 years ago

First, slightly simpler is float32.(load("img_path")).

Second, in the Matlab world many images are loaded in UInt8 format, and you get similar problems there (problems due to saturating arithmetic rather than overflowing arithmetic).

Third, rather than converting whole images you could use temporary variables that make arithmetic safe. ssd and similar functions in Images.jl do this. (https://github.com/JuliaImages/Images.jl/blob/f6145afef096540aca323a128d5d4b3ffe894542/src/algorithms.jl#L276 and https://github.com/JuliaImages/Images.jl/blob/f6145afef096540aca323a128d5d4b3ffe894542/src/algorithms.jl#L291-L302)
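
The pattern looks roughly like this (a sketch of the idea, not the actual Images.jl code; my_ssd is a hypothetical name):

using FixedPointNumbers

# Keep the image stored compactly as N0f8, but promote each pixel to
# Float32 at the moment of arithmetic so the subtraction cannot wrap.
function my_ssd(A::AbstractArray{N0f8}, B::AbstractArray{N0f8})
    s = 0.0f0
    for (a, b) in zip(A, B)
        d = Float32(a) - Float32(b)  # safe temporary
        s += d * d
    end
    return s
end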

Fourth and most importantly, we could change this, but the naive approach has a dreadful cost. Let me explain. The main reason to use N0f8/UInt8 is memory---N0f8 is just 1 byte, whereas Float32 is 4. Now, if all you're doing is loading cameraman then you don't care. However, if you're loading images (or image collections) via mmap (e.g., https://github.com/JuliaIO/NRRD.jl/blob/7a4f97e83c89ccece2ecbea320fc3cb28a3d9449/src/NRRD.jl#L204-L209), then in fact you have no choice except to use the representation as it is stored on disk. mmap is used by people who analyze data from scientific cameras (which produce ~4TB/hr these days), and perhaps (I honestly don't know) anyone who is training some algorithm on a large database of images of all the same size and type (e.g., some huge face database stored as a single file). Given this fact, there are three potential routes:

- convert everything to floating point at load time (quadrupling memory use and giving up on mmap);
- convert images that fit in memory, but leave mmapped images in their on-disk representation (so small and large images behave differently);
- keep the on-disk representation everywhere and require algorithms to handle it carefully.

We've gone with the last of these, which I view as the least-bad of all options. IMO, the middle one is the worst of them all. It has several problems, but the most serious is that we would evolve a class of bugs that only strike users of large images, and therefore are not easily discoverable by a test suite. This is the "dreadful cost" to which I was referring above.

Now, despite these considerations I wonder if we should harness your experience (and you are not alone) and see what we can do to improve the situation. I see two interesting ways forward:

- define a CheckedNormed type, identical to Normed except that arithmetic which overflows throws an error instead of wrapping;
- make it easy to run an algorithm with the pixels promoted to a safe type on the fly, so each operation is checked without converting the stored image.

Either of these will, of course, be slower than if you convert the image itself. So they might be best as a check to see whether the JuliaImages algorithms are sufficiently careful.

Is anyone interested in either of these? I can look at setting up the raw numerics of this if folks offer to help fix any JuliaImages bugs that this discovers :smile:.

johnnychen94 commented 6 years ago

Thanks for your thorough explanation of the design! It's basically a trade-off between engineering efficiency and experimental flexibility.

For my own purposes, I just want to make sure my idea and code work as expected, without worrying about potential overflow; otherwise I'm very likely to conclude that my idea doesn't work at all, which would be a disaster for me. One needs to do things correctly first and only then consider doing them faster, so I'm willing to guarantee numerical correctness in my experiments even at the cost of lower speed and higher memory usage.

At present, float32.(load("path_to_img")) works fine for me. And I can definitely write algorithms like the following as a workaround:

function foo_(imgs::Array{Gray{T},N}) where {T,N}  # N is 2 or 3
    # everything here explicitly works with element type T,
    # so it's easy to test whether `Float32` works well
end

function foo(imgs::Array{Gray{Normed{T,f}},3}) where {T,f}
    # warm-up test on one randomly chosen slice
    idx = rand(1:size(imgs, 3))
    slice = imgs[:, :, idx]
    if foo_(slice) ≈ Gray{Normed{T,f}}.(foo_(Gray{Float32}.(slice)))
        # results agree, so run on the original (compact) type
        foo_(imgs)
    else
        @warn "Falling back to the Float32 version"
        imgs .|> Gray{Float32} |> foo_ .|> Gray{Normed{T,f}}
    end
end

I'm new to Julia and not very familiar with the engineering side at present, and I don't know how to implement your ideas in detail, so I'll just explain what I want as a package user.

I do like the idea of introducing CheckedNormed as a replacement for Float32, making these types unified and more manageable. This would definitely make debugging easier.

And I might also "want" a macro like this, which I think would make debugging even easier (I don't even know if this is possible):

function foo()
    # function definitions
end

@safe foo() # for debug usage

@fast foo() # make it fast

These @safe and @fast macros might do something like swapping Normed and CheckedNormed types back and forth, or just set some configuration. However, I guess this would require a huge amount of work and code changes, so it might not be a good idea.
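
Without macros, a rough approximation of the same idea could be a pair of helpers that swap the element type around a call (a sketch only; run_safe and run_fast are made-up names):

using ImageCore  # for Gray and N0f8

# "safe": promote to Float32 before running, so arithmetic can't wrap
run_safe(f, img) = f(Gray{Float32}.(img))

# "fast": run directly on the compact native N0f8 storage
run_fast(f, img) = f(img)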