That is expected behaviour. `N0f8` is basically a reinterpretation of a single byte to the range [0, 1]; it can't represent any numbers outside that. If you want more flexibility, use floating-point numbers. They can also represent numbers outside the color domain:
```julia
julia> c = RGB{N0f8}(0.1, 0.2, 0.3)
RGB{N0f8}(0.102,0.2,0.298)

julia> c - c - c
RGB{N0f8}(0.902,0.804,0.706)

julia> c2 = RGB{Float32}(0.1, 0.2, 0.3)
RGB{Float32}(0.1f0,0.2f0,0.3f0)

julia> c2 - c2 - c2
RGB{Float32}(-0.1f0,-0.2f0,-0.3f0)
```
Concerning truncation: `N0f8` is not an image-processing-specific type. It's a fixed-point number defined in `FixedPointNumbers`. I suspect there are good reasons for the wrap-around behaviour, probably to do with performance:
```julia
julia> N0f8(0.1) - N0f8(0.2)
0.906N0f8
```
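That 0.906 is exactly what the underlying `UInt8` arithmetic produces; a quick check on the raw bytes:

```julia
julia> using FixedPointNumbers

julia> a, b = N0f8(0.1), N0f8(0.2);

julia> reinterpret(a), reinterpret(b)   # the raw bytes: 26 and 51
(0x1a, 0x33)

julia> reinterpret(a) - reinterpret(b)  # UInt8 subtraction wraps: 26 - 51 ≡ 231
0xe7

julia> reinterpret(N0f8, 0xe7)          # 231/255, i.e. the same 0.906
0.906N0f8
```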
Either way, the right place to propose/discuss a change to that behaviour is https://github.com/JuliaMath/FixedPointNumbers.jl
@johnnychen94 If you're used to Matlab, then you probably expect saturating arithmetic; see https://docs.julialang.org/en/stable/manual/faq/index.html#Why-does-Julia-use-native-machine-integer-arithmetic?-1
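For reference, saturating subtraction is easy to emulate yourself; a minimal sketch (`satsub` is a made-up helper, not an existing API):

```julia
using FixedPointNumbers

# Matlab-style saturating subtraction: compute in Float32, clamp into [0, 1]
satsub(a::N0f8, b::N0f8) = N0f8(clamp(Float32(a) - Float32(b), 0, 1))

satsub(N0f8(0.1), N0f8(0.2))  # 0.0N0f8 instead of wrapping to 0.906N0f8
```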
It wouldn't be crazy to define an overflow-checking `FixedPoint` type, however; it would be quite handy for debugging purposes.
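A minimal sketch of the flavor, checking the raw byte arithmetic (`checked_n0f8_sub` is hypothetical, not an existing API):

```julia
using FixedPointNumbers
using Base.Checked: checked_sub

# overflow-checked subtraction on the raw bytes, reinterpreted back to N0f8
checked_n0f8_sub(a::N0f8, b::N0f8) =
    reinterpret(N0f8, checked_sub(reinterpret(a), reinterpret(b)))

checked_n0f8_sub(N0f8(0.2), N0f8(0.1))  # 0.098N0f8
checked_n0f8_sub(N0f8(0.1), N0f8(0.2))  # throws OverflowError
```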
@Evizero Many thanks.
So the solution for this is to write `img = Gray{Float32}.(load("img_path"))` for all image loading.
I create this issue here because the default loading behavior is kind of annoying, and I'm not sure if this is an image-processing specific problem or not.
I mean, I personally prefer using a `Float` type instead of `FixedPointNumbers` to avoid unwanted bugs caused by this wraparound behavior, and I don't care much about the accuracy trade-offs that `N0f8` brings.
First, slightly simpler is `float32.(load("img_path"))`.
Second, in the Matlab world many images are loaded in `UInt8` format, and you get similar problems there (problems due to saturating arithmetic rather than overflowing arithmetic).
Third, rather than converting whole images, you could use temporary variables that make the arithmetic safe. `ssd` and similar functions in Images.jl do this (https://github.com/JuliaImages/Images.jl/blob/f6145afef096540aca323a128d5d4b3ffe894542/src/algorithms.jl#L276 and https://github.com/JuliaImages/Images.jl/blob/f6145afef096540aca323a128d5d4b3ffe894542/src/algorithms.jl#L291-L302).
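The pattern is just to promote each element before the arithmetic so no intermediate can wrap; a sketch of the idea (not the actual `ssd` implementation):

```julia
using FixedPointNumbers

# sum of squared differences, accumulated in Float64;
# each N0f8 element is promoted before subtracting, so nothing wraps
function my_ssd(A::AbstractArray{N0f8}, B::AbstractArray{N0f8})
    s = 0.0
    for i in eachindex(A, B)
        d = Float64(A[i]) - Float64(B[i])
        s += d^2
    end
    return s
end
```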
Fourth and most importantly, we could change this, but the naive approach has a dreadful cost. Let me explain. The main reason to use `N0f8`/`UInt8` is memory: `N0f8` is just 1 byte, whereas `Float32` is 4. Now, if all you're doing is loading `cameraman` then you don't care. However, if you're loading images (or image collections) via `mmap` (e.g., https://github.com/JuliaIO/NRRD.jl/blob/7a4f97e83c89ccece2ecbea320fc3cb28a3d9449/src/NRRD.jl#L204-L209; see the sketch after this paragraph), then in fact you have no choice except to use the representation as it is stored on disk. `mmap` is used by people who analyze data from scientific cameras (which produce ~4TB/hr these days), and perhaps (I honestly don't know) anyone who is training some algorithm on a large database of images of all the same size and type (e.g., some huge face database stored as a single file). Given this fact, there are three potential routes:

- convert all images to `Float32` and make Julia really bad for processing large images or image collections (you'd need ~8TB of RAM for a 1hr recording with a scientific camera)
- convert small images to `Float32` but keep large images in `N0f8` or `N0f16` format
- keep all images in the element type they have on disk

We've gone with the last of these, which I view as the least-bad of all options. IMO, the middle one is the worst of them all. It has several problems, but the most serious is that we would evolve a class of bugs that only strike users of large images, and therefore are not easily discoverable by a test suite. This is the "dreadful cost" to which I was referring above.
Now, despite these considerations I wonder if we should harness your experience (and you are not alone) and see what we can do to improve the situation. I see two interesting ways forward:
1. Define a `CheckedNormed` type (`CN0f8` and similar) that checks for overflow in all arithmetic operations. `imgc = reinterpret(CN0f8, img)` would then give you an object that would quickly discover any problematic algorithms. We could do this and try running the entire Images test suite. (An alternative is to allow one to configure things so that our current `Normed` types always check... that might be easier and more comprehensive.)
2. Use a lazy alternative to `float32.(img)`: `imgc = mappedarray(float32, n0f8, img)`. This will convert all values to `Float32` on the fly, and if you try to store values in `imgc` it will convert them back for its internal representation. If the value you're trying to store isn't representable as an `N0f8`, then it will throw an error (which is good). Or, rather than using `n0f8`, you could truncate and then convert, in which case you get saturating arithmetic (but all the saturation happens at the end). See the sketch after this list.

Either of these will, of course, be slower than if you convert the image itself. So they might be best as a check to see whether the JuliaImages algorithms are sufficiently careful.
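A minimal sketch of that second option (assuming the MappedArrays package; the type constructors work in place of `float32`/`n0f8`):

```julia
using MappedArrays, FixedPointNumbers

img  = rand(N0f8, 4, 4)
imgc = mappedarray(Float32, N0f8, img)  # lazy Float32 view of img

x = imgc[1, 1] - imgc[2, 2]  # arithmetic happens in Float32, no wraparound
imgc[1, 1] = 0.5             # writing converts back to N0f8 in img
imgc[1, 2] = -0.1            # throws: -0.1 is not representable as N0f8
```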
Is anyone interested in either of these? I can look at setting up the raw numerics of this if folks offer to help fix any JuliaImages bugs that this discovers :smile:.
Thanks for your thorough explanation of the design! It's basically a trade-off between engineering efficiency and experimental flexibility.
For my own purposes, I just want to make sure my idea and code work as expected without worrying about potential overflow; otherwise I'm very likely to conclude that my idea doesn't work at all, which would be a disaster for me. One needs to do things correctly first and only then make them faster, so I'm willing to guarantee the numerical correctness of my experiments even if the cost is lower speed and higher memory usage.
At present, `float32.(load("path_to_img"))` works fine for me. And currently I can definitely write algorithms like the following as a solution:
```julia
function foo_(imgs::Array{Gray{T},N}) where {T,N} # N is 2 or 3
    # everything here explicitly works with T,
    # so it is easy to test whether `Float32` works well
end

function foo(imgs::Array{Gray{Normed{T,f}},3}) where {T,f}
    # warm-up test on one random slice
    idx = rand(1:size(imgs, 3))
    slice = imgs[:, :, idx]
    if foo_(slice) ≈ Gray{Normed{T,f}}.(foo_(Gray{Float32}.(slice)))
        # the fixed-point result matches the Float32 result
        foo_(imgs)
    else
        @warn "Falling back to the Float32 version"
        imgs .|> Gray{Float32} |> foo_ .|> Gray{Normed{T,f}}
    end
end
```
I'm new to Julia and not familiar with the engineering side at present; I don't know how to implement your ideas in detail, so I'll just explain what I want as a package user.
I do like the idea of introducing `CheckedNormed` as a replacement for `Float32` to make these types unified and more manageable. This would definitely make debugging easier.
And I might also "want" a macro like this, which I think would make debugging even easier (I don't even know if this is possible):
```julia
function foo()
    # function definitions
end

@safe foo() # for debug usage
@fast foo() # make it fast
```
These `@safe` and `@fast` might do things like switching the `Normed` and `CheckedNormed` types back and forth, or just do some configuration. However, I guess this requires a huge amount of work and code changes, which might not be a good idea.
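Purely as a sketch of the behavior I have in mind, something like the following (hypothetical helpers, reusing the `float32` conversion mentioned above) would already help, without any macros:

```julia
# hypothetical helpers approximating @safe/@fast without macros:
safe(f, imgs) = f(float32.(imgs))  # debugging: promote first, wraparound impossible
fast(f, imgs) = f(imgs)            # production: run on the raw fixed-point type

safe(foo_, imgs)  # for debug usage
fast(foo_, imgs)  # make it fast
```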
For example, suppose we have a new denoising algorithm `imgdenoise_new(img)` and want to compare its performance. A subtraction is often used to get the difference; for example, I want to get the image noise, which should be very small... Since `FixedPointNumbers` uses the same wraparound behavior for overflow as Julia does, the `img_diff` and `img_noise` are not actually what we want; some minor differences will become very large. Indeed, truncation (saturating) behavior seems more reasonable for image-processing tasks.
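A tiny illustration of the problem (made-up pixel values):

```julia
using FixedPointNumbers

noisy = N0f8(0.5)    # a pixel in the noisy image
clean = N0f8(0.52)   # the same pixel after denoising
noisy - clean        # 0.984N0f8: a difference of ~0.02 wraps to ~0.98
```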