Beep6581 / RawTherapee

A powerful cross-platform raw photo processing program
https://rawtherapee.com
GNU General Public License v3.0
2.89k stars 323 forks source link

Randomly distributed colored hotspot pixels in defloat branch #619

Closed Beep6581 closed 9 years ago

Beep6581 commented 9 years ago

Originally reported on Google Code with ID 630

What steps will reproduce the problem?
Open a RAW file with standard settings

What is the expected output? What do you see instead?
Expected is a clean and neat picture, but I see some randomly distributed colored hotspot
pixels (see http://i.imgur.com/f2wqr.jpg).

The pixels are only visible in the preview and in the 100% zoom small preview. The
final processed image is fine.

RT version/revision:
Branch: defloat
Version: Dev-Defloat.29
Changeset: d198ec2e6923
Compiler: GCC 4.4.5
Processor: undefined
Bit depth: 32 bits
Gtkmm: V2.20.3
Build type: Debug
Build flags:   -g
Link flags:  
OpenMP support: ON
MMAP support: ON
Rawzor support: OFF

Operating system and architecture (e.g. winXP-32, ubuntu-10.10-64):
ubuntu-10.10-32. cat /proc/version says.
Linux version 2.6.35-28-generic (buildd@rothera) (gcc version 4.4.5 (Ubuntu/Linaro
4.4.4-14ubuntu5) ) #50-Ubuntu SMP Fri Mar 18 19:00:26 UTC 2011

Reported by burrima.ch on 2011-04-13 14:56:32

Beep6581 commented 9 years ago
imgur.com did convert the png image to jpg which made it look wrong. Therefore, I decided,
to attach the png.

Reported by burrima.ch on 2011-04-13 14:58:37

Beep6581 commented 9 years ago
Could you try to disable *all* kinds of sharpening in "Detail" tab and report the difference?

Reported by michaelezra000 on 2011-04-13 15:03:01

Beep6581 commented 9 years ago
Does not help. I played around a bit and did not find anything that helped. Finally,
I ended up with disabling all features, but still the same (see attachment). It's also
not related to the demosaicing algorithm.

The pixels change with each reloading of the preview image - same effect, but on other
pixels. So, I guess it has something to do with the loading or processing of the preview
image. As I said, the final image is fine.

The RAW is from a Nikon D300s but I observed the same issue with a Nikon P7000.

Please let me know if you need further input.

Reported by burrima.ch on 2011-04-13 16:01:08


Beep6581 commented 9 years ago
The raw file would help.

Reported by ejm.60657 on 2011-04-13 16:24:32

Beep6581 commented 9 years ago
RAW file [15MB]: http://burrima.ch/DSC_4908.NEF

Reported by burrima.ch on 2011-04-13 17:44:59

Beep6581 commented 9 years ago
I can't reproduce this on my setup using changeset 120c87ca9a71 with either fast demosaic
or AMaZE.

Reported by ejm.60657 on 2011-04-13 19:18:12

Beep6581 commented 9 years ago
I have those hotspots. They are mostly green and about 5% of them yellow.The spots are
visible in the preview window at any zoom setting, at the detail window and even worse
even at the saved tiff-16bit.
Seems that they are appearing at the dark places only. Looking at the navigator panel,
the rgb values on the green spots are 0,255,0. Strange that many neighbour pixels have
zeroed values for the g channel mostly but i see many zeroes for the other channels
also.
I can tell that panning in the preview the refresh is now faster than branch 3. Almost
instand.

RTdev-3.1m1.39 from head download. Not installed, just extracted the files as per the
readme file. Win vista32. 

http://movies.dpreview.com.s3.amazonaws.com/canon_g11/canong11_ISO1600.CR2.zip

On a second try i opened the g11-ISO100 file. There are no hotspots there.

Reported by iliasgiarimis on 2011-04-13 20:33:23

Beep6581 commented 9 years ago
With changeset 120c87ca9a71 I'm still observing the same issue. I uploaded my sources
to http://burrima.ch/defloat.tar.bz2 [75MB]. Maybe you can find out something with
it... 

Reported by burrima.ch on 2011-04-13 20:47:13

Beep6581 commented 9 years ago
Image open spotlessly on 3aaddda2e5b2 here.
However Jacques reported that he had the same issues, and they went away when he disabled
multithreading LCMS (slower, but obviously safer on same machines).
I'll add a "safe mode" option for LCMS tonight. I don't want to disable it completely
since it gives nices speedups, and maybe we find out why LCMS behaves weird on some
machines and some not (remembering Michaels horse image color problem when switching
to LCMS2)

Reported by oduis@hotmail.com on 2011-04-14 06:06:26

Beep6581 commented 9 years ago
I have lcms2 downloaded as described here: http://code.google.com/p/rawtherapee/issues/detail?id=610#c1
 - not sure what version it is (have no access to my pc right now)

Tonight, I will try to build the latest released lcms2.1 from source - just to be sure.

Reported by burrima.ch on 2011-04-14 06:52:41

Beep6581 commented 9 years ago
Thanks, that would be great to test.

Reported by oduis@hotmail.com on 2011-04-14 06:55:34

Beep6581 commented 9 years ago
You must be right, there is probably a buggy color management here with lcms2.

it doesn't work with custom icc profiles (adobe's icc, huelight, ...)

http://www.rawtherapee.com/forum/viewtopic.php?t=2789

Reported by iliasgiarimis on 2011-04-14 08:54:33

Beep6581 commented 9 years ago
I had the same problem with "Input profiles ICC". The solution lies in Rawimagesource.cc,
even with OMP. The "color" patch solves the problem.

Reported by jdesmis on 2011-04-14 13:00:12


Beep6581 commented 9 years ago
I will test this this evening!

Reported by michaelezra000 on 2011-04-14 13:36:24

Beep6581 commented 9 years ago
Fine, but don't check this in! I'll add a switch, as it works fine (and faster) with
the current code on other machines. Also if Jacques uses my build, he has no noisy
pixels, so it's probably not the root cause, but a side effect of something else I
guess.

Reported by oduis@hotmail.com on 2011-04-14 13:42:48

Beep6581 commented 9 years ago
Using the latest lcms2 (70a120d) still causes this issue. I removed the installed lcms
libraries first (apt-get purge liblcms2-dev) before "make install". Afterwards, I made
sure that all relevant (find -name *lcms*) files are up to date. Then I rebuilt RT
(hg pull; hg update; ./clean.sh;  make clean; cmake -DCMAKE_BUILD_TYPE=debug -DCMAKE_INSTALL_PREFIX=./build
-DBINDIR=. -DDATADIR=. -DLIBDIR=.; make install; ./build/rtstart;)

Changing the input profile does not change anything - neither with "no profile" nor
with "camera default". "Use embedded, if possible" is grayed out.

Reported by burrima.ch on 2011-04-14 17:26:26

Beep6581 commented 9 years ago
And also, the above color.patch does not solve the problem, unfortunately.

Reported by burrima.ch on 2011-04-14 17:33:18

Beep6581 commented 9 years ago
I just commited a "safe mode" option for disabling multithreading on LCMS completely.
Could you try it? Just start the new version once, then it create the new key [Color
Management]/LCMSSafeMode automatically. Just set it to "true" to enter the safe mode.

Reported by oduis@hotmail.com on 2011-04-14 17:58:55

Beep6581 commented 9 years ago
Tried the "safe mode" option but it didn't automatically create the new key. I exported
the "(custom)" image profile to a pp3 file and there was no such key added. Then I
manually added the key, loaded RT and imported the pp3 file - but no effect.

I made then sure that the sources got updated correctly by grep'ing for LCMSSafeMode
and it got found in options.cc and some other files. Also, RT shows changeset d80d94c8a4a4.

I tried twice - no effect.

Reported by burrima.ch on 2011-04-14 18:38:44

Beep6581 commented 9 years ago
Sorry, should have been more precise: The key is in the options file, not in the pp3
file.
(BTW: Maybe get the latest build, I made another fix meanwhile)

Reported by oduis@hotmail.com on 2011-04-14 18:49:01

Beep6581 commented 9 years ago
Ok, now I got it ;-) didn't know about the options file. I tried it but sadly I have
to disappoint you - it didn't fix the bug.

Now, I have this key in the options file automatically generated and set to true. I
used the latest build as you asked me to.

Another question: Could it be something with a floating point division not working
properly? Or maybe hardware issues with floating point engine? I use an iMac ALU 24".

Reported by burrima.ch on 2011-04-14 18:57:15

Beep6581 commented 9 years ago
Could be, but I think it's unlikely. At least we know now it's not a LCMS threading
issue (what some thought to be an issue).
Jaques sometimes has the same problems, but it works for him using my build, and he
doesn't run on Mac. My build is nothing special, just the default settings for RELEASE,
but using the latest GCC compiler tdm (4.5.2).

Reported by oduis@hotmail.com on 2011-04-14 19:10:02

Beep6581 commented 9 years ago
I don't know why but the release build works fine (used debug so far).

Reported by burrima.ch on 2011-04-14 20:28:04

Beep6581 commented 9 years ago
I also have this problem. It really looks like an OMP problem (a parallelized code read/writting
the same memory area), not necessarilly related to lcms2. If you switch off OpenMP
completly when running cmake, it should theorically go away (that's why emil don't
have this problem).

The question is "why Oduis don't have this problem, while using the same dependancy
libraries than me on the same system"... I downloaded Oduis's latest build (changeset
#d198ec2e6923 ), which afaik has OpenMP enabled, but is not affected by this problem,
and it effectively works fine here. What's very more surprising, it's also way faster
than my build of Defloat which has OMP enabled too (on the same image, same pp3 file)
:

Amaze demosaicing time with my build      : 7.42s
                            Oduis's build : 4.50s !

So either:
1. OpenMP is not enabled in my build
2. Release build (with -O3 flag) produce create much much more efficient code than
Debug (-O2)
3. ...?????

Reported by natureh.510 on 2011-04-15 00:36:02

Beep6581 commented 9 years ago
Especially since burrima says when building on RELEASE all is OK and using my new LCMSSafeMode
flag (which switches off OMP for LCMS) does not help, it looks more like a build problem
now.
Hombre, could you compare the DLLs from my build with your DLLs? Maybe a newer omp
dll or something?

Reported by oduis@hotmail.com on 2011-04-15 04:58:46

Beep6581 commented 9 years ago
I also noted the big speed grade with RELEASE build. Maybe some timing issues?

Searching for "DEBUG" inside the code didn't show any obvious problem. The only thing
I can note is that the following define should better be written with brackets (for
safety): http://www.google.com/codesearch/p?hl=en#ZjuywMbU9JA/winclude/deflate.h&q=tr_tally%20package:http://rawtherapee\.googlecode\.com&sa=N&cd=1&ct=rc&l=313

e.g. 
# define _tr_tally_lit(s, c, flush) { flush = _tr_tally(s, 0, c); }

I will double-check and try to find anything as soon as I'll find some time (hopefully
tomorrow).

Reported by burrima.ch on 2011-04-15 07:06:01

Beep6581 commented 9 years ago
I just made several attempts in "Release" mode:

If:
LCMSSafeMode = false; there are no colored  pixels, but the colors in "preview" and
in the editor (Photoshop) are bad

LCMSSafeMode = true, everything is perfect

Reported by jdesmis on 2011-04-15 08:55:27

Beep6581 commented 9 years ago
With Oduis's debug built on the g11 raw sample (comment 7).
Win vista32. 

LCMSSafeMode = true, With "camera default" there are spots. 
There are no spots and colors are perfect only if i use custom icc (Adobe). But there
is another problem. Noise reduction ... do almost nothing even with the sliders at
max, they are almost inactive. 

Reported by iliasgiarimis on 2011-04-15 11:22:46

Beep6581 commented 9 years ago
I am aware of the NR problem; it may have to do with some NR experiments Michael and
I were running in float just before defloat was spawned, which then propagated forward.

Reported by ejm.60657 on 2011-04-15 11:53:53

Beep6581 commented 9 years ago
I can confirm that the release build configuration fixes the issue, the debug one has
it.
32-bit Ubuntu 10.10
> hgid
Latest tag: Dev-Defloat, Latest tag distance: 33, Changeset: 1152:28df606c1f13
> uname -a
Linux eagle 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 19:00:26 UTC 2011 i686 GNU/Linux
> cat /proc/cpuinfo
[...]
vendor_id : GenuineIntel
cpu family: 6
model     : 15
model name: Intel(R) Core(TM)2 Duo CPU     E6550  @ 2.33GHz
stepping  : 11
[...]

Reported by kovacs.it on 2011-04-15 19:33:43

Beep6581 commented 9 years ago
Ok, I took some time to play around a bit. The following is a list of different things
I tried out:

Release build:
- cmake -DCMAKE_BUILD_TYPE=release -DOPTION_OMP=ON 
    -DCMAKE_INSTALL_PREFIX=./build -DBINDIR=. -DDATADIR=. -DLIBDIR=.
- Running very fast, 
- no false pixels
- LCMSSafeMode has no influence

Debug build:
- cmake -DCMAKE_BUILD_TYPE=debug -DOPTION_OMP=ON 
    -DCMAKE_INSTALL_PREFIX=./build -DBINDIR=. -DDATADIR=. -DLIBDIR=.
- Running more slowly,
- having false pixels, 
- having issues with file access,
- crashing sometimes (e.g. when opening the small 100% preview window)
- LCMSSafeMode has no influence

Relwithdebinfo build: 
- cmake -DCMAKE_BUILD_TYPE=relwithdebinfo -DOPTION_OMP=ON 
    -DCMAKE_INSTALL_PREFIX=./build -DBINDIR=. -DDATADIR=. -DLIBDIR=.
- Running very fast, 
- no false pixels
- LCMSSafeMode has no influence

Debug build without OpenMP:
- cmake -DCMAKE_BUILD_TYPE=debug -DOPTION_OMP=OFF 
    -DCMAKE_INSTALL_PREFIX=./build -DBINDIR=. -DDATADIR=. -DLIBDIR=.
- Running very slow,
- no false pixels  <== !!
- LCMSSafeMode has no influence

From CMakeCache.txt:
CMAKE_CXX_FLAGS:STRING=
CMAKE_CXX_FLAGS_DEBUG:STRING=-g
CMAKE_CXX_FLAGS_RELEASE:STRING=-O3 -DNDEBUG
CMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=-O2 -g

It seems to be a timing issue (maybe a race condition) that does not show up when optimization
is enabled (-O2 or -O3) or when multiprocessing is disabled.

LCMSSafeMode has no effect at all (at least regarting the false pixel issue).

The release build worked best - as expected ;)

Hope this investigation helps a bit.

Reported by burrima.ch on 2011-04-16 12:02:17

Beep6581 commented 9 years ago
I forgot to add the build version (from settings->about):
Branch: defloat
Version: Dev-Defloat.33
Changeset: 28df606c1f13
Compiler: GCC 4.4.5
Processor: undefined
Bit depth: 32 bits
Gtkmm: V2.20.3
Build type: relwithdebinfo
Build flags:   -O2 -g
Link flags:  
OpenMP support: ON
MMAP support: ON
Rawzor support: OFF

BTW: what is Rawzor support?

Reported by burrima.ch on 2011-04-16 12:04:22

Beep6581 commented 9 years ago
The dll are the same (but i use Gcc4.5.1 with libgomp 4.5.0, as the one in 4.5.1 seems
to be broken, anyway i didn't experienced problem so far with this trick), except for
lcms2 and libtiff. Did you build lcm2 yourself ? If so, can you send me lcms2.h so
i can compare it, and the command line you used to build it.

Thanks for this extensive test burrima, it clearly show me that it's a parallelisation
problem. I also confirm that i have this problem with the Debug but not the Release
target. IIRC, -O3 enable automatic loop parallelisation, so it may change things in
the ones we have manually set. But which one ???

Oduis, do you have this problem in Debug mode ?

Reported by natureh.510 on 2011-04-16 14:22:24

Beep6581 commented 9 years ago
I did not build LCMS2 myself, I think it's from your package.
I usually run RT.EXE in RELWITHDEBINFO, but will try it with just DEBUG (if it builds
again ;-)

Reported by oduis@hotmail.com on 2011-04-16 14:53:42

Beep6581 commented 9 years ago
"Large files such as photos and screenshots should be referred to with
links, please do not use this site's "Attach a file" feature for anything
over a few kilobytes. Use http://imgur.com/ for images and http://min.us/
for other files."

I deleted your attachment and re-uploaded it here:
http://i.imgur.com/w7La7.jpg

Devs: please use labels!

Reported by entertheyoni on 2011-04-17 19:11:42

Beep6581 commented 9 years ago
I hope that you can build RT again now, so how does it play with the Debug target ?

Reported by natureh.510 on 2011-04-17 21:52:09

Beep6581 commented 9 years ago
DEBUG shows the same problems mentioned here (colored pixels), don't know why.
(Good night)
However, clean on the other build types.

Reported by oduis@hotmail.com on 2011-04-17 22:01:58

Beep6581 commented 9 years ago
Should be fixed now, but there are still strange pixels, especially on some image. Look
at the 1:1 scale with this image, on the top left (neutral profil + No monitor profil).

http://natureh.free.fr/RawTherapee/IMG_0544.CR2

Reported by natureh.510 on 2011-04-20 17:09:43

Beep6581 commented 9 years ago

Reported by natureh.510 on 2011-04-20 17:09:59

Beep6581 commented 9 years ago
Thank you for issuing that problem! 

The problem is fixed with my NEF images. But with your CR2 image I still see some small
false green pixels in the dark area left - maybe less than before. And also the big
mysterious pixels. Funnily they change each time I reload the image. And they also
appear in release mode. See http://imgur.com/gNtoi (release mode)

PS: I think you introduced a variable 'float g' which is not used?

Reported by burrima.ch on 2011-04-20 19:30:19

Beep6581 commented 9 years ago
That was precisely the bug ! ;) In fact Lab2xyz is a macro that use a g variable, so
it has to be defined otherwise you'll have the problem we experienced. Look at the
other functions, they create a g variable where Lab2xyz is used (inside the parallelized
loops, of course).

Reported by natureh.510 on 2011-04-20 22:10:32

Beep6581 commented 9 years ago
I see. Is there a reason to why a macro is used instead an inline method?

Reported by burrima.ch on 2011-04-21 07:58:46

Beep6581 commented 9 years ago
Is there a processing overhead when a parameter is declared 14 milion times, or not
?

Reported by natureh.510 on 2011-04-21 08:09:16

Beep6581 commented 9 years ago
Probably not because the compiler will use processor internal register for frequently
accessed variables anyway. To be sure, you could declare g with the register keyword
like this: 'register float g'. Please correct me if I'm wrong.

Reported by burrima.ch on 2011-04-21 08:24:05

Beep6581 commented 9 years ago
Inline function would probably do just as well, if it makes the code more readable please
change it.

Reported by ejm.60657 on 2011-04-21 12:31:18

Beep6581 commented 9 years ago
#define epsilon 0.00885645 //216/24389
#define Lab2xyz(f) (( (g=f*f*f) > epsilon) ? g : (116*f-16)*kappainv)

can epsilon "recalculated"?
and change code like
#define Lab2xyz(f) (( ( f > new_epsilon) ? f*f*f : (116*f-16)*kappainv)

new_epsilon is maybe 6/29 0.206896552. This need someone really check.

Reported by GreatBull69 on 2011-04-23 13:42:44

Beep6581 commented 9 years ago
#46: This looks syntactically fine. Precision for the comparison (f>new_epsilon) shouldn't
be an issue. I will test this code change for you tonight.

BTW: Is there a reason why constants are defined in floating point notation (such as
0.00885645) instead of ratios (e.g. (216/24389))? - The preprocessor will do the computation
anyway and with higher precision than 6 digits, so no speed gain.

Reported by burrima.ch on 2011-04-26 12:11:23

Beep6581 commented 9 years ago
6/29 is incorrect; 216/24389 is correct.  If it makes you feel better to have it calculated
every time, it's fine with me; nevertheless, 6 significant figures is already more
than sufficient.

Reported by ejm.60657 on 2011-04-26 12:30:10

Beep6581 commented 9 years ago
Why do you think 6/29 is incorrect? 
(a/b)^(1/3) = (a^(1/3) / b^(1/3)). 
f = g^(1/3).
g^(1/3) > epsilon^(1/3) ==> f > new_epsilon
So: new_epsilon = epsilon^(1/3) = (216/24389)^(1/3) = 216^(1/3) / 24389^(1/3) = 6/29.

Reported by burrima.ch on 2011-04-26 15:18:05

Beep6581 commented 9 years ago
My mistake, I misread what you wrote.  The suggested change indeed bypasses the need
to (remotely) define temporary storage, which is a big plus.

Reported by ejm.60657 on 2011-04-26 15:35:13