Beep6581 / RawTherapee

A powerful cross-platform raw photo processing program
https://rawtherapee.com
GNU General Public License v3.0

Increase speed of noise reduction preview when Auto Chroma is selected, and fix some bugs in noise reduction #2540

Closed Beep6581 closed 8 years ago

Beep6581 commented 9 years ago

Originally reported on Google Code with ID 2557

Noise reduction preview rarely uses all available cores because the tile size is very
large. For example on my screen in full screen preview it uses only 3 of 8 cores. I
don't want to change the tile size for several reasons but made some first tries using
nested parallel regions. This way we can increase the number of cores used in NR preview
and reduce processing time. I'll post a first patch when Issue 2495 is committed.

Ingo

Reported by heckflosse@i-weyrich.de on 2014-11-05 14:07:50
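
The nested parallel regions described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patches: `denoiseRow`, `processTilesNested` and the way threads are split between the two levels are assumptions made for the example. The outer loop hands out tiles, and each tile's rows are split again among an inner team, so 3 tiles on 8 cores are no longer limited to 3 threads. Without OpenMP the pragmas are ignored and the code runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the real per-row denoising work: doubles each value.
static void denoiseRow(std::vector<float>& tile, std::size_t width, std::size_t row) {
    for (std::size_t x = 0; x < width; ++x) {
        tile[row * width + x] *= 2.0f;
    }
}

// Each tile is a row-major buffer of width * height floats.
// Outer level: one thread per tile. Inner level: the rows of a tile are
// shared among the threads that would otherwise sit idle.
void processTilesNested(std::vector<std::vector<float>>& tiles,
                        std::size_t width, std::size_t height, int totalThreads) {
    assert(totalThreads >= 1);
    const int numTiles = static_cast<int>(tiles.size());
    if (numTiles == 0) {
        return;
    }
    const int outer = std::min(numTiles, totalThreads);   // e.g. 3 tiles -> 3 threads
    const int inner = std::max(1, totalThreads / outer);  // e.g. 8 / 3 -> 2 threads per tile
    (void)outer;  // silence unused-variable warnings when built without OpenMP
    (void)inner;
#ifdef _OPENMP
    omp_set_max_active_levels(2);  // allow two nested levels of parallel regions
#endif
    #pragma omp parallel for num_threads(outer)
    for (int t = 0; t < numTiles; ++t) {
        #pragma omp parallel for num_threads(inner)
        for (int y = 0; y < static_cast<int>(height); ++y) {
            denoiseRow(tiles[t], width, static_cast<std::size_t>(y));
        }
    }
}
```

With 3 tiles and `totalThreads = 8`, this sketch runs up to 6 threads instead of 3. A real implementation would have to balance the two levels more carefully.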

Beep6581 commented 9 years ago
Here's the first patch, which uses nested parallelism to get a bit more speed in NR preview.
It's still WIP and slow, but it already reduced the processing time for NR preview at quality
'standard' on my system from about 1500 ms to about 1300 ms. The speedup depends on preview
size and number of cores: bigger preview size => less speedup, more cores => more speedup.
I tested with default ISO high. In full processing there should be no difference in
processing time.

Ingo

Reported by heckflosse@i-weyrich.de on 2014-11-16 16:58:42


Beep6581 commented 9 years ago
Next one. Processing time reduced to 1200 ms (was 1300 ms with last patch)

Reported by heckflosse@i-weyrich.de on 2014-11-16 19:37:55


Beep6581 commented 9 years ago
Next wip patch follows tomorrow. Processing time is around 800 ms now

Reported by heckflosse@i-weyrich.de on 2014-11-29 21:36:35

Beep6581 commented 9 years ago
Here's the next patch. Processing time for NR preview (on my system and screen) is between
700 and 800 ms now (was about 1300 ms with the last patch) and processing time for auto
chroma mode 'preview' is now between 220 and 300 ms (was about 800 ms with the last patch).
The patch is still WIP. Currently it works only on SSE machines (I'll fix that before
commit). Processing time for full NR processing (whole file) is also reduced a bit,
but not much (less than 10%).

It also fixes a bug in nr 'high quality' mode, so in this mode the output can have
differences.

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-02 01:17:38


Beep6581 commented 9 years ago
Hello Ingo

I tested patch 09...
All seems to work fine :)

Gain on processing time, from about 30% to 50%

Very good job !

Reported by jdesmis on 2014-12-02 11:49:08

Beep6581 commented 9 years ago
Hi Jacques, thanks for testing. issue2557_09.patch was a mockup and didn't compile or
work without SSE and OPENMP. I have already prepared a new patch which compiles and works
fine even without SSE and OPENMP and is also a tiny bit faster than the last patch. I'll
post it in the next few days.

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-03 00:12:27

Beep6581 commented 9 years ago
Here's the next one. It's very close to the one I would like to commit (will do some
further code cleaning etc.). Processing time for full processing in standard quality
went down by about 20% (in high quality about 10 to 15%) compared to tip. In preview
mode I'm now at less than 600 ms (was between 700 and 800 ms with last patch). There
can be differences in output because I fixed a bug when using nr chroma curve.

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-10 21:00:16


Beep6581 commented 9 years ago
I just tested your patch !

Very good job !

Very small differences between images...probably better after !

Important gain in processing time : 50% on preview, 20% on output :)

I see you rewrote "wavelet"! Nice thing. Thank you.

Reported by jdesmis on 2014-12-12 16:37:04

Beep6581 commented 9 years ago
Hi Jacques, thanks for testing :-) About the rewrite of "wavelet": the old version had one
function for both directions, which led to a very bad (slow) memory access pattern when
used in the vertical direction. For this reason I made separate vertical and horizontal
functions. Additionally I could save some temporary buffer copies. I also removed the unused
functions in "wavelet". If we need them in future we can get them from an older revision.

Here's the next patch. Almost the same as last one, but adds the stuff mentioned in
Issue 2495 #151 (never save auto chroma preview mode to pp3, but save the values instead)

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-12 18:34:04
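
The point about separate horizontal and vertical functions can be illustrated with a small sketch. It is not the "wavelet" code from the patch: the [1 2 1]/4 smoothing filter and the names `smoothHorizontal` and `smoothVertical` are assumptions chosen to keep the example short. In a row-major image the vertical pass keeps x as the inner loop and reads three neighbouring rows in parallel, so memory is still traversed contiguously. A shared one-dimensional routine applied column by column would instead jump a full row width between consecutive samples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major image: pixel (x, y) lives at index y * w + x.

// Horizontal pass: neighbours are adjacent in memory, so a plain row walk is fast.
std::vector<float> smoothHorizontal(const std::vector<float>& src, int w, int h) {
    assert(w > 0 && h > 0 && src.size() == static_cast<std::size_t>(w) * h);
    std::vector<float> dst(src.size());
    for (int y = 0; y < h; ++y) {
        const float* row = &src[static_cast<std::size_t>(y) * w];
        float* out = &dst[static_cast<std::size_t>(y) * w];
        for (int x = 0; x < w; ++x) {
            const int l = std::max(x - 1, 0);      // clamp at the left border
            const int r = std::min(x + 1, w - 1);  // clamp at the right border
            out[x] = (row[l] + 2.0f * row[x] + row[r]) / 4.0f;
        }
    }
    return dst;
}

// Vertical pass: x stays the inner loop, so the three input rows and the
// output row are each read or written contiguously.
std::vector<float> smoothVertical(const std::vector<float>& src, int w, int h) {
    assert(w > 0 && h > 0 && src.size() == static_cast<std::size_t>(w) * h);
    std::vector<float> dst(src.size());
    for (int y = 0; y < h; ++y) {
        const float* up = &src[static_cast<std::size_t>(std::max(y - 1, 0)) * w];
        const float* mid = &src[static_cast<std::size_t>(y) * w];
        const float* down = &src[static_cast<std::size_t>(std::min(y + 1, h - 1)) * w];
        float* out = &dst[static_cast<std::size_t>(y) * w];
        for (int x = 0; x < w; ++x) {
            out[x] = (up[x] + 2.0f * mid[x] + down[x]) / 4.0f;
        }
    }
    return dst;
}
```

Both functions produce the same result a generic strided routine would. Only the loop order and therefore the memory access pattern differ.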


Beep6581 commented 9 years ago
Hello Ingo

All seems to work fine :)

Ready to commit !

good job.

Reported by jdesmis on 2014-12-14 10:01:31

Beep6581 commented 9 years ago
Committed to revision b04ac7a8978b. I'm leaving the issue open for further improvements.

Reported by heckflosse@i-weyrich.de on 2014-12-14 18:09:06

Beep6581 commented 9 years ago
A big thank you Ingo for all your work here! Great!

Reported by johan@birkagatan.com on 2014-12-15 07:25:52

Beep6581 commented 9 years ago
Here's the next one. Processing time in Standard quality is reduced by about 10% compared
to tip. Most of the speedup was achieved by optimizing wavelet decomposition. I'll
have a look at wavelet reconstruction later.

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-16 19:28:49

Beep6581 commented 9 years ago
Patch from #64 had a bug. Corrected with this patch.

Reported by heckflosse@i-weyrich.de on 2014-12-16 22:12:53


Beep6581 commented 9 years ago
Ingo

I tried your last patch (13)

Small differences on speed-up :
* 10% for info denoise
* 5% for Denoise (TIF)

I am on Windows 8, with an i7 (8 cores)

:)

Reported by jdesmis on 2014-12-17 06:15:44

Beep6581 commented 9 years ago
This one fixes a bug when using nr chroma curve.

L subsampling patch follows afterwards.

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-18 13:08:37


Beep6581 commented 9 years ago
Same as last one, but with L subsampling. Denoise for D800 is down to 8 seconds now
:-)

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-18 13:21:51


Beep6581 commented 9 years ago
8 seconds?!!! Wow. What is this magic subsampling?

Reported by michaelezra000 on 2014-12-18 14:11:38

Beep6581 commented 9 years ago
Michael, subsampling by 2 uses only half of the data points in each direction, which in
fact means a quarter of the data points because it's done in both directions. This method
has been used in NR for the a and b channels all along. I tried to use it for the L channel
to see whether the result is satisfactory. If you compare before/after the patch, please
don't compare patch 15 to patch 13. Compare 13 to 14 and 14 to 15 please.

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-18 14:57:55
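
To make the "half in each direction, a quarter overall" arithmetic concrete, here is a minimal sketch of plain subsampling by 2. It only illustrates the point count. The `Image` struct and `subsampleBy2` are hypothetical names, and the thread does not show how the subsampling is actually wired into the wavelet decomposition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major single-channel image.
struct Image {
    int width = 0;
    int height = 0;
    std::vector<float> data;  // pixel (x, y) at data[y * width + x]
};

// Keep every second sample in x and in y: half the points per direction,
// so roughly a quarter of the points in total.
Image subsampleBy2(const Image& src) {
    assert(src.width > 0 && src.height > 0);
    assert(src.data.size() == static_cast<std::size_t>(src.width) * src.height);
    Image dst;
    dst.width = (src.width + 1) / 2;    // round up for odd sizes
    dst.height = (src.height + 1) / 2;
    dst.data.reserve(static_cast<std::size_t>(dst.width) * dst.height);
    for (int y = 0; y < src.height; y += 2) {
        for (int x = 0; x < src.width; x += 2) {
            dst.data.push_back(src.data[static_cast<std::size_t>(y) * src.width + x]);
        }
    }
    return dst;
}
```

For a 4x4 input the result is 2x2, that is 4 of the original 16 samples.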

Beep6581 commented 9 years ago
Ingo, comparing patches 14 and 15, I don't see significant differences.
I get 5 seconds for denoise with Luminance curve and Chrominance curve NR with i7/12
threads.

About subsampling and half data points - half of what? Is it of the full image resolution
or something else?

Reported by michaelezra000 on 2014-12-18 17:20:59

Beep6581 commented 9 years ago
Michael: http://gpeyre.github.io/numerical-tours/matlab/wavelet_4_daubechies2d/ ;-)

Reported by heckflosse@i-weyrich.de on 2014-12-18 17:40:43

Beep6581 commented 9 years ago
Almost the same as the last one, but with reduced memory requirements and (hopefully) a fix
for the bug from Issue 2594 #39. A few percent faster than the last one, but nothing to
write home about.

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-19 16:58:51


Beep6581 commented 9 years ago
Here's some more detailed info about the peak memory usage of patch 16 compared to patch
14 (subsampling vs. no subsampling):

For tilesize T, number of cores C and number of wavelet levels L the peak memory usage
of patch 16 vs patch 14 is reduced by (7+3*4*L)*C*T*T bytes. Given a usual tilesize
T=948, number of cores of my system C=8 and number of levels L=5 (which is the minimum
number of wavelet levels used in NR) peak memory usage of patch 16 is reduced by (7+3*4*5)*8*948*948
= 481705344 bytes = 460 MB compared to patch 14.

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-19 23:36:45
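
The formula above is easy to check with a few lines of code. The function name `peakMemorySavedBytes` is made up for this sketch. The expression itself is taken directly from the comment.

```cpp
#include <cassert>
#include <cstdint>

// Reduction in peak memory (bytes) of patch 16 vs. patch 14, as stated in the
// comment above: (7 + 3*4*L) * C * T * T for tile size T, C cores, L wavelet levels.
std::uint64_t peakMemorySavedBytes(std::uint64_t tileSize, std::uint64_t cores,
                                   std::uint64_t levels) {
    assert(tileSize > 0 && cores > 0 && levels > 0);
    return (7 + 3 * 4 * levels) * cores * tileSize * tileSize;
}
```

For T=948, C=8 and L=5 this returns 481705344 bytes, which is about 459.4 when divided by 1024*1024, matching the roughly 460 MB quoted.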

Beep6581 commented 9 years ago
I tried the current development version (rev de35a7cccc89) and also patch 16 today and
I have found some weird bug with noise reduction with L*a*b* Method. The problem does
not occur with 4.2, so I guess it was caused by the patches for this bug.

I have an image where in an area the blue channel is (almost) saturated for most pixels
(R<10%, G~40%). As soon as I apply any noise reduction with L*a*b* method (RGB does
not do it) the whole area turns into a bright green (R~0%, G~60%, B~30%). The exact
color changes when I change the working profile in Color Management settings (for all
other profiles except ProPhoto it is more a bright orange), but it is not blue.
I tried to export the relevant part of the image as tiff and then apply noise reduction
on that file, but here the problem does not occur. I unfortunately can not send you
the complete raw image to reproduce, but I hope you can figure out from the description
what is going on.

Reported by lukas.middendorf on 2014-12-20 15:30:28

Beep6581 commented 9 years ago
If I understand correctly, the problem is already present in rev. de35a7cccc89?
Then at least it is not caused by latest (uncommitted) patches. Can you try bisecting
to find the revision which introduced the problem? http://mercurial.selenic.com/wiki/BisectExtension

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-20 16:50:15

Beep6581 commented 9 years ago
Ingo

I tried patch 16: I extracted the 3 files ("cplx_wavelet" + "boxblur"...) and copied them
to a session where I have "wavelet levels" (issue2594). (I have 2 sessions for 2594, to
test different algorithms / solutions / ...)

When I open a file with 9 levels, bug #39 from issue2594 is still present, as mentioned
by DrSlony, plus a crash ...

I tried to reproduce #75 with many images with saturated blue, in Lab mode, and switching
ProPhoto <==> sRGB ... all works fine

:)

Reported by jdesmis on 2014-12-20 17:04:38

Beep6581 commented 9 years ago
Jacques, I had another look. There are at least two more places, which are not clean.
I'll fix them with next patch.

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-20 17:37:38

Beep6581 commented 9 years ago
Did some bisecting and it seems the bug was introduced in b04ac7a8978b

Reported by lukas.middendorf on 2014-12-20 19:45:09

Beep6581 commented 9 years ago
lukas.middendorf, thank you very much :-)

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-20 19:50:43

Beep6581 commented 9 years ago
Lukas, can you try this change?

rtengine/color.h line 903

    return (x = exp(log(x)/gamma));

Reported by heckflosse@i-weyrich.de on 2014-12-20 20:02:45

Beep6581 commented 9 years ago
Ingo,
that change does not help.
I also tried if compiling for a generic x86_64 changes anything, but no change compared
to march=core-avx-i (Ivy Bridge).

Lukas

Reported by lukas.middendorf on 2014-12-20 20:33:29

Beep6581 commented 9 years ago
Ok, thanks for testing. A problematic file would really help

Reported by heckflosse@i-weyrich.de on 2014-12-20 20:43:59

Beep6581 commented 9 years ago
Though you can't post your file, can you please give us as much information as possible
(which camera, which OS etc.)?

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-20 20:45:52

Beep6581 commented 9 years ago
The image (RAW) was taken with a Canon 70D at a party where it was relatively dark with
some colored light.
The problematic part of the image is the reflection of some blue light on a white plate.
The image was taken with flash at ISO400 and is (except for the part with the reflections)
about 1.5EV underexposed.

Here is a crop of the relevant region with and without noise reduction.
http://i.imgur.com/gYXyWkY.jpg
http://i.imgur.com/8VWQUPm.jpg

Reported by lukas.middendorf on 2014-12-20 21:54:52

Beep6581 commented 9 years ago
OK, I now managed to take a second photo of a similar situation (reflection of a blue
LED on the glossy box of my EOS 70D) which also shows the same bug.
How should I send you the 20MiB RAW file? 

Reported by lukas.middendorf on 2014-12-20 22:01:43

Beep6581 commented 9 years ago
Upload it there : www.filebin.net and post the link

Reported by heckflosse@i-weyrich.de on 2014-12-20 22:10:10

Beep6581 commented 9 years ago
looks like Bitburger Stubbi ;-)

Reported by heckflosse@i-weyrich.de on 2014-12-20 22:12:05

Beep6581 commented 9 years ago
filebin seems to have technical problems. When I want to upload a file it only gives
me "502 Bad Gateway".

Reported by lukas.middendorf on 2014-12-20 22:15:11

Beep6581 commented 9 years ago
I get the same error, perhaps try again later...

Reported by heckflosse@i-weyrich.de on 2014-12-20 22:20:04

Beep6581 commented 9 years ago
filebin has not come back yet, but I tried some other file sharing service:
http://www.FastShare.org/download/IMG_3379.CR2

Reported by lukas.middendorf on 2014-12-20 23:34:04

Beep6581 commented 9 years ago
Thanks a lot. Issue confirmed. I'll have a look

Reported by heckflosse@i-weyrich.de on 2014-12-21 00:05:12

Beep6581 commented 9 years ago
Issue 2614 has been merged into this issue.

Reported by heckflosse@i-weyrich.de on 2014-12-21 12:44:18

Beep6581 commented 9 years ago
Bug found (copy&paste bug). Will be fixed with next patch.

Reported by heckflosse@i-weyrich.de on 2014-12-21 12:46:17

Beep6581 commented 9 years ago
I added some SSE code for wavelets and fixed the bug from #75.

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-21 13:30:23


Beep6581 commented 9 years ago
I committed a bugfix for #75. Patch 17 does not apply to new tip, I guess. I'll make
a new one

Reported by heckflosse@i-weyrich.de on 2014-12-21 14:55:33

Beep6581 commented 9 years ago
Hi Ingo, I just tried patch 17 and see that it introduces a pixelated pattern with NR
enabled. For reference, patch 15 did not have this issue.

Reported by michaelezra000 on 2014-12-21 15:37:43

Beep6581 commented 9 years ago
Michael, thanks for reporting this :-) I'm already looking. It's something introduced
with patch 16.

Reported by heckflosse@i-weyrich.de on 2014-12-21 16:59:45

Beep6581 commented 9 years ago
Here's a compromise between 17 and 15. It's faster than 15, has the peak memory usage
mentioned in #74, but is not as fast as patch 17. Artifacts are gone now. I'll make
further optimizations when this one is tested and committed.

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-21 19:07:45


Beep6581 commented 9 years ago
Same as the last one, but with peak memory usage of NR reduced by width*height*9 bytes

Reported by heckflosse@i-weyrich.de on 2014-12-22 11:43:39


Beep6581 commented 9 years ago
Committed to revision 1814e8e44db3
Issue stays open for further improvements.

Ingo

Reported by heckflosse@i-weyrich.de on 2014-12-23 11:22:51