Closed Beep6581 closed 9 years ago
I'll have a look
Reported by heckflosse@i-weyrich.de
on 2014-03-06 10:56:21
Started
Here's the first patch. Intention of this patch is not to speed up processing, but to
change some internal calculations from int to float. Of course this leads to differences
between before and after this patch. Also got a speedup from 11528 ms to 10010 ms for
a D800 NEF by this changes caused by reduced number of float <=> int conversions. Will
make further changes, when differences of this patch are accepted by the team.
Ingo
Reported by heckflosse@i-weyrich.de
on 2014-03-06 21:59:26
PatchSubmitted
A tip when attaching patches, if you have a .txt file suffix instead of .patch you can
click on it directly in any browser to read it. With .patch most browsers just let
you download the file. So issue2278_00-patch.txt would be a more convenient filename
:)
Nothing in the VNG4 algorithm itself is locked to integer math, so making it floating
point seems like a good idea, I'm okay with the patch.
Reported by torger@ludd.ltu.se
on 2014-03-07 08:41:26
There are at least two downsides of .txt suffix for patch files.
1.) I have to choose 'all files (*.*)' when importing a patch using TortoiseHg
2.) I lose syntax highlighting in editor when I open a patch file with suffix .txt
Anyway, here's the next patch. I changed some other ints to floats. Now vng4 is fully
floating point. Also got an additional speedup. Processing time is now 9350 ms.
Parallelized version follows later today. Have to clean the code first. Processing
time of parallelized vng4 at my system is about 2000 ms.
Ingo
Reported by heckflosse@i-weyrich.de
on 2014-03-07 13:20:12
i missed this entry, so i won't drop my 'improvements' in this issue.
VNG4 is very useful when you want to work with high ISO noisy images. They cleanup
much better with denoise etc.
Reported by janrinze
on 2014-03-07 13:56:13
Great! No problem with .txt vs .patch, use .patch if it's better for your workflow,
I can download :-)
Reported by torger@ludd.ltu.se
on 2014-03-07 17:23:04
Sorry if I am off topic .. :)
Talking about VNG4 .. I recall some discussions about the first stage bilinear interpolation
it uses and that it would become a better detail resoving demosaicer if a better interpolation
was used there.
If my memory serves .. it was Manuel Llorens (from PerfectRaw .. with Emil-Jacques
and others) who had in his TODO list a related improvement. Unfortunately his site
(www.rawness.es) is dead last year(s) ..
His last presence I know is https://sites.google.com/site/manuelllorens/ and gives
a mail address there .. manuelllorens@gmail.com
Reported by iliasgiarimis
on 2014-03-07 19:24:31
Re #5, Jan: I would be glad if you could review my next patch :-)
Re #7, Ilias: As you know more details about the past than me, could you try to contact
him?
I frittered a bit with further optimizations instead of cleaning the code. So I need
at least one additional day before I can post the next patch. Actually the processing
time is below 1800 ms at my system :-)
Ingo
Reported by heckflosse@i-weyrich.de
on 2014-03-07 23:07:45
Reported by heckflosse@i-weyrich.de
on 2014-03-07 23:17:25
Ingo and Torger, thanks for doing this! Vng4 is indeed a nice demosaic algorithm, and
the speedup is most welcome.
Reported by johan@birkagatan.com
on 2014-03-08 12:49:07
Decided to post a patch with ugly code, so you can check for speed and differences in
output. D800 NEF now needs 1650 ms at my system. Detected no differences in output
compared to issue2278_01_patch.txt. Will clean the code before next patch.
Memory footprint of VNG4 changed a bit.
Before patch it was
x + 4 * 3 * 4 * width bytes
Now it is
x + c * 3 * 4 * width bytes
where x is constant and c is number of parallel threads.
Ingo
Reported by heckflosse@i-weyrich.de
on 2014-03-08 19:07:07
Ingo, your latest patch does not apply clean on latest default branch.
Apparently you did a diff against an older version than what you submitted into default
branch.
Reported by janrinze
on 2014-03-08 20:03:01
When applied to the right patch level then it works quite well.
It seems pretty quick. No idea if the demosaicing is better/worse than before.
Any chance that bilateral filtering could improve the sharpness?
Reported by janrinze
on 2014-03-08 20:37:58
Sorry for the trouble. Patch from #11 should work when applied to revision 4b8f3cb02ef0.
Ingo
Reported by heckflosse@i-weyrich.de
on 2014-03-08 21:30:07
Here's the next one. Code is much simpler now and it needs c * 3 * 4 * width bytes less
memory than last one. Speed is the same at my system.
Ingo
Reported by heckflosse@i-weyrich.de
on 2014-03-09 13:19:34
Great work as always Ingo! Below is average of 10 D700 NEF-files (patch 3 tested):
Before: 3 622 ms
After: 480 ms
Impressive!
Reported by johan@birkagatan.com
on 2014-03-10 18:57:40
Johan, thanks for your test. Do you have a new machine? It's faster than mine with D700
NEF. Will try to post a new patch with simpler code and a bit (only a few %) speedup
tomorrow (that will be the patch I want to commit). Last days weather was too good
to spend a lot of time at the keyboard ;-)
Ingo
Reported by heckflosse@i-weyrich.de
on 2014-03-10 23:27:57
Here's the one I want to commit. It's a bit faster than issue2278_03.patch. Now it needs
1580 ms for a D800 NEF at my system.
Ingo
Reported by heckflosse@i-weyrich.de
on 2014-03-11 19:19:24
Hello Ingo, re #17: No, it's the same Intel 2600k I've had a while now. What's your
timings - out of interest?
Reported by johan@birkagatan.com
on 2014-03-11 19:45:42
Hi Johan, I forgot that you own an i7. Here it takes about 600 ms for a D700 NEF.
Reported by heckflosse@i-weyrich.de
on 2014-03-11 19:52:00
i7 CPU Q 820 @ 1.73GHz, Pentax K10D raws:
~4620ms without the patch, ~750ms with :)
Reported by entertheyoni
on 2014-03-11 21:11:10
DrSlony, thanks for the test.
Reported by heckflosse@i-weyrich.de
on 2014-03-11 21:38:21
Reported by heckflosse@i-weyrich.de
on 2014-03-12 21:24:33
Ingo, D800e on 12 threads @3.2 GH
Before 12500ms, after 925 ms:)!!!
Reported by michaelezra000
on 2014-03-14 03:32:54
Michael, thanks for the test.
Committed to revision 36699351cd8c
Reported by heckflosse@i-weyrich.de
on 2014-03-14 12:23:09
FixedPendingConfirmation
Great work!
Concerning detail resolving, if we change the bilinear filtering algorithm to improve
green detail I don't think we should do it at the cost of robustness, and I suspect
(but don't know) that extracting more detail in the greens would indeed reduce robustness.
One of the key reasons I'm using VNG4 from time to time is that it renders good results
for files with green channel instability, which I get with some technical wide angle
lenses.
Reported by torger@ludd.ltu.se
on 2014-03-15 09:43:08
Built and tested the committed revision, seems to work fine with great speed improvement.
Others have tested too as seen in the log, setting to fixed.
Reported by torger@ludd.ltu.se
on 2014-03-15 09:46:32
Fixed
Oh, just saw an issue, the signed short int *cp pointer should not be static in this
multithreaded program, right? Seems to be some inheritance from dcraw code.
Reported by torger@ludd.ltu.se
on 2014-03-15 10:35:58
Anders, you're right. Missed that thing. Will correct it.
Reported by heckflosse@i-weyrich.de
on 2014-03-15 10:55:57
Removed that static declaration completely because it doesn't make sense to hold this
few values in memory.
Reported by heckflosse@i-weyrich.de
on 2014-03-15 12:12:01
I also noted that the first bilinear step skips one pixel around the edge (so I suppose
the end result will have one pixel line around which is not demosaiced), but I guess
we don't care as we crop away a few pixels around the border anyway.
Reported by torger@ludd.ltu.se
on 2014-03-15 15:42:35
Exactly.
Reported by heckflosse@i-weyrich.de
on 2014-03-15 15:55:04
Would it be better to do
pix[ip[0]] = sum[ip[0]] / csum[row & 15][col & 15][i];
as
pix[ip[0]] = sum[ip[0]] * csum[row & 15][col & 15][i];
in the bilinear filtering step (ie use multiplication instead of division)? As the
constant table csum is nowadays float so it can be set to
csum[row][col][colcount] = 1.0 / sum[c];
instead of the old
csum[row][col][colcount] = sum[c];
Reported by torger@ludd.ltu.se
on 2014-03-27 11:58:43
This allocation is also a bit suspicious, a dcraw inheritance I suspect.
int * ip = (int *) calloc ((prow+1)*(pcol+1), 1280);
It assumes some size of int, hopefully 8 bytes, but I'm not so sure. In other code
(I have not verified it in raw therapee) when I've used dcraw-inspired vng4 code I've
had memory overruns on this buffer. From the loop under it's hard to figure out exactly
how big the buffer needs to be. In that case I was lazy and made it a lot bigger (and
related it to sizeof(int)).
Reported by torger@ludd.ltu.se
on 2014-03-27 12:06:19
Re #33: I tried your suggestion. On my (memory-bandwith-limited) system, I get a speedup
of 14 ms by changing this calulation using a D800 NEF. That's less than 1%.
Re #34: Yes, it's from dcraw. Will have a look.
Reported by heckflosse@i-weyrich.de
on 2014-03-27 12:37:55
Re #34, Anders: I had a look at the allocation. It's really hard to figure out how big
the buffer needs to be. But I ran it through valgrind for soma files and got no invalid
reads or writes.
Ingo
Reported by heckflosse@i-weyrich.de
on 2014-03-27 22:20:11
s/soma/some/ :-)
Reported by heckflosse@i-weyrich.de
on 2014-03-27 22:23:02
Ok... sorry for having made you spend time on it, I was genuinely worried though...
Reported by torger@ludd.ltu.se
on 2014-03-28 07:42:51
Originally reported on Google Code with ID 2278
Reported by
torger@ludd.ltu.se
on 2014-03-06 08:48:16