darktable-org / darktable

darktable is an open source photography workflow application and raw developer
https://www.darktable.org
GNU General Public License v3.0
9.76k stars 1.14k forks source link

Verify CPU and GPU results #3421

Closed groutr closed 3 years ago

groutr commented 4 years ago

Is your feature request related to a problem? Please describe. Do we make the assumption that an image processed by the CPU should yield the same result as the image processed by the GPU?

Describe the solution you'd like Validate the outputs of CPU and GPU operations. Create an automated testing program that will run each iop on the CPU and GPU and compare the results for "significant" differences. I think this can be accomplished relatively easily using the darktable-cli interface.

I'm willing to implement a proof a concept if the maintainers of this project see value in this sort of testing.

parafin commented 4 years ago

Keep in mind that outputs won't be identical in many cases. It makes more sense to check similarity in some (or rather many) way.

groutr commented 4 years ago

Here is an example where processing with GPU vs CPU leads to a significant (IMO) difference. This is the lowpass module with 35.15 radius and bilateral filter (rest of settings default) The only difference is toggling OpenCL.

CPU: 20190902-151043-cpu.png

GPU: 20190902-151043-gpu.png

It almost looks like the GPU runs the bilateral lowpass filter twice.

pitbuster commented 4 years ago

That would help a lot. We could even have a regression suite that avoids developers breaking modules.

pitbuster commented 4 years ago

Keep in mind that outputs won't be identical in many cases. It makes more sense to check similarity in some (or rather many) way.

I think something like computing the L2 and/or L_infinity norm of the difference should do

groutr commented 4 years ago

@hellbuster Thanks for the suggestion. I'll see if that works.

Is there a way to set iop parameters via darktable-cli?

upegelow commented 4 years ago

I am not in favor of implementing a CPU/GPU comparison directly in darktable. This should be handled by a script if needed.

But I what I can confirm is the difference between CPU/GPU in this case. Seems to affect the bilateral filter - probably not only in lowpass. Does not happen in 2.6.3, so it's a regression which needs to be fixed.

groutr commented 4 years ago

@upegelow I agree. My intention was that it would be a standalone script that simply invoked darktable-cli with generated xmp files and then compared the outputs, printing some sort of score to the console. Outputs above a certain threshold would be saved for later inspection along with the generated xmp file. My prototype code is in Python. I was hoping to find a way to vary the iop parameters via darktable-cli or set them directly in the xmp file. That way, I could sweep thru a range of states of the iop to test more extensively.

At the moment, I'm simply copying iop params from existing xmp files.

EDIT: Perhaps it might valuable to add a lua script to darktable that would register a keyboard shortcut to enable/disable opencl.

upegelow commented 4 years ago

After further testing I find the following:

CPU_2.6.3 == GPU_2.6.3 == CPU_3.0.rc1 != GPU_3.0.rc1

To be check if this is related: issue #3435

I did a bisect and could get close to the problem, but exactly to one commit.

559635ed6d587c61acfb215f8ef7889e52ed1af6 seems to be the last good commit. Followed by a handful of commits which do not compile here. 6578d8bf16060212bbc266a2bd3366378a6fae2c is the first bad commit I could compile.

So it looks like the issue has been generated by PR2672.

@aurelienpierre Please take over.

parafin commented 4 years ago

I think lowpass bug should have a separate issue. Don't just take over another issue like that.

parafin commented 4 years ago

Keep in mind that outputs won't be identical in many cases. It makes more sense to check similarity in some (or rather many) way.

I think something like computing the L2 and/or L_infinity norm of the difference should do

Sometimes difference between CPU and GPU is some small shift (e.g. 1 px) of the image, so I doubt one type of difference measurement will be enough to correctly understand what's happening.

parafin commented 4 years ago

@upegelow I agree. My intention was that it would be a standalone script that simply invoked darktable-cli with generated xmp files and then compared the outputs, printing some sort of score to the console. Outputs above a certain threshold would be saved for later inspection along with the generated xmp file. My prototype code is in Python. I was hoping to find a way to vary the iop parameters via darktable-cli or set them directly in the xmp file. That way, I could sweep thru a range of states of the iop to test more extensively.

At the moment, I'm simply copying iop params from existing xmp files.

darktable has introspection system which may help with this task. See #3215

TurboGit commented 4 years ago

@upegelow : so the conclusion here is that CPU is wrong where GPU is correct.

Right?

This is also what I think be I want to be sure.

BTW, just reverting 6578d8b seems to fix the issue. This has to be investigated.

upegelow commented 4 years ago

@upegelow : so the conclusion here is that CPU is wrong where GPU is correct.

Right?

Correct. Taking the output of previous darktable versions as reference.

aurelienpierre commented 4 years ago

Aurélien, creating bugs since 2018.

Your life is boring and your code is too reliable ? Together, we will find a way to turn your codebase into a nasty mess and make your life full of surprises again.

Joke aside, I will have a look at that tomorrow.

Also, I really like the idea of having tests. It has been around for a while, but nobody ever did it. Image magick has an RMSE comparison tool that seems suited, it might be worth it to test modules output independantly, then to test a full pipeline output with that.

If CPU and GPU can have outputs shifted by 1px, we could still compare after doing a correlation (Image Magick also has that: https://www.imagemagick.org/Usage/convolve/#correlate_search). Basically, if you correlate the CPU and GPU output, you will get some kernel representing the 2D shifting. By correlating again one of these outputs with this kernel, you will back-shift it and be ready to compute the RMSE between both.

TurboGit commented 4 years ago

In fact no, just reverting 6578d8b does not fix the problem. A procedural error on my side I suppose.

groutr commented 4 years ago

@parafin is there any documentation about the introspection system? Is it available via the cli?

parafin commented 4 years ago

no and no

groutr commented 4 years ago

Am I correct in thinking that the darktable:params are base64 encoded structs that correspond to the dt_iop_*_params_t structs defined in each iop?

parafin commented 4 years ago

I think so, yes. Also I think they may be compressed sometimes...

groutr commented 4 years ago
import base64, struct
haze_params = base64.decodebytes(b'cdcc4c3ecdcc4c3e')
# haze_params = b'q\xd7\x1c\xe1\xcd\xdeq\xd7\x1c\xe1\xcd\xde'
# hazeremoval struct has two floats, which should require 8 bytes
# len(haze_params) == 12 (extra 4 bytes?)
# One of the values should be 0.2
[struct.unpack_from('f', haze_params, offset=i) for i in range(8)]
# [(-1.808260165555758e+20,),
# (-472095456.0,),
# (-7.417584716930351e+18,),
# (2.206546064186184e+30,),
# (-265939224363008.0,),
# (1.4256940514007682e-21,),
# (-1.808260165555758e+20,),
# (-472095456.0,)]

What am I missing something here?

parafin commented 4 years ago

Sorry, when params are uncompressed, they are just plain hex, base64 is used only for compressed params. See https://github.com/darktable-org/darktable/blob/616f806e86f84fa60dc11abc6b679f1bd8a69e94/src/common/exif.cc#L1633

groutr commented 4 years ago

@aurelienpierre you mentioned that the CPU and GPU outputs could be shifted by 1px. Is that always in the same direction?

groutr commented 4 years ago

@parafin thanks for pointing me to encoding/decoding code. I was able to write some python functions that can successfully pack/unpack both compressed and uncompressed parameters into python objects.

groutr commented 4 years ago

what are "multi_name" and "multi_priority" in the xmp fields?

junkyardsparkle commented 4 years ago

what are "multi_name" and "multi_priority" in the xmp fields?

These are for when there are multiple instances of a particular module in the pipe.

junkyardsparkle commented 4 years ago

@aurelienpierre you mentioned that the CPU and GPU outputs could be shifted by 1px. Is that always in the same direction?

I think he was responding to @parafin, who may or may not have been thinking of #3349, which isn't a simple X or Y shift, but a difference in scaling. (platform-specific OpenCL bug, so maybe not in scope here?)

groutr commented 4 years ago

@TurboGit thanks for fixing the bilateral issue. I can confirm that it matches for CPU and GPU.

# i1, ix are cpu versions
# i2, iy are gpu versions
before = metrics.structural_similarity(i1, i2, multichannel=True)  # 0.9099741627112755
after = metrics.structural_similarity(ix, iy, multichannel=True)   # 0.9988106551960071

Converting to LAB before comparing yields (0.7616955661065257, 0.9610740923774994) for before/after respectively. Each channel is being compared separately and then averaged for the final score. I'm not sure which colorspace would be the best for making such comparisons.

It seems to me that structural similarity could be a good metric for this kind of comparison. https://en.wikipedia.org/wiki/Structural_similarity We want to target values as near to 1 as possible.

aurelienpierre commented 4 years ago

SSIM is perceptually defined, as such it would be good to compare the performance of 2 denoising methods, for example. Here, we want to ensure both outputs are technically the same, so an RMSE would be better suited I think.

TurboGit commented 4 years ago

Fixed, so closing this now.

parafin commented 4 years ago

I think lowpass bug should have a separate issue. Don't just take over another issue like that.

I’ve reopened it for a reason. This is a feature request, not a bug.

groutr commented 4 years ago

Some preliminary results:

> python test_cpu_gpu.py /home/xxxxxx/Folder/201909 /tmp/xxxxxx/TEST_GPU --xmp /home/xxxxxx/Folder/201908/20190829_172258.nef.xmp --threshold 5000
/home/xxxxxx/Folder/201909/20190904_142159.nef: 6.21097159614888 0.16577214714714714
/home/xxxxxx/Folder/201909/20190904_154407.nef: 74.8856631274775 0.34452314814814816
/home/xxxxxx/Folder/201909/20190902_233452.nef: 13.15206426932839 0.3220635635635636
/home/xxxxxx/Folder/201909/20190904_154346.nef: 27.034347519790835 0.3367885385385385
/home/xxxxxx/Folder/201909/20190904_150922.nef: 112.59202960636637 0.2199461961961962
/home/xxxxxx/Folder/201909/20190902_184505.nef: 22.343010279690567 0.39626338838838837
/home/xxxxxx/Folder/201909/20190904_001018.nef: 7.077986093313347 0.3038493493493494

The first column is RMSE score between CPU and GPU results and the second column is the percentage of values that were different (the number of r, g, or b values that were different across all pixels).

If a particular photo exceeds the threshold, then we walk thru the history:

/home/xxxxxx/Folder/201909/20190904_142159.nef: 6.21097159614888 0.16577214714714714
/home/xxxxxx/Folder/201909/20190904_154407.nef: 74.8856631274775 0.34452314814814816
/home/xxxxxx/Folder/201909/20190902_233452.nef: 13.15206426932839 0.3220635635635636
/home/xxxxxx/Folder/201909/20190904_154346.nef: 27.034347519790835 0.3367885385385385
threshold exceeeded.  Walking edit history
/home/xxxxxx/Folder/201909/20190904_150922.nef: 85.33744808726146 0.15714927427427428 (edit 0)
/home/xxxxxx/Folder/201909/20190904_150922.nef: 83.11917541591605 0.15418080580580582 (edit 1)
/home/xxxxxx/Folder/201909/20190904_150922.nef: 83.11917541591605 0.15418080580580582 (edit 2)
/home/xxxxxx/Folder/201909/20190904_150922.nef: 83.11917541591605 0.15418080580580582 (edit 3)
/home/xxxxxx/Folder/201909/20190904_150922.nef: 83.11917541591605 0.15418080580580582 (edit 4)
/home/xxxxxx/Folder/201909/20190904_150922.nef: 83.11917541591605 0.15418080580580582 (edit 5)
/home/xxxxxx/Folder/201909/20190904_150922.nef: 83.11917541591605 0.15418080580580582 (edit 6)
/home/xxxxxx/Folder/201909/20190904_150922.nef: 83.11917541591605 0.15418080580580582 (edit 7)
/home/xxxxxx/Folder/201909/20190904_150922.nef: 112.59202960636637 0.2199461961961962 (edit 8)
/home/xxxxxx/Folder/201909/20190902_184505.nef: 22.343010279690567 0.39626338838838837
/home/xxxxxx/Folder/201909/20190904_001018.nef: 7.077986093313347 0.3038493493493494

In this case edit 8 (basecurve) caused a jump in RMSE. The xmp being applied here are the default edits applied to every photo when first opened in darkroom. The iops are: rawprepare, temperature, highlights, demosaic, colorin, colorout, gamma, flip, basecurve.

I have noticed that both of these measures are dependent on scale. Above, the outputs are scaled down so the largest dimension is 2000px. Below, there is no resizing (exported at full resolution).

/home/xxxxxx/Folder/201909/20190904_142159.nef: 14.878123738416152 0.15794990971803233
/home/xxxxxx/Folder/201909/20190904_154407.nef: 43.810817218580624 0.36411117320400427
/home/xxxxxx/Folder/201909/20190902_233452.nef: 27.547468377031297 0.3249854383923106
/home/xxxxxx/Folder/201909/20190904_154346.nef: 61.66121770961552 0.36162738879052936
/home/xxxxxx/Folder/201909/20190904_150922.nef: 9.38682675487574 0.20351098219967095
/home/xxxxxx/Folder/201909/20190902_184505.nef: 40.035051000157274 0.3679160113943206
/home/xxxxxx/Folder/201909/20190904_001018.nef: 16.139043971322728 0.2769150162595461
groutr commented 4 years ago

I've reworked the output of the script a bit. The images are now normalized on a scale of 0.0-1.0 by dividing each pixel by the apparent bitdepth. We calculate the rmse, but also return the largest squared error term computed by rmse. Both of these together have shown to be pretty indicative, in my tests, of when CPU/GPU do not visibly match. For example, the bug in #3629 was uncovered using this script. Edit 22 was the issue.

(dtxmp) ❯ python test_cpu_gpu.py /home/xxxxxx/FromFolder/201909/20190902_044048.nef /tmp/xxxxxx/TEST_GPU --xmp /home/xxxxxx/FromFolder/201909/20190902_044048.nef.xmp --threshold .01
threshold exceeeded.  Walking edit history
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0012910335570347567 0.24739755527116358 (edit 0)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0012910335570347567 0.24739755527116358 (edit 1)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0012910335570347567 0.24739755527116358 (edit 2)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0012910335570347567 0.24739755527116358 (edit 3)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0012910335570347567 0.24739755527116358 (edit 4)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0012910335570347567 0.24739755527116358 (edit 5)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0012910335570347567 0.24739755527116358 (edit 6)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0013167915313367436 0.4704348975792527 (edit 7)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0013167915313367436 0.4704348975792527 (edit 8)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 1.7351012783276262e-05 0.0018633997533470392 (edit 9)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 1.452687058810397e-05 0.00021636858582496643 (edit 10)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0005325166274044336 0.003323257900774479 (edit 11)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0005325166274044336 0.003323257900774479 (edit 12)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0006170108196186224 0.015378028387203813 (edit 13)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0006376190359774417 0.028671559179201722 (edit 14)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0006296677096021003 0.03605455532670021 (edit 15)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0006436810454362247 0.03606035024859011 (edit 16)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0005538840491146213 0.02616085112094879 (edit 17)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0005538840234838913 0.02616085112094879 (edit 18)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.00044173648199151693 0.050319219240918756 (edit 19)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0004417364584483923 0.050319219240918756 (edit 20)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.0005080307177910794 0.09262073435820639 (edit 21)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.04462973180870694 0.9999694826547056 (edit 22)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.04542565860204857 0.9876179138664156 (edit 23)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.04535952226808222 0.9716696108225733 (edit 24)
/home/xxxxxx/FromFolder/201909/20190902_044048.nef: 0.04535952225483656 0.9716696108225733 (edit 25)

Code is available here: https://github.com/groutr/darktable-xmp

github-actions[bot] commented 4 years ago

This issue did not get any activity in the past 30 days and will be automatically closed in 7 days if no update occurs. Please check if the master branch has fixed it since then.

parafin commented 4 years ago

This is a feature request, and should remain open. Maybe there's a way to mark some issues, so that they won't be affected by github bot?

github-actions[bot] commented 3 years ago

This issue did not get any activity in the past 30 days and will be closed in 365 days if no update occurs. Please check if the master branch has fixed it and report again or close the issue.

johnny-bit commented 3 years ago

I think that introduction of https://github.com/darktable-org/darktable-tests essentially covers this FR.

Closing since IMO "it's implemented" :)

I hope @parafin agrees with me :)