darktable-org / darktable

darktable is an open source photography workflow application and raw developer
https://www.darktable.org
GNU General Public License v3.0

stripes with opencl and denoise #17210

Open tpapp opened 1 month ago

tpapp commented 1 month ago

Describe the bug

The issue is similar to #15589 and #16378.

Using darktable master (compiled from source, Debian/testing, ROCm), I get a band of repeated pixels on images. I have narrowed it down to the denoise (profiled) module (see screenshot).

The band changes appearance when I zoom in/out, but remains on the right side.

Using e.g. RCD instead of LMMSE in demosaicing increases the band size, or moves it to a corner (similar to #16378).

Steps to reproduce

This is not image-specific, but I can upload the image and the xmp if requested (screenshot shows enabled modules).

Expected behavior

No response

Logfile | Screenshot | Screencast

screenshot_2024-07-27-123409

Commit

No response

Where did you obtain darktable from?

self compiled

darktable version

4.9.0+78~g77474ec716

What OS are you using?

Linux

What is the version of your OS?

Debian testing

Describe your system?

Integrated GPU. ROCM log attached.

rocminfo.txt

Are you using OpenCL GPU in darktable?

Yes

If yes, what is the GPU card and driver?

AMD Ryzen 5 5600H with Radeon Graphics, Vega 7 maybe?

Please provide additional context if applicable. You can attach files too, but might need to rename to .txt or .zip

No response

jenshannoschwalm commented 1 month ago

Oh, unfortunately AMD is notorious for unstable drivers on certain devices, and the set of supported models seems to change ...

To investigate this issue we would need at least a log taken with the '-d pipe -d opencl' options.

You should also run 'clinfo' and share its output.
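For anyone collecting the two logs requested above, here is a minimal sketch of one way to capture them. The `-d pipe -d opencl` options and the `clinfo` tool come from this thread; the helper names `darktable_debug_command` and `capture` are hypothetical, and the sketch assumes both binaries are on `PATH` and that darktable writes its debug output to stdout/stderr.

```python
import shutil
import subprocess


def darktable_debug_command(categories=("pipe", "opencl"), binary="darktable"):
    """Build the command line, one '-d <category>' pair per debug category."""
    cmd = [binary]
    for category in categories:
        cmd += ["-d", category]
    return cmd


def capture(cmd, log_path):
    """Run cmd and write stdout and stderr to log_path.

    Returns False without running anything if the binary is not on PATH.
    """
    if shutil.which(cmd[0]) is None:
        return False
    with open(log_path, "w") as log:
        subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=False)
    return True
```

Calling `capture(darktable_debug_command(), "darktable_log.txt")` followed by `capture(["clinfo"], "clinfo_log.txt")` would produce two files that can be attached to the issue. The log file names are only examples.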

tpapp commented 1 month ago

Thanks for getting back to me so quickly. Attaching both the -d pipe -d opencl and the clinfo logs.

clinfo_log.txt darktable_log.txt

tpapp commented 1 month ago

@gi-man: I get the bug at all resource settings; "small" is a remnant from experimentation (it was a suggestion in a similar issue).

So, if I understand correctly: this kind of integrated GPU is useless for the purposes of Darktable, it just happens to be misreported? Or can I still get some benefit out of it?

jenshannoschwalm commented 1 month ago

> So, if I understand correctly: this kind of integrated GPU is useless for the purposes of Darktable

I think you will have to wait until (hopefully) AMD fixes the driver :-)

jenshannoschwalm commented 1 month ago

Anyway, thanks for the logs, some details point to #17203

da-phil commented 1 month ago

@tpapp how did you install the GPU driver in Ubuntu? Are you using the amdgpu-install package?

jenshannoschwalm commented 1 month ago

Just saying: the constant flow of suspected dt issues tied to a single OpenCL driver vendor is remarkable, and it is driving me mad.

  1. The chances that any dt developer, including me, can investigate AMD OpenCL issues (leaving out rusticl, where we have good support) are diminishing.
  2. Maybe we should disable that driver by default and mark it as "use at your own risk" in preferences?
  3. Or is it not the vendor but the distributions?

da-phil commented 1 month ago

> Just saying: the constant flow of suspected dt issues tied to a single OpenCL driver vendor is remarkable, and it is driving me mad.
>
> 1. The chances that any dt developer, including me, can investigate AMD OpenCL issues (leaving out rusticl, where we have good support) are diminishing.
> 2. Maybe we should disable that driver by default and mark it as "use at your own risk" in preferences?
> 3. Or is it **not** the vendor but the distributions?

@jenshannoschwalm sorry if I have triggered your AMD OpenCL pain-point again with my already closed issue and this post. I'm actually really grateful for all your amazing OpenCL subsystem contributions and support in darktable :bow:

\<offtopic>The sole purpose of my post above was to find out more about the AMD driver / compatibility situation, as I'm planning to buy a laptop with an AMD Ryzen 7 8845HS CPU with an integrated AMD GPU. If apps such as darktable cannot take advantage of the GPU processing, this would be a deal-breaker for me. But as it seems in another issue, rusticl seems to work well with those GPUs already.\</offtopic>

jenshannoschwalm commented 1 month ago

> sorry if I have triggered your AMD OpenCL pain-point again ...

You didn't !

Unfortunately, no core dev uses (1) AMD hardware, let alone (2) rolling-release distros or (3) out-of-date setups such as those based on Ubuntu 22.xx. So we only get "vague" reports.

In case (2) there is almost never a chance to reproduce, as it's not clear what people did in detail.

Also, we couldn't find a "trigger" over the last year or so. Until then it was mostly a problem of handling NaNs; AMD drivers just don't seem to care. Another trigger seemed to be the interpolator for read_imagef.
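To illustrate why NaN handling matters for artifacts like this, here is a toy sketch in plain Python. It is not darktable code and makes no claim about the cause of this particular bug: `box_blur` and `sanitize` are hypothetical helpers showing how a single NaN sample spreads through repeated neighbourhood filtering unless it is replaced first.

```python
import math


def box_blur(row, radius=1):
    """Average each sample with its neighbours; indices are clamped at the edges."""
    n = len(row)
    out = []
    for i in range(n):
        window = [row[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def sanitize(row, fallback=0.0):
    """Replace NaN and infinite samples with a finite fallback value."""
    return [v if math.isfinite(v) else fallback for v in row]


row = [0.2, 0.2, float("nan"), 0.2, 0.2, 0.2]

# Two blur passes on the raw row: every window that touches the NaN becomes NaN.
spread = box_blur(box_blur(row))
nan_count = sum(math.isnan(v) for v in spread)

# The same two passes after sanitizing stay finite everywhere.
clean = box_blur(box_blur(sanitize(row)))
```

With a radius of 1, each pass widens the damaged region by one sample on each side, so wide filters such as denoising can turn one bad pixel into a visible block. Whether a driver propagates, clamps, or ignores NaNs in such a case is exactly the kind of behaviour that can differ between OpenCL implementations.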

I have an idea, as you seem to be active on pixls too. Would you be able and interested in writing a sort of review of AMD OpenCL there? The questions would be: a) are there problems with the AMD driver? b) how does the AMD driver perform compared to rusticl? So might we switch to rusticl as the default?

da-phil commented 1 month ago

> > sorry if I have triggered your AMD OpenCL pain-point again ...
>
> You didn't !
>
> Unfortunately, no core dev uses (1) AMD hardware, let alone (2) rolling-release distros or (3) out-of-date setups such as those based on Ubuntu 22.xx. So we only get "vague" reports.
>
> In case (2) there is almost never a chance to reproduce, as it's not clear what people did in detail.
>
> Also, we couldn't find a "trigger" over the last year or so. Until then it was mostly a problem of handling NaNs; AMD drivers just don't seem to care. Another trigger seemed to be the interpolator for read_imagef.
>
> I have an idea, as you seem to be active on pixls too. Would you be able and interested in writing a sort of review of AMD OpenCL there? The questions would be: a) are there problems with the AMD driver? b) how does the AMD driver perform compared to rusticl? So might we switch to rusticl as the default?

@jenshannoschwalm of course I'm interested in helping, whatever it takes to make dt run smoothly on all common platforms, including AMD GPUs. Do we already have a pixls.us thread for collecting early feedback on experimental builds and pre-release candidates from people who have a broad variety of OSes and GPUs? Is it already possible to create release builds from within PRs using the GitHub workflows? I think @darix has been doing release builds and may be able to answer this question.

tpapp commented 1 month ago

@da-phil: I have installed it from distro packages, specifically

tamas@tamas ~ % dpkg -l  '*rocm*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                Version                   Architecture Description
+++-===================-=========================-============-===================================================
ii  rocm-device-libs    5.2.3-3                   amd64        AMD specific device-side language runtime libraries
ii  rocm-device-libs-17 6.0+git20231212.5a852ed-2 amd64        AMD specific device-side language runtime libraries
ii  rocm-opencl-icd     5.7.1-4                   amd64        ROCm implementation of OpenCL API - ICD runtime
ii  rocminfo            5.7.1-3                   amd64        ROCm Application for Reporting System Info

@jenshannoschwalm: Is there a test suite I could run on my machine that would spit out useful information? I agree that debugging from artifacts one sees in a GUI is difficult, but a test suite could compare actual outputs to expected outputs.
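As a rough sketch of the kind of comparison meant here (this is not an existing darktable tool; `column_errors` and `bad_columns` are hypothetical names, and images are assumed to be plain lists of rows of floats, e.g. a CPU render as reference and an OpenCL render as candidate):

```python
import math


def column_errors(reference, candidate):
    """Largest absolute per-pixel difference in each column of two images."""
    if not reference:
        return []
    if len(reference) != len(candidate) or any(
        len(r) != len(c) for r, c in zip(reference, candidate)
    ):
        raise ValueError("images must have identical dimensions")
    errors = [0.0] * len(reference[0])
    for ref_row, cand_row in zip(reference, candidate):
        for x, (r, c) in enumerate(zip(ref_row, cand_row)):
            diff = abs(r - c)
            if not math.isfinite(diff):
                diff = math.inf  # NaN or inf in either image counts as a failure
            errors[x] = max(errors[x], diff)
    return errors


def bad_columns(reference, candidate, tolerance=1e-3):
    """Indices of columns whose error exceeds the tolerance."""
    return [x for x, err in enumerate(column_errors(reference, candidate)) if err > tolerance]


# Toy data: a flat grey image whose last two columns are corrupted in the candidate.
reference = [[0.5] * 6 for _ in range(3)]
candidate = [row[:] for row in reference]
candidate[1][4] = 0.9
candidate[2][5] = float("nan")
stripes = bad_columns(reference, candidate)
```

Reporting errors per column rather than as a single number would show directly whether the deviation is confined to a band on one side, as in the screenshot. The tolerance of `1e-3` is an arbitrary assumption.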

jenshannoschwalm commented 1 month ago

> Is there a test suite I could run on my machine ...

Yes, we have the integration suite in the dt code base that would do it. It also runs nightly and reports regressions.

I'm on Fedora, ROCM is on 6.1.2 and it still has issues ...

On dt master we check at least for available OpenCL mem and bumped the requirement to 800MB. So those cards won't start at least :-) And it doesn't hurt.

It still reports the memory as dedicated instead of shared/unified. I haven't tested this change yet.

So unfortunately we can't check that (yet).

I've seen very few reports/issues with dt when they use a dedicated GPU from AMD

We had lots of them, but I think we found most of the dt OpenCL bugs by now.

denis-martin commented 2 days ago

This is the most recent discussion I found about glitches/stripes when using OpenCL with Darktable. I'm not sure if this is exactly the same, however, here are my findings:

image

I played a lot with different parameters, and I think I found one that makes the glitches/stripes disappear when using AMD ROCm, independent of how much graphics memory is given to Darktable: with "pinned memory" set to 1 (enforce pinned memory), I don't get the glitches/stripes! The documentation only discusses the performance impact, but it seems to have an effect on stability here as well.

It would be interesting to see if this also helps in your case, @tpapp