darktable-org / darktable

darktable is an open source photography workflow application and raw developer
https://www.darktable.org
GNU General Public License v3.0

[export images] multiple GPUs usage #2436

Closed AxelG-DE closed 2 years ago

AxelG-DE commented 5 years ago

Describe the bug: The last paragraph of the manual page opencl_multiple_devices suggests to me that multiple GPUs will be used during exporting. When I run watch nvidia-smi, only one of my two GTX1060G1 cards is used for exports, which hurts particularly on large TIFFs for panos, as those export quite slowly.

Christian commented on the "not supported" status in pixls.us thread #12002.

My priorities are set like this: opencl_device_priority=*/*/*/*

To Reproduce Steps to reproduce the behavior:

  1. Go to 'Lighttable'
  2. Click on 'export several TIFF 16bit deflate compression=9'
  3. Open an OpenCL monitor like nvidia-smi or nvidia-settings
  4. See only one GPU is used

Alternatively, you can run this test; it even tells you "GPU1 not used".

Wished behavior: I saw Aurélien's post on discuss.pixls.us, also in the above thread, so one should be careful with "demanding" more OpenCL performance :-)

Hence I formulate it carefully: it would be nice if, as the handbook indicates, multiple GPUs were used for exporting :-)

Platform (please complete the following information):

github-actions[bot] commented 4 years ago

This issue did not get any activity in the past 30 days and will be closed in 7 days if no update occurs. Please check if the master branch has fixed it since then.

AxelG-DE commented 4 years ago

I still hope this multiple GPU usage will come true one day...

aurelienpierre commented 4 years ago

What if you use opencl_device_priority=0,1,2/0,1,2/0,1,2/0,1,2 ?

EDIT: do you want to export several images at the same time, using multiple GPUs, or use multiple GPUs to export a single image? Because the latter would probably not show significant speed-ups due to memory I/O bottlenecks.

AxelG-DE commented 4 years ago

Dear Aurélien,

I will check this later... About your update, I understand the message. In general, I would like to see all hands on deck when I export several images, rather than one resource sleeping.

AxelG-DE commented 4 years ago

@aurelienpierre I tried to force two devices on export (third entry) and it does not work. The second card (device #1) stays idle while exporting 42 pics.
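For readers following along, here is a minimal sketch of how the opencl_device_priority string breaks down. It assumes the four slash-separated fields map to the image, preview, export and thumbnail pixelpipes, in that order, as the darktable manual describes; the function name parse_device_priority is made up for this illustration and is not part of darktable.

```python
# Assumed field order, taken from the darktable manual's description of
# opencl_device_priority: image (center view) / preview / export / thumbnail.
PIPE_TYPES = ("image", "preview", "export", "thumbnail")


def parse_device_priority(value):
    """Split a priority string into a per-pixelpipe list of device entries."""
    fields = value.split("/")
    return {
        pipe: [entry.strip() for entry in field.split(",") if entry.strip()]
        for pipe, field in zip(PIPE_TYPES, fields)
    }


# The value suggested earlier in this thread.
prio = parse_device_priority("0,1,2/0,1,2/0,1,2/0,1,2")
print(prio["export"])
```

Note that listing several devices in the export field only expresses an order of preference. With a single export pipeline, as discussed further down, it would not by itself make two cards work at the same time.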

AxelG-DE commented 3 years ago

I'm still hoping :-)

AxelG-DE commented 2 years ago

EDIT: do you want to export several images at the same time, using multiple GPUs, or use multiple GPUs to export a single image? Because the latter would probably not show significant speed-ups due to memory I/O bottlenecks.

I understand it is hardly possible to use two GPUs to export the same image. I wonder: if one has, e.g., two discrete GPUs and wants to export, let's say, 100 pics, can we identify the performance (don't hit me, please :smile: ) and share out the pics accordingly? Let's assume my GTX1060 were rated 50% as fast as my RTX2070s, by whatever measure (even if I had to measure it separately and enter the ratio somewhere); then we would split the pics to be exported, 1/3 to the GTX1060 and 2/3 to the RTX2070s...

I already see that we cannot easily know which of the pics have harder edits (expensive in computing power, I mean) and which ones are mild and easy to export, but it would be a beginning, so that a second GPU does not just sleep...
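To make the proposed split concrete, here is a rough sketch of dividing an export list in proportion to a user-supplied speed ratio. Nothing like this exists in darktable; the function name split_by_speed, the file names and the speed values are all assumptions for illustration.

```python
def split_by_speed(images, speeds):
    """Assign consecutive slices of `images` to devices, sized by relative speed.

    `speeds` maps a device label to a relative speed, e.g. 1.0 and 2.0 for a
    card that is rated half as fast as the other one.
    """
    total = sum(speeds.values())
    names = list(speeds)
    batches = {}
    start = 0
    cumulative = 0.0
    for index, name in enumerate(names):
        cumulative += speeds[name]
        if index == len(names) - 1:
            end = len(images)  # the last device takes whatever is left
        else:
            end = round(len(images) * cumulative / total)
        batches[name] = images[start:end]
        start = end
    return batches


# Hypothetical export list of 100 raw files.
images = [f"img_{i:03d}.raw" for i in range(100)]
batches = split_by_speed(images, {"GTX1060": 1.0, "RTX2070S": 2.0})
print({name: len(batch) for name, batch in batches.items()})
```

A static split like this ignores how expensive each image's edits are, which is exactly the caveat raised above.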

@jenshannoschwalm would you consider this idea silly? Then I shut up :)

jenshannoschwalm commented 2 years ago

I don't know atm. From my understanding we have one pipeline for exporting atm, i guess it might be possible to extend this to two.

BeckersC commented 2 years ago

Having two or three parallel export pixelpipes would be amazing for me. I regularly export hundreds to thousands of pictures per day as part of my photogrammetry workflow. Any speedup in that would be greatly appreciated.

jenshannoschwalm commented 2 years ago

Any speedup in that would be greatly appreciated.

If you want speedup - get the fastest graphics card you can afford :-) Larger memory (8GB) is more important than clock speed btw.

multiple exports in parallel would only work significantly better with two cards ...

AxelG-DE commented 2 years ago

multiple exports in parallel would only work significantly better with two cards ...

That is my point, I have two cards (1x RTX2070super 8GB & 1x GTX1060 OC 6GB) and one (GTX) stays idle

Just to reference back to initial issue :-)

AxelG-DE commented 2 years ago

i guess it might be possible to extend this to two.

I would be a happy tester :)

AxelG-DE commented 2 years ago

@jenshannoschwalm

TurboGit commented 2 years ago

multiple exports in parallel would only work significantly better with two cards ...

Sure, as some modules are already close to the limit (like Diffuse and Sharpen), having two exports in parallel is just asking for trouble. And since some modules are CPU only, I'm not sure there is a lot to gain; then again, the CPU-only modules may well be the fastest ones, where GPU support was not felt to be important. But anyway, today I don't expect many of our user base to have two graphics cards, so I don't feel this is a very important project at this stage, especially since there are many other things to do :)

AxelG-DE commented 2 years ago

a pity for time-lapse photographers, who always export hundreds, if not thousands of photos

TurboGit commented 2 years ago

a pity for time-lapse photographers, who always export hundreds, if not thousands of photos

Yes, I understand, but the project is huge. If not done right, the gain will be marginal, or dt will crash because the memory limit is reached. It will certainly happen at some point... but don't hold your breath.

parafin commented 2 years ago

You can work around this limitation by using an in-memory or duplicate database and starting 2 instances of darktable or darktable-cli. Of course, you would have to split the export list by hand.
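A rough sketch of what that workaround could look like, assuming darktable-cli accepts core options after --core, that --library :memory: selects an in-memory database, and that the third field of opencl_device_priority selects the export device. Please check these against the documentation of your darktable version before relying on them. The helper build_commands, the file names and the output directory are made up, and the code only builds the command lines without running them.

```python
from pathlib import Path


def build_commands(raw_files, out_dir, device_id):
    """Build one darktable-cli command line per image for a given OpenCL device."""
    commands = []
    for raw in raw_files:
        out_file = Path(out_dir) / (Path(raw).stem + ".tif")
        commands.append([
            "darktable-cli", str(raw), str(out_file),
            "--core",
            "--library", ":memory:",  # assumed: in-memory database
            # Assumed: third field is the export pipe, pinned to one device.
            "--conf", f"opencl_device_priority=*/*/{device_id}/*",
        ])
    return commands


# Hypothetical export list, split by hand into two halves.
files = ["a.raw", "b.raw", "c.raw", "d.raw"]
cmds_gpu0 = build_commands(files[0::2], "out", 0)
cmds_gpu1 = build_commands(files[1::2], "out", 1)
for cmd in cmds_gpu0 + cmds_gpu1:
    print(" ".join(cmd))
```

Each list could then be handed to its own worker that runs the commands one after another, so that at most one export per GPU is active at a time.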

jenshannoschwalm commented 2 years ago

You can work around this limitation by using an in-memory or duplicate database and starting 2 instances of darktable or darktable-cli. Of course, you would have to split the export list by hand.

Did you check this in practice? With two CL devices and tuned settings, probably yes. Otherwise, if you do large images and don't have at least 32GB of RAM, there would be fighting over memory, not to mention lots of CPU cache misses. Right?

parafin commented 2 years ago

This of course requires 2 GPUs and enough RAM, but this is exactly the case we’re discussing. I’m not saying it’s a good idea in the general case.

jenshannoschwalm commented 2 years ago

Maybe we could start another dt process with exact cmd-line parameters while exporting?

TurboGit commented 2 years ago

Again, to me this is asking for trouble if not done correctly. Currently dt is already using most CPU/GPU resources when exporting. Doing two exports at the same time won't speed up the process; it will certainly make the computer swap and slow down. The only hope is to have a parallel export with two GPUs, and this can be done properly only by a single dt instance; otherwise one instance will certainly vampirize the other's power.

eoyilmaz commented 2 years ago

Generally, if one has multiple GPUs, one also has plenty of RAM.

In my current setup my computer has 64 GB of RAM with 2x 3060s and a 1050 Ti (and this is a very modest setup; previously I owned 2x 3090s with 128 GB of RAM on an AMD Threadripper). I do VFX work, and it is super common in the VFX world for the rendering engine to use multiple GPUs at the same time. Redshift, for example, suggests having double the total amount of VRAM in your system as RAM if you are going to render individual frames on each GPU. But for rendering one frame with multiple GPUs, it is generally suggested to have double the VRAM of the GPU with the most VRAM in your system as RAM.

Anyway, I would be happy to see DT using all the resources in my computer while exporting multiple photos.
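As a quick worked example of that rule of thumb, here is a small sketch applied to the 2x 3060 12G plus 1050 Ti 4G setup. The function suggested_ram_gb is made up for illustration, and the factor of two is only the guidance as relayed in this comment, not something checked against the Redshift documentation.

```python
def suggested_ram_gb(vram_gb, one_frame_per_gpu):
    """Rule-of-thumb system RAM from the VRAM size of each installed card."""
    # One frame per GPU: base the estimate on the total VRAM.
    # One frame shared across GPUs: base it on the largest card only.
    basis = sum(vram_gb) if one_frame_per_gpu else max(vram_gb)
    return 2 * basis


cards = [12, 12, 4]  # 2x 3060 12G and a 1050 Ti 4G
print(suggested_ram_gb(cards, one_frame_per_gpu=True))   # 56
print(suggested_ram_gb(cards, one_frame_per_gpu=False))  # 24
```

By this estimate, 64 GB of RAM covers both cases for that setup.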

jenshannoschwalm commented 2 years ago

We only have one export pipeline (technically speaking), so using one GPU for export while still developing images in darkroom is possible. That is not your scenario atm.

BTW, what is VFX? I only know VFXForth, a Forth system mainly for embedded boards. 😎

eoyilmaz commented 2 years ago

Yeah sorry, that's my bad:

VFX is Visual Effects: https://en.wikipedia.org/wiki/Visual_effects

Redshift is a rendering engine running on GPU (and with 3.5 it can also run on CPU): https://www.maxon.net/en/redshift

AxelG-DE commented 2 years ago

Sorry, short OT @eoyilmaz: which mobo can hold such a massive GPU setup? PCIe speed?

@TurboGit I could accept it if you close this with "won't fix (atm)"... What do you think?

eoyilmaz commented 2 years ago

I have an old ASRock Fatal1ty X399 motherboard with an old AMD Threadripper 1900X, which supports up to 64 PCIe lanes. Before the 2x 3090s + 1x 1050 Ti I had 4x 1080 Ti; I sold the 3090s and am now using 2x 3060 12G and the 1050 Ti 4G.

TurboGit commented 2 years ago

@AxelG-DE : Ok, closing.