Open KonradDanielewski opened 10 months ago
Thanks for the info! So far we haven't had many users trying to use multiple GPUs on Windows so haven't seen this yet. Keep me posted if you figure out a solution! I wonder if using system installs of CUDA/cudnn would help?
Multi-GPU training relies on NCCL (NVIDIA Collective Communications Library). There is apparently a system-agnostic version available; I'll try to compile it, add it to my CUDA installation, and see if it works.
Technically it's not a big issue: I can just train the model on part of the data, which should be fine as long as I shuffle properly across all the groups.
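The subsampling idea above could look roughly like the sketch below. It assumes that "groups" means individual recordings and that frames are addressed by index; `subsample_frames` and its parameters are hypothetical names for illustration, not part of any library used in this thread.

```python
import random


def subsample_frames(frames_per_recording, fraction, seed=0):
    """Draw the same fraction of frames from every recording, then shuffle.

    frames_per_recording: dict mapping a recording id to its frame count
                          (hypothetical layout, assumed for this sketch).
    fraction: share of each recording to keep, e.g. 0.1 for 10%.
    Returns a shuffled list of (recording_id, frame_index) pairs.
    """
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    selected = []
    for rec_id, n_frames in frames_per_recording.items():
        # Keep at least one frame so no recording drops out entirely.
        k = max(1, round(n_frames * fraction))
        for idx in rng.sample(range(n_frames), k):
            selected.append((rec_id, idx))
    # Mix recordings together so batches are not dominated by one group.
    rng.shuffle(selected)
    return selected


if __name__ == "__main__":
    # Dataset shape taken from the thread: 84 recordings, 45k frames each.
    counts = {f"rec{i:02d}": 45_000 for i in range(84)}
    subset = subsample_frames(counts, fraction=0.1)
    print(len(subset), "frames selected")
```

Sampling per recording rather than from the pooled frames keeps every recording represented in proportion, which is what makes training on a subset reasonably safe.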
More of an FYI than a bug report. Native Windows NCCL is not available via conda-forge (it is also unavailable according to the NVIDIA docs). I don't know whether there is a build precompiled with CUDA or something specifically for Windows. I'll try to compile a system-agnostic one and check if it works. I ran into this because I have a dataset of 84 recordings (45k frames each) and it doesn't fit on one 4090, but when trying to run:
So then when running:
it throws:
Found this issue, which may be useful in implementing a solution for Windows: https://github.com/tensorflow/tensorflow/issues/21470
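One possible direction, assuming the training code is built on TensorFlow's `tf.distribute` API (which this thread does not confirm), is to avoid NCCL on Windows by choosing a different all-reduce implementation. In the sketch below, `pick_all_reduce` and `make_strategy` are hypothetical helper names; `tf.distribute.MirroredStrategy`, `tf.distribute.HierarchicalCopyAllReduce`, and `tf.distribute.NcclAllReduce` are existing TensorFlow 2 classes. This is untested on a multi-GPU Windows machine.

```python
import platform


def pick_all_reduce(system=None):
    """Return the name of a tf.distribute cross-device op for this OS.

    NCCL has no native Windows build, so fall back to the
    hierarchical-copy all-reduce there and keep NCCL elsewhere.
    """
    system = system or platform.system()
    if system == "Windows":
        return "HierarchicalCopyAllReduce"
    return "NcclAllReduce"


def make_strategy(system=None):
    """Build a MirroredStrategy with the OS-appropriate all-reduce.

    TensorFlow is imported lazily so the helper above can be used
    without TensorFlow installed.
    """
    import tensorflow as tf

    ops_cls = getattr(tf.distribute, pick_all_reduce(system))
    return tf.distribute.MirroredStrategy(cross_device_ops=ops_cls())


if __name__ == "__main__":
    print("Selected all-reduce:", pick_all_reduce())
```

A model would then be built and compiled inside `with make_strategy().scope():`. Whether this can be wired into the project's own training pipeline is an open question.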