clarify openMP (and torch and CUDA and...) version requirements

ar4 / deepwave

Wave propagation modules for PyTorch.

MIT License


This is maybe a bug report but more likely an improvement request.

In a fairly clean venv on Windows 10, I installed the latest PyTorch and the latest Deepwave without errors. Importing and running PyTorch code also works without errors. However, when I run the simple demo code, the Jupyter kernel crashes with this message:


```
Disposing session as kernel process died ExitCode: 3, Reason: OMP: Error #15: Initializing libiomp5md.dll, but found
libomp140.x86_64.dll already initialized.  OMP: Hint This means that multiple copies of the OpenMP runtime have been
linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do
is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP
runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable
KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently
produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
```

Indeed, Deepwave works like a charm with `KMP_DUPLICATE_LIB_OK=TRUE` set. `libomp140.x86_64.dll` is almost certainly from PyTorch, so I guess `libiomp5md.dll` must be loaded by Deepwave. I assume Deepwave *used to* work out of the box most of the time, so does it need an older PyTorch, or do I really have to recompile it and work out which OpenMP flags select the right OpenMP library version?
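For anyone wanting to reproduce the workaround from a notebook, a minimal sketch might look like the following. It assumes the variable has to be in the environment before either library loads its OpenMP runtime, so it is set before any imports. The helper name `allow_duplicate_openmp` is made up for illustration. As the error message itself warns, this is unsafe and may crash or silently give wrong results.

```python
import os


def allow_duplicate_openmp() -> None:
    """Set the unsafe escape hatch named in the OMP error message.

    Hypothetical helper, not part of PyTorch or Deepwave.
    """
    # Must run before any library that loads an OpenMP runtime is imported.
    os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"


allow_duplicate_openmp()

# Import the libraries only after the variable is set, e.g.:
# import torch
# import deepwave
```

In Jupyter this would go in the very first cell; if the libraries were already imported, the kernel would probably need restarting first.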

Hi @mikehenninger and thank you for what certainly does seem to be a bug report. I am grateful to you for taking the time to send it, and sorry that you encountered a problem trying to use Deepwave.

You are right that Deepwave is compiled using libiomp5md.dll on Windows and that it used to work, but PyTorch on Windows now seems to be using a different version of OpenMP that is conflicting with that. To fix it I would need to recompile Deepwave using that other OpenMP version, but I am reluctant to do that as I think that would break it for people using older versions of PyTorch. This is an unfortunate problem of using compiled code that I hope I might be able to resolve for good in the future by rewriting Deepwave entirely in PyTorch. PyTorch's recent performance improvements might make this possible in the next few months.
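To see which OpenMP runtime files the installed packages ship, something like the sketch below could help. It is only a rough check: it searches a directory tree for the two file-name patterns that appear in the error message, and the function name `find_openmp_runtimes` and the choice of search root are assumptions. A file being present on disk does not prove that it is the one loaded at run time.

```python
import sysconfig
from pathlib import Path

# Patterns based on the two DLL names in the error message; other OpenMP
# runtimes (if any) are not covered by this sketch.
OPENMP_PATTERNS = ("libiomp5*.dll", "libomp*.dll")


def find_openmp_runtimes(root):
    """Return sorted paths under `root` whose names match OPENMP_PATTERNS."""
    root = Path(root)
    found = set()
    for pattern in OPENMP_PATTERNS:
        # rglob searches the whole tree below root recursively.
        found.update(root.rglob(pattern))
    return sorted(found)


if __name__ == "__main__":
    # Assumption: the packages live in the active environment's site-packages.
    site_packages = sysconfig.get_paths()["purelib"]
    for path in find_openmp_runtimes(site_packages):
        print(path)
```

If both a `libiomp5md.dll` and a `libomp140.x86_64.dll` turn up, that would be consistent with the conflict described here.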

The quickest fix for you is probably to switch to an older version of PyTorch, if you can. I haven't been able to work out when PyTorch made this switch, and I unfortunately do not have access to a Windows computer to check it, but I see that there are quite a few reports of other people having similar problems with OpenMP in PyTorch since the 2.4 release, so you might only need to go back to 2.3.1. If you want to be safe, though, the last version tested with Deepwave was 2.0.1.
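A quick way to check whether an installed PyTorch is newer than the version suggested above is sketched below. The 2.3.1 ceiling is only the guess made in this reply, not a verified boundary, and `release_tuple` and `is_newer_than` are hypothetical helpers written for this example.

```python
def release_tuple(version):
    """Turn a version string such as '2.4.0+cu121' into (2, 4, 0)."""
    # Drop a local build suffix like '+cu121' or '+cpu'.
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        # Keep only the leading digits of each piece ('0a0' -> '0').
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # Stop at pieces such as 'dev20240901'.
        parts.append(int(digits))
    return tuple(parts)


def is_newer_than(version, ceiling="2.3.1"):
    """Return True if `version` is a later release than `ceiling`."""
    return release_tuple(version) > release_tuple(ceiling)


# Typical use, assuming PyTorch is installed:
# import torch
# print(is_newer_than(torch.__version__))
```

If this reports that the installed version is newer, downgrading to 2.3.1 or to the last tested 2.0.1 is the thing to try.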

If you would prefer to stay with the latest version of PyTorch, then you could try compiling Deepwave yourself. If you want to try that then I can provide instructions.

Apologies again for the hassle.

clarify openMP (and torch and CUDA and...) version requirements #76