SasView / sasmodels

Package for calculation of small angle scattering models using OpenCL.
BSD 3-Clause "New" or "Revised" License

MPFit fails for single precision models (was: C models do not work in 5.0.5 if GPU enabled) #518

Closed smk78 closed 1 year ago

smk78 commented 2 years ago

Whilst following the 'basic 1d fitting' tutorial, user Hubert K noticed that SasView would not fit the mono_gauss_coil model if a GPU was enabled, but that the poly_gauss_coil model would work. I can replicate the issue.

Hubert & I were both running SasView 5.0.5 on Windows using different GPUs (me: AMD Oland; Hubert: Intel UHD or NVIDIA GeForce) but my tests suggest this issue was not present in 5.0.4 and has been introduced in 5.0.5.

One difference between these models is that poly_gauss_coil is a pure Python model, but mono_gauss_coil calls C code.

mono_gauss_coil passes the GPU tests, but when you try to fit with it, nothing happens.

The Log Explorer reports:

10:44:55 - INFO:  --- SasView session started, version 5.0.5, 2022 ---
10:44:55 - INFO: Python: 3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]
10:46:07 - INFO: C:\SasView-5.0.5\tinycc-data\tcc.exe -shared -rdynamic -Wall C:\Users\*****\AppData\Local\Temp\sas64_mono_gauss_coil_F2DB80A0_1y8lu3ro.c -o C:\Users\*****\.sasview\compiled_models\sas64_mono_gauss_coil_F2DB80A0.so
10:46:19 - INFO: 2022-09-01 10:46:19 === Steps: 5 of 200  chisq: 1.6  ETA: 2s
  M1.background: 0.072      |       M1.i_zero: 49.7       |           M1.rg: 56.5
.
.
10:48:17 - INFO: building mono_gauss_coil-float32-F2DB80A0 for OpenCL Oland
10:48:18 - INFO: build program: kernel 'mono_gauss_coil_Iq' was part of a lengthy source build resulting from a binary cache miss (1.39 s)
10:48:18 - INFO: build program: kernel 'mono_gauss_coil_Iqxy' was part of a lengthy source build resulting from a binary cache miss (1.39 s)
10:48:18 - INFO: build program: kernel 'mono_gauss_coil_Imagnetic' was part of a lengthy source build resulting from a binary cache miss (1.39 s)
10:48:19 - INFO: 2022-09-01 10:48:19 === Steps: 1 of 200  chisq: 395  ETA: 5m 19s
  M2.background: 0.001      |       M2.i_zero: 70         |           M2.rg: 75

The mono_gauss_coil model fits fine if not using a GPU.

krzywon commented 2 years ago

This is even more of an issue than suggested here. I just tried multiple models and found a number of C models that are failing in the same way. Pure Python models seem to be unaffected.

Models tested (so far):

krzywon commented 2 years ago

The base sphere and cylinder models are also failing. This will need a fix soon followed by a release. I'll take it and branch off the latest sasview and sasmodels release points to try and find the issue.

krzywon commented 2 years ago

Looking deeper, this only affects the MPFit (new Levenberg-Marquardt) fit algorithm. All other fit algorithms return appropriate results when using a GPU.

smk78 commented 2 years ago

Oh. My idle Googling says that MPFit is a CPU-based library and is not parallelised. Did we miss that?

butlerpd commented 2 years ago

Interesting. That does seem odd. @pkienzle or @bmaranville do you know if that is true?

bmaranville commented 2 years ago

I'm afraid this is outside my wheelhouse.

pkienzle commented 2 years ago

The mpfit code is pure Python. I included it directly in bumps. Even if it were in C, the model is evaluated through the Python callback, so that wouldn't preclude sending it out to the GPU for calculation.

Off hand I have no suggestions why it might be failing. It would help if there were a traceback. I haven't tried reproducing the error yet.

I could parallelize the Jacobian calculation, asking for all partial derivatives at the same time, but the code does not do so yet. It may help for 1D SANS on the GPU though I'm not sure; the different threads may end up tripping over each other when they are transferring data between CPU and GPU. For 2D SANS it may be counterproductive to parallelize the calls to fit since the GPU is completely busy evaluating all the pixels in the SANS image.

Pure python models will not call out to the GPU. Many of these models may spend more time transferring data between CPU and GPU than they would spend calculating directly on the CPU, so this may not be a problem. If it would run faster on GPU, someone could merge the automatic python to C pull request and tag the model to allow it to be translated.

krzywon commented 2 years ago

I've been testing a few possible ways to fix this. Here are my results.

Potential solutions:

pkienzle commented 2 years ago

I suspect the problem is with the derivative calculation. OpenCL is using single precision, but MPFit sizes the step for its numerical derivative assuming double precision. The step Δx is then too small to register in single precision, so as far as the model calculator is concerned f(x) = f(x + Δx), giving (f(x + Δx) - f(x))/Δx = 0, and no step is taken. We will need to go into the depths of the mpfit code and change how it computes the derivative.
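
The effect described here can be reproduced outside SasView. The snippet below is only a sketch, not the bumps or mpfit code: it assumes a MINPACK-style forward difference with relative step sqrt(eps), and it uses a Debye-type Gaussian coil expression as a stand-in model. The function names `debye` and `forward_diff` and the q and rg values are illustrative choices.

```python
import numpy as np

def debye(q, rg, dtype):
    """Gaussian coil form factor evaluated entirely in the given precision."""
    q = np.asarray(q, dtype=dtype)
    z = (q * dtype(rg)) ** 2          # rg is rounded to `dtype` here
    return 2 * (np.exp(-z) + z - 1) / z**2

def forward_diff(q, rg, dtype, step_eps):
    """Forward difference d/d(rg) using a relative step of sqrt(step_eps)."""
    h = float(np.sqrt(step_eps)) * abs(rg)
    f0 = debye(q, rg, dtype).astype(np.float64)
    f1 = debye(q, rg + h, dtype).astype(np.float64)
    return (f1 - f0) / h

q = [0.01, 0.02, 0.05]   # illustrative q values
rg = 75.0                # illustrative radius of gyration
eps64 = np.finfo(np.float64).eps
eps32 = np.finfo(np.float32).eps

# Double precision model, step sized for double precision: works.
d64 = forward_diff(q, rg, np.float64, eps64)
# Single precision model, step sized for double precision: rg + h rounds
# back to rg in float32, so the difference vanishes.
d32_small = forward_diff(q, rg, np.float32, eps64)
# Single precision model, step sized for single precision: works again.
d32_large = forward_diff(q, rg, np.float32, eps32)

print("float64 model, float64 step:", d64)
print("float32 model, float64 step:", d32_small)
print("float32 model, float32 step:", d32_large)
```

With rg = 75, the double precision step is about 1e-6, which is smaller than half the spacing between adjacent float32 values near 75, so the perturbed parameter is indistinguishable from the original once it reaches the single precision model. Sizing the step from the single precision epsilon is one possible remedy; whether bumps took that route is not stated in this thread.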

pkienzle commented 2 years ago

The next release of bumps will fix this problem. You can check by doing a direct install from the repo:

python -m pip install git+https://github.com/bumps/bumps.git

smk78 commented 1 year ago

This issue is resolved in 5.0.6rc1 (beta1) so am closing this.