gimli-org / gimli

Geophysical Inversion and Modeling Library :earth_africa:
https://www.pygimli.org

pg.setThreadCount() not working for full-inversions #656

Closed soehag closed 2 months ago

soehag commented 2 months ago

Problem description

Hello everyone. While running some large-scale ERT inversions with the ERT manager on a shared computing cluster, I noticed that limiting the thread count does not apply to the whole inversion. The forward modelling (calculating the response) correctly respects the set thread count, whereas the calculation of the Jacobian always allocates as many CPUs as the system has. I reproduced the example on my local machine.

Your environment


Date: Thu Feb 22 18:38:29 2024 CET

            OS : Linux
        CPU(s) : 8
       Machine : x86_64
  Architecture : 64bit
           RAM : 23.3 GiB
   Environment : Jupyter
   File system : ext4

Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]

       pygimli : 1.4.3
        pgcore : 1.4.0
         numpy : 1.24.4
    matplotlib : 3.7.2
         scipy : 1.11.2
       IPython : 8.14.0
       pyvista : 0.41.1

Steps to reproduce

The following code reproduces the issue:

import pygimli as pg
import pygimli.meshtools as mt
import numpy as np
import pygimli.physics.ert as ert

mesh_inner_plc = pg.meshtools.createWorld(np.array([10,4]), np.array((20, -4)), worldMarker=True, marker=1)
mesh_inner_plc.setBoundaryMarkers([-2]*4)
mesh_inner = pg.meshtools.createMesh(mesh_inner_plc, quality=32, area=.1, smooth=True)
final_mesh = mesh_inner
pg.show(final_mesh, markers=True, showMesh=True)

### Create data
sensorPositions_x = np.linspace(12,18,11)
sensorPositions_top = np.vstack((sensorPositions_x, np.ones_like(sensorPositions_x)*2)).T
sensorPositions_bot = np.vstack((sensorPositions_x, np.ones_like(sensorPositions_x)*(-2))).T
scheme = pg.DataContainerERT()
for pos in sensorPositions_top:
    scheme.createSensor(pos)
for pos in sensorPositions_bot:
    scheme.createSensor(pos)
for sen_top in range(len(sensorPositions_top)):
    for sen_bot in range(len(sensorPositions_top),len(sensorPositions_top)+len(sensorPositions_bot)):
        scheme.addFourPointData(sen_top,-1,sen_bot,-1)
scheme["k"] += 1
res = np.array([h.center().y() for h in final_mesh.cells()])**2+100
pg.show(final_mesh, res)

syn_data = ert.simulate(mesh=final_mesh, scheme=scheme, res=res, noiseLevel=1)
pg.show(final_mesh, markers=True, showMesh=True)

### Invert with ERT manager
sM=100
pg.setThreadCount(1)
manager = ert.Manager()
manager.setData(syn_data)
manager.setMesh(final_mesh)

manager.inv.setRegularization(1, cType=1)

result = manager.invert(
    data=syn_data,
    # mesh=final_mesh,
    lam=1e2,
    startModel=sM,
    verbose=True
)

Expected behavior

The set thread count should apply to all calculations in the inversion.

Actual behavior

The log of the Jacobian calculation shows that all 8 CPUs were used. For the response, no thread-count message appears in the logs at all; a log entry for the response would also be helpful.

ModellingBase::setMesh() copying new mesh ... Found datafile: 22 electrodes
Found: 22 free-electrodes
rMin = 0.3, rMax = 14.4222
NGauLeg + NGauLag for inverse Fouriertransformation: 10 + 4
Found non-Neumann domain
0.00574216 s
FOP updating mesh dependencies ... 1.519e-06 s
Calculating response for model: min = 100 max = 115.381
Allocating memory for primary potential...... 0.000341261

No primary potential for secondary field calculation. Calculating analytically...
Forward: time: 0.0587708s
Response: min = 1.22867 max = 2.19441 mean = 1.89735
Reciprocity rms(modelReciprocity) 0.0936967%, max: 0.240693%
relativeError set to a value > 0.5 .. assuming this is a percentage Error level dividing them by 100
Data error estimate (min:max)  0.010045556265465905 : 0.010081375519050945
min/max(dweight) = 99.1928/99.5465
fop: <pygimli.physics.ert.ertModelling.ERTModelling object at 0x7fc95314b0b0>
Data transformation: <pgcore._pygimli_.RTransLogLU object at 0x7fc95304c400>
Model transformation (cumulative):
     0 <pgcore._pygimli_.RTransLogLU object at 0x7fc9530b00b0>
min/max (data): 1.22/2.23
min/max (error): 1%/1.01%
min/max (start model): 100/100
--------------------------------------------------------------------------------
use model trans from RegionManager
Calculating response for model: min = 100 max = 100
Allocating memory for primary potential...... 0.000319546

No primary potential for secondary field calculation. Calculating analytically...
Forward: time: 0.25425s
Response: min = 1.10435 max = 1.99086 mean = 1.71883
Reciprocity rms(modelReciprocity) 0.0226251%, max: 0.0574602%
min/max(dweight) = 99.1928/99.5465
Building constraints matrix
constraint matrix of size(nBounds x nModel) 1943 x 1325
check Jacobian: wrong dimensions: (0x0) should be (121x1325)  force: 1
jacobian size invalid, forced recalc: 1
calculating jacobian matrix (forced=1)...Using existing subpotentials for createJacobian.
S(8/8-std::mt): 0.00383503:--------------------------------------------------------------------------------
inv.iter 1 ... time: 0.154716s
sens sum: median = 0.0057454 min = 0.000623663 max = 0.00761138
... 0.15539 s
min data = 1.22399 max data = 2.22806 (121)
min error = 0.0100456 max error = 0.0100814 (121)
min response = 1.10434 max response = 1.99045 (121)
calc without reference model
0: rms/rrms(data, response) = 0.180075/9.45747%
0: chi^2(data, response, error, log) = 97.7891
0: Phi = 11832.5 + 0 * 100 = 11832.5
solve CGLSCDWWtrans with lambda = 100
Calculating response for model: min = 96.8319 max = 170
Using existing primary potentials.
Forward: time: 0.229352s
Response: min = 1.6761 max = 2.79545 mean = 2.39831
Reciprocity rms(modelReciprocity) 0.215618%, max: 0.479104%
1: LS newModel: min = 96.8319; max = 170
1: LS newResponse: min = 1.67591; max = 2.7955
1: rms/rrms(data, LS newResponse) = 0.502896/27.2081%
1: chi^2(data, LS newResponse, error, log) = 569.388
1: Phi = 68895.9+0.68641*100=68964.6
Linesearch tau = 0.29
chi² = 1.37 (dPhi = 98.55%) lam: 100.0
--------------------------------------------------------------------------------
inv.iter 2 ... Calculating response for model: min = 99.0707 max = 116.635
Using existing primary potentials.
Forward: time: 0.259476s
Response: min = 1.25115 max = 2.20239 mean = 1.89725
Reciprocity rms(modelReciprocity) 0.0749854%, max: 0.17477%
1: Model: min = 99.0707; max = 116.635
1: Response: min = 1.25099; max = 2.2021
1: rms/rrms(data, Response) = 0.0209835/1.18273%
1: chi^2(data, Response, error, log) = 1.36744
1: Phi = 165.46+0.0577271*100=171.233
calculating jacobian matrix (forced=1)...Using existing subpotentials for createJacobian.
S(8/8-std::mt): 0.000313396:time: 0.145199s
sens sum: median = 0.0057778 min = 0.000550327 max = 0.00758198
... 0.145634 s
solve CGLSCDWWtrans with lambda = 100
Calculating response for model: min = 101.138 max = 114.765
Using existing primary potentials.
Forward: time: 0.229263s
Response: min = 1.20551 max = 2.18297 mean = 1.88091
Reciprocity rms(modelReciprocity) 0.036792%, max: 0.101361%
chi² = 1.02 (dPhi = 25.85%) lam: 100.0
--------------------------------------------------------------------------------
inv.iter 3 ... 2: LS newModel: min = 101.138; max = 114.765
2: LS newResponse: min = 1.20521; max = 2.18239
2: rms/rrms(data, LS newResponse) = 0.023634/1.27493%
2: chi^2(data, LS newResponse, error, log) = 1.63483
2: Phi = 197.814+0.0467649*100=202.49
Linesearch tau = 0.43
Calculating response for model: min = 100.122 max = 113.561
Using existing primary potentials.
Forward: time: 0.255258s
Response: min = 1.23182 max = 2.18877 mean = 1.89056
Reciprocity rms(modelReciprocity) 0.0518674%, max: 0.125255%
2: Model: min = 100.122; max = 113.561
2: Response: min = 1.2316; max = 2.1886
2: rms/rrms(data, Response) = 0.019234/1.01547%
2: chi^2(data, Response, error, log) = 1.02324
2: Phi = 123.813+0.0315547*100=126.968
calculating jacobian matrix (forced=1)...Using existing subpotentials for createJacobian.
S(8/8-std::mt): 0.0002949:time: 0.146085s
sens sum: median = 0.0057631 min = 0.000579157 max = 0.00759286
... 0.146901 s
solve CGLSCDWWtrans with lambda = 100
Calculating response for model: min = 100.733 max = 114.217
Using existing primary potentials.
Forward: time: 0.237843s
Response: min = 1.23196 max = 2.20825 mean = 1.90776
Reciprocity rms(modelReciprocity) 0.0406618%, max: 0.0949382%
chi² = 0.88 (dPhi = 13.26%) lam: 100.0

################################################################################
#                  Abort criterion reached: chi² <= 1 (0.88)                   #
################################################################################
3: LS newModel: min = 100.733; max = 114.217
3: LS newResponse: min = 1.2317; max = 2.20899
3: rms/rrms(data, LS newResponse) = 0.0203922/1.09641%
3: chi^2(data, LS newResponse, error, log) = 1.16932
3: Phi = 141.488+0.0457302*100=146.061
Linesearch tau = 0.41
Calculating response for model: min = 100.372 max = 113.249
Using existing primary potentials.
Forward: time: 0.239325s
Response: min = 1.23192 max = 2.19578 mean = 1.89762
Reciprocity rms(modelReciprocity) 0.0463173%, max: 0.111579%
3: Model: min = 100.372; max = 113.249
3: Response: min = 1.23169; max = 2.19565
3: rms/rrms(data, Response) = 0.0174767/0.947599%
3: chi^2(data, Response, error, log) = 0.881958
3: Phi = 106.717+0.0341166*100=110.129
carsten-forty2 commented 2 months ago

It seems that the sensitivity calculation ignores the global thread count and uses its own local thread setting.

Please add ert.fop._core.setThreadCount(1), which sets both the global and the local thread count.

Multi-threading (MT) for the forward calculation comes only from the OpenBLAS implementation, which is used by the cholmod solver.

If you want MT for the forward calculation again, you can reset the global thread count with pg.setThreadCount(1).

We will refactor and clean this up when we have some more time ... until then, hopefully the manual settings will help.

Cheers, Carsten
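The workaround above can be sketched as follows. The pygimli lines are kept as comments and follow Carsten's comment; the `fop._core` attribute path on an `ert.Manager` instance is an assumption not confirmed in this thread. The environment variables are a complementary, well-established way to cap OpenBLAS (and hence the cholmod forward solver's) threads, but only take effect if set before the library is loaded:

```python
import os

# Cap BLAS/OpenMP worker threads before pygimli (and hence OpenBLAS) is
# imported; OpenBLAS reads these variables once, at load time.
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"

# Then, inside the script (sketch; the `fop._core` attribute path on an
# ert.Manager is an assumption based on Carsten's `ert.fop._core` reference):
#   import pygimli as pg
#   from pygimli.physics import ert
#   manager = ert.Manager()
#   pg.setThreadCount(1)                 # forward solver (OpenBLAS/cholmod)
#   manager.fop._core.setThreadCount(1)  # Jacobian/sensitivity calculation
```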

soehag commented 2 months ago

Manual settings definitely do for now! Thank you for the quick answer - works like a charm ;)

Cheers Hagen