CovertLab / wcEcoli

Whole Cell Model of E. coli

Slow JIT compilation on Linux #1400

Closed thalassemia closed 8 months ago

thalassemia commented 8 months ago

I'm running into a weird issue where the Numba JIT compilation of the differential equations in equilibrium (main culprit) and two-component system takes about 3 minutes every time I start a simulation with runSim.py. I've confirmed the issue on three different Linux systems (Ubuntu 22.04 x86, Arch x86, and Ubuntu 22.04 x86 via WSL 2) and @ggsun has also noticed a lag time of several minutes at the beginning of each simulation on his Linux desktop. Interestingly, @cyrus-bio and @rjuenemann have not experienced any such delay on their ARM Macbook Pros. Not sure how to proceed with this.

1fish2 commented 8 months ago

Debugging ideas:

1fish2 commented 8 months ago

If the --no-jit switch avoids the long delay, we can focus on Numba. Experiment ideas:

  1. Set the environment variable NUMBA_OPT (a number in [0 .. 3] inclusive) to a smaller number than whatever the default is. This controls the amount of optimization work. It could at least provide a workaround.
  2. Test on the latest release, numba==0.58.1. In the best case, this will fix the problem, and there'd be no need for the following experiments.
  3. Test on numba==0.55.0 which wcEcoli used until 5/18/2023. This could demonstrate that it is a recent Numba bug and enable filing an informed bug report.
  4. Test on numba==0.50.1 which wcEcoli used until 11/2/2022. This version might not work on Python 3.11 and numpy 1.24 since version 0.53.0 added Python 3.9 support and 0.54.0 added numpy version 1.20 support, so it's probably more practical to set up the older pyenv than to just go back to numba 0.50.1.
  5. Maybe useful: 0.56.0 added an experimental option "to the @jit family decorators to entirely turn off LLVM’s optimisation passes for a given function (see _dbg_optnone kwarg in the @jit decorator family)."
  6. Test on other versions of the libraries that Numba requires: llvmlite and numpy.
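A minimal sketch of how experiment 1 could be scripted, assuming the goal is just to compare wall-clock time across NUMBA_OPT levels. The helper name `time_with_numba_opt` is made up for illustration, and the command it runs here is a placeholder, not the real simulation. Each level runs in a fresh subprocess because, as far as I know, Numba reads its environment variables at import time.

```python
import os
import subprocess
import sys
import time


def time_with_numba_opt(cmd, opt_level):
    """Run cmd with NUMBA_OPT set to opt_level; return elapsed seconds.

    Hypothetical helper: it times the whole process, not only JIT compilation.
    """
    env = dict(os.environ, NUMBA_OPT=str(opt_level))
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command that only echoes the variable. To test the sim,
    # substitute e.g. [sys.executable, "runscripts/manual/runSim.py"].
    cmd = [sys.executable, "-c",
           "import os; print('NUMBA_OPT =', os.environ['NUMBA_OPT'])"]
    for level in (3, 2, 1, 0):
        elapsed = time_with_numba_opt(cmd, level)
        print(f"NUMBA_OPT={level}: {elapsed:.2f} s")
```

Since this measures the full run, comparing against a `--no-jit` run of the same command gives a rough baseline for how much of the time is compilation.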

The Numba release notes don't seem to mention anything related to this problem except that version 0.52.0 "focuses on performance improvements."

I skipped over the CUDA support parts. Is Numba using CUDA on your Linux machines, @thalassemia and @ggsun? Numba 0.53.0 added cuda.is_supported_version() to check if the CUDA runtime version is supported.
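A quick way to check this, sketched under the assumption that Numba 0.53.0 or later is installed. `numba_cuda_status` is a hypothetical wrapper name; the only Numba calls it uses are `cuda.is_available()` and `cuda.is_supported_version()`. It catches exceptions broadly so that a missing driver or runtime gets reported instead of crashing the check.

```python
def numba_cuda_status():
    """Return a short description of Numba's view of CUDA on this machine."""
    try:
        from numba import cuda
    except ImportError:
        return "numba is not installed"
    try:
        if not cuda.is_available():
            return "CUDA not available to Numba"
        supported = cuda.is_supported_version()
        return f"CUDA available; runtime version supported: {supported}"
    except Exception as exc:  # e.g. missing driver or runtime libraries
        return f"CUDA check raised {type(exc).__name__}: {exc}"


print(numba_cuda_status())
```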

thalassemia commented 8 months ago

I can confirm that setting --no-jit eliminates the 3-minute delay at the start of each simulation. Unfortunately, upgrading to numba==0.58.1 does not resolve the issue. Coincidentally, I first discovered the Numpy multinomial casting bug after an ill-fated attempt to solve this issue by upgrading every package to its latest version.

Interestingly, when I set up a Python 3.10.12 environment with the current requirements.txt on the latest commit of master, I experienced the same delay. However, the same environment with Python 3.9.17 has no delay. The CUDA function you mentioned results in an error in both environments. The plot thickens...

1fish2 commented 8 months ago

OK. Maybe NUMBA_OPT=2 or =1 will function as a workaround.

Can you make a smallish reproducible test case to file a Numba issue? We can snapshot the derivatives_parca_symbolic and derivatives_parca_jacobian_symbolic matrices going into the Numba jit compiler. Whether the bug is in Numba or Python 3.10, the Numba team will be better at debugging it.
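As a starting point for such a test case, here is a rough sketch that times the first call (which includes JIT compilation) against the second call of a generated function. Everything in it is a synthetic stand-in: `make_rhs` and `time_first_and_second_call` are hypothetical names, and the generated expressions are not the real derivatives_parca_symbolic contents, which would need to be snapshotted from an actual run. If Numba is not installed, the script falls back to plain Python so it still runs.

```python
import time

try:
    import numpy as np
    from numba import njit
    HAVE_NUMBA = True
except ImportError:  # fall back to uncompiled Python
    HAVE_NUMBA = False


def make_rhs(n_terms):
    """Generate rhs(y, out) with n_terms mass-action-style expressions.

    Synthetic stand-in for the model's symbolic derivatives.
    """
    lines = ["def rhs(y, out):"]
    for i in range(n_terms):
        j, k = (i + 1) % n_terms, (i + 2) % n_terms
        lines.append(f"    out[{i}] = y[{i}] * y[{j}] - 0.5 * y[{k}]")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["rhs"]


def time_first_and_second_call(n_terms):
    """Return (first_call_s, second_call_s); the first call includes JIT compile."""
    rhs = make_rhs(n_terms)
    if HAVE_NUMBA:
        rhs = njit(rhs)
        y, out = np.ones(n_terms), np.zeros(n_terms)
    else:
        y, out = [1.0] * n_terms, [0.0] * n_terms
    timings = []
    for _ in range(2):
        start = time.perf_counter()
        rhs(y, out)
        timings.append(time.perf_counter() - start)
    return tuple(timings)


if __name__ == "__main__":
    for n in (10, 50, 100):
        first, second = time_first_and_second_call(n)
        print(f"{n:4d} terms: first call {first:.3f} s, second call {second:.6f} s")
```

Running this under Python 3.9 and 3.10 with the same requirements would show whether the first-call time diverges between the two for generated code of this shape. If it does not, the real matrices are probably needed to trigger the slowdown.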

1fish2 commented 8 months ago

My Intel i9 MacBook Pro gets the same symptom: about a 4 minute delay for JIT compilation.

NUMBA_OPT=2 or =1 did not help. NUMBA_OPT=0 roughly halved the JIT compilation time.

--no-jit saved the JIT compilation time, produced identical sim output, and did not slow down the rest of the sim run.

@rjuenemann Does runSim.py run faster on your M1 Mac than runSim.py --no-jit? Maybe the bug depends on the CPU architecture.

no-jit

$ python runscripts/manual/runSim.py --no-jit
Simulation finished:
 - Sim length: 0:42:13
 - Sim end time: 0:42:13
 - Runtime: 0:24:33

Sat Nov  4 16:41:40 2023: Elapsed time 1478.19 sec (0:24:38.191285); CPU 1469.37 sec

jit

$ python runscripts/manual/runSim.py
 - Sim length: 0:42:13
 - Sim end time: 0:42:13
 - Runtime: 0:28:23

Sat Nov  4 17:54:42 2023: Elapsed time 1708.37 sec (0:28:28.365383); CPU 1701.29 sec
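For reference, the gap between the two elapsed times above works out to roughly 230 sec. A small sketch of the arithmetic, using only the numbers from the two runs:

```python
no_jit_elapsed = 1478.19  # sec, elapsed time of the --no-jit run
jit_elapsed = 1708.37     # sec, elapsed time of the default (JIT) run

overhead = jit_elapsed - no_jit_elapsed
minutes, seconds = divmod(overhead, 60)
percent = 100 * overhead / no_jit_elapsed

print(f"JIT run took {overhead:.2f} s longer "
      f"({int(minutes)} min {seconds:.0f} s, {percent:.1f}% over the no-jit run)")
```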

Proposal

rjuenemann commented 8 months ago

Hi @1fish2,

It looks like --no-jit actually runs faster for me as well.

jit

python runscripts/manual/runSim.py test_jit
Simulation finished:
 - Sim length: 0:42:11
 - Sim end time: 0:42:11
 - Runtime: 0:10:20
Mon Nov  6 10:40:21 2023: Elapsed time 623.01 sec (0:10:23.009824); CPU 622.70 sec

no-jit

python runscripts/manual/runSim.py test_no_jit --no-jit
Simulation finished:
 - Sim length: 0:42:11
 - Sim end time: 0:42:11
 - Runtime: 0:09:11
Mon Nov  6 10:54:59 2023: Elapsed time 553.53 sec (0:09:13.529028); CPU 553.19 sec

1fish2 commented 8 months ago

Thanks, @rjuenemann. I'll create a PR to make no-jit the default.

In our Wednesday meeting let's discuss whether to drop Numba. That'd simplify the code and library version interdependencies. Numba has open issues reported in 2020 on slow jit compilation and exponential jit compile time.

rjuenemann commented 8 months ago

Sounds good, thank you!

1fish2 commented 8 months ago

FYI, a trick to run an identical sim to another output subdirectory without needing to rerun the Parca is to use a wildtype variant like runSim.py --variant wildtype 1 1. Wildtype variants are no-ops. (I simplified the command line in the earlier post for readability.)

rjuenemann commented 8 months ago

Ooo this is helpful - thank you!