alan-turing-institute / AIrsenal

Machine learning Fantasy Premier League team
MIT License

An os.fork() error with the multithreaded JAX #686

Open Zonkil9 opened 4 weeks ago

Zonkil9 commented 4 weeks ago

Hi, I stumbled upon an error while running the command airsenal_run_pipeline. Everything goes well until:

[...]

Fitting player model for FWD ...
/usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()
2024-08-16 17:54:55.109484: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-16 17:54:55.109483: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-16 17:54:55.109485: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-16 17:54:55.109483: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-16 17:54:55.109958: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-16 17:54:55.110125: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-16 17:54:55.110294: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-16 17:54:55.110774: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
==================================================
PREDICTED TOP 5 PLAYERS FOR GAMEWEEK(S) [1, 2, 3]:
==================================================
GK:
1. David Raya Martin, 0.00pts (£5.5m, ARS)
2. Alisson Ramses Becker, 0.00pts (£5.5m, LIV)
3. Ederson Santana de Moraes, 0.00pts (£5.5m, MCI)
4. Stefan Ortega Moreno, 0.00pts (£5.5m, MCI)
5. Emiliano Martínez Romero, 0.00pts (£5.0m, AVL)
-------------------------
DEF:
1. Trent Alexander-Arnold, 0.00pts (£7.0m, LIV)
2. Benjamin White, 0.00pts (£6.5m, ARS)
3. Gabriel dos Santos Magalhães, 0.00pts (£6.0m, ARS)
4. William Saliba, 0.00pts (£6.0m, ARS)
5. Riccardo Calafiori, 0.00pts (£6.0m, ARS)
-------------------------
MID:
1. Mohamed Salah, 0.00pts (£12.5m, LIV)
2. Cole Palmer, 0.00pts (£10.5m, CHE)
3. Bukayo Saka, 0.00pts (£10.0m, ARS)
4. Son Heung-min, 0.00pts (£10.0m, TOT)
5. Kevin De Bruyne, 0.00pts (£9.5m, MCI)
-------------------------
FWD:
1. Erling Haaland, 0.00pts (£15.0m, MCI)
2. Ollie Watkins, 0.00pts (£9.0m, AVL)
3. Alexander Isak, 0.00pts (£8.5m, NEW)
4. Kai Havertz, 0.00pts (£8.0m, ARS)
5. Ivan Toney, 0.00pts (£7.5m, BRE)
-------------------------
Prediction complete..
Generating a squad..

[...]

Additional info:

This error does not occur when I run commands one after another: airsenal_run_optimization --weeks_ahead 3 and airsenal_run_prediction --weeks_ahead 3.

Also, I installed JAX for CUDA 12.6 with

pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

and, of course, the newest CUDA 12.6 from the NVIDIA repo. My GPU is an NVIDIA MX450.

jack89roberts commented 4 weeks ago

Hi @Zonkil9, thanks for reporting. It is a bit fiddly to get multiprocessing / jax / sqlalchemy playing nicely together. Does it make a difference if you run without CUDA/GPU? I wouldn't be surprised if the GPU is causing issues, and I vaguely remember it may actually make AIrsenal run slower, since it's not really optimised for GPU. The reason you may be seeing a difference between the pipeline script and the individual scripts is that the pipeline defaults to using all threads available on your system, whilst the others default to 4, I think.
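
For readers hitting the same RuntimeWarning: it is raised when Python's `multiprocessing` uses the `fork` start method in a process where JAX has already started its threads. A minimal sketch of one common workaround, the `spawn` start method, follows. It is not AIrsenal's actual code: `fit_one` and `fit_all` are hypothetical names, and whether the pipeline could switch start methods without other changes is an assumption.

```python
import multiprocessing as mp


def fit_one(position: str) -> str:
    # Hypothetical stand-in for fitting one per-position player model.
    # Under "spawn" each worker is a fresh interpreter, so any JAX import
    # and GPU initialisation would happen inside the worker itself rather
    # than being inherited from a forked parent.
    return f"fitted {position}"


def fit_all(positions, num_workers=2, start_method="spawn"):
    # get_context() returns a context bound to one start method without
    # changing the process-wide default.
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=num_workers) as pool:
        return pool.map(fit_one, positions)
```

With `spawn`, `fit_all(["GK", "DEF", "MID", "FWD"])` has to be called from under an `if __name__ == "__main__":` guard in a real script file, because each worker re-imports the main module on start-up.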

Zonkil9 commented 3 weeks ago

You are right - the code runs slower on GPU than on CPU. I'll just revert to the single-threaded JAX on the CPU.

Also, I noticed a slight difference when I ran predictions for 38 fixtures. The predicted optimal players were the same, but their predicted points differed by around 0.5 in absolute terms.

jack89roberts commented 3 weeks ago

> Also, I noticed a slight difference when I ran predictions for 38 fixtures. The predicted optimal players were the same, but their predicted points differed by around 0.5 in absolute terms.

This is strange. There is some randomness in the predictions, but 0.5 pts is quite a lot. Did you mean the difference between predicting for 3 weeks and optimising for 3 weeks vs. predicting for 38 weeks and optimising for 3 weeks, or something along those lines?

Zonkil9 commented 3 weeks ago

In order to compute on the CPU, I ran the following commands:

airsenal_update_db
export JAX_PLATFORMS=cpu
airsenal_run_prediction --weeks_ahead 37

and I got this:

==================================================
PREDICTED TOP 5 PLAYERS FOR GAMEWEEK(S) [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]:
==================================================
GK:
1. Alisson Ramses Becker, 153.32pts (£5.5m, LIV)
2. David Raya Martin, 148.51pts (£5.5m, ARS)
3. André Onana, 144.44pts (£5.0m, MUN)
4. Bernd Leno, 134.71pts (£5.0m, FUL)
5. José Malheiro de Sá, 134.46pts (£4.5m, WOL)
-------------------------
DEF:
1. Joško Gvardiol, 179.78pts (£6.0m, MCI)
2. Andrew Robertson, 170.23pts (£6.0m, LIV)
3. Pedro Porro, 169.51pts (£5.5m, TOT)
4. Virgil van Dijk, 148.78pts (£6.0m, LIV)
5. Rúben Gato Alves Dias, 144.81pts (£5.5m, MCI)
-------------------------
MID:
1. Mohamed Salah, 258.24pts (£12.5m, LIV)
2. Kevin De Bruyne, 240.34pts (£9.5m, MCI)
3. Son Heung-min, 217.66pts (£10.0m, TOT)
4. Cole Palmer, 214.14pts (£10.5m, CHE)
5. Bukayo Saka, 191.27pts (£10.0m, ARS)
-------------------------
FWD:
1. Erling Haaland, 269.93pts (£15.0m, MCI)
2. Alexander Isak, 204.36pts (£8.5m, NEW)
3. Kai Havertz, 168.23pts (£8.0m, ARS)
4. Rodrigo Muniz Carvalho, 167.37pts (£6.0m, FUL)
5. Ollie Watkins, 166.03pts (£9.0m, AVL)
-------------------------
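
The CPU run above relies on `export JAX_PLATFORMS=cpu`, which stays set for the rest of the shell session. As a rough sketch, the same variable can be scoped to a single command from Python, so that a later GPU run in the same session is unaffected. The helper names `env_with_platform` and `run_prediction` are hypothetical; only the command line and the `JAX_PLATFORMS` variable come from the thread.

```python
import os
import subprocess


def env_with_platform(platform="cpu"):
    # Copy the current environment and override JAX_PLATFORMS only in the
    # copy, leaving the calling process's own environment untouched.
    env = dict(os.environ)
    env["JAX_PLATFORMS"] = platform
    return env


def run_prediction(weeks_ahead, platform="cpu"):
    # Runs the same CLI command as in the thread, with the chosen platform
    # visible only to this child process.
    cmd = ["airsenal_run_prediction", "--weeks_ahead", str(weeks_ahead)]
    return subprocess.run(cmd, env=env_with_platform(platform), check=True)
```

Calling `run_prediction(37)` would then correspond to the CPU run shown above, assuming `airsenal_run_prediction` is on the `PATH`.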

In order to compute on GPU, I opened a new terminal session and ran:

airsenal_update_db
sudo nvidia-smi
airsenal_run_prediction --weeks_ahead 37

and I got this result:

==================================================
PREDICTED TOP 5 PLAYERS FOR GAMEWEEK(S) [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]:
==================================================
GK:
1. Alisson Ramses Becker, 153.39pts (£5.5m, LIV)
2. David Raya Martin, 148.59pts (£5.5m, ARS)
3. André Onana, 143.81pts (£5.0m, MUN)
4. Bernd Leno, 134.83pts (£5.0m, FUL)
5. José Malheiro de Sá, 134.50pts (£4.5m, WOL)
-------------------------
DEF:
1. Joško Gvardiol, 180.07pts (£6.0m, MCI)
2. Andrew Robertson, 170.22pts (£6.0m, LIV)
3. Pedro Porro, 169.04pts (£5.5m, TOT)
4. Virgil van Dijk, 148.80pts (£6.0m, LIV)
5. Rúben Gato Alves Dias, 145.06pts (£5.5m, MCI)
-------------------------
MID:
1. Mohamed Salah, 257.96pts (£12.5m, LIV)
2. Kevin De Bruyne, 240.55pts (£9.5m, MCI)
3. Son Heung-min, 217.58pts (£10.0m, TOT)
4. Cole Palmer, 214.06pts (£10.5m, CHE)
5. Bukayo Saka, 191.23pts (£10.0m, ARS)
-------------------------
FWD:
1. Erling Haaland, 270.16pts (£15.0m, MCI)
2. Alexander Isak, 204.46pts (£8.5m, NEW)
3. Kai Havertz, 168.18pts (£8.0m, ARS)
4. Rodrigo Muniz Carvalho, 167.34pts (£6.0m, FUL)
5. Ollie Watkins, 165.87pts (£9.0m, AVL)
-------------------------

As you can see, the scores for individual players differ, usually by around 0.2 points, but the gap can be larger: see, e.g., Pedro Porro (169.51 vs. 169.04).
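
To put numbers on the comparison, here is a small sketch that computes the absolute CPU vs. GPU difference for a subset of the players. The scores are copied from the two outputs above; the helper name `abs_diffs` and the choice of players are just for illustration.

```python
# Predicted points copied from the CPU and GPU runs above (subset only).
cpu = {
    "André Onana": 144.44,
    "Joško Gvardiol": 179.78,
    "Andrew Robertson": 170.23,
    "Pedro Porro": 169.51,
    "Mohamed Salah": 258.24,
    "Erling Haaland": 269.93,
}
gpu = {
    "André Onana": 143.81,
    "Joško Gvardiol": 180.07,
    "Andrew Robertson": 170.22,
    "Pedro Porro": 169.04,
    "Mohamed Salah": 257.96,
    "Erling Haaland": 270.16,
}


def abs_diffs(a, b):
    # Absolute difference per player, rounded to the 2 decimals printed
    # by the prediction output.
    return {name: round(abs(a[name] - b[name]), 2) for name in a if name in b}


diffs = abs_diffs(cpu, gpu)
# Print players from largest to smallest difference.
for name, diff in sorted(diffs.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {diff:.2f}")
```

For these six players the differences range from 0.01 to 0.63 points, which is under half a percent of each player's predicted total.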

jack89roberts commented 3 weeks ago

How about between two runs of it on CPU? I can see GPU adding more randomness potentially, but it's interesting so thanks for sending!

Zonkil9 commented 3 weeks ago

The results of the two runs on the CPU are exactly the same. They are identical to those I posted above.

jack89roberts commented 3 weeks ago

Cool, that puts it in the realm of the discussion here (and elsewhere for GPUs more generally): https://github.com/google/jax/discussions/10674
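
One commonly cited source of this kind of run-to-run variation is that floating-point addition is not associative, so a parallel reduction that adds the same numbers in a different order can return a slightly different result. Whether that is what happens in this particular model is an assumption; the sketch below only demonstrates the arithmetic effect in plain Python, with `sum_in_order` as an illustrative helper.

```python
def sum_in_order(values, order):
    # Add the values one at a time in the given index order, the way a
    # sequential reduction would.
    total = 0.0
    for index in order:
        total += values[index]
    return total


values = [0.1, 0.2, 0.3]
forward = sum_in_order(values, [0, 1, 2])
backward = sum_in_order(values, [2, 1, 0])

# Same numbers, different order, different last digit.
print(forward, backward, forward == backward)
```

A difference in the last digit is tiny on its own, but a long sampling run performs a very large number of such operations, so small discrepancies could plausibly accumulate into the second decimal place of the reported points.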

Zonkil9 commented 3 weeks ago

Interesting! So I tried that:

airsenal_update_db
sudo nvidia-smi
export XLA_FLAGS=--xla_gpu_deterministic_ops=true
airsenal_run_prediction --weeks_ahead 37

And... my GPU became unimaginably slow! After 20 minutes of computation, I was only at:

warmup:   5%| | 69/1500

I gave up for now... :laughing:

EDIT.

So finally it finished:

==================================================
PREDICTED TOP 5 PLAYERS FOR GAMEWEEK(S) [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]:
==================================================
GK:
1. Alisson Ramses Becker, 153.54pts (£5.5m, LIV)
2. David Raya Martin, 148.70pts (£5.5m, ARS)
3. André Onana, 144.28pts (£5.0m, MUN)
4. Bernd Leno, 134.81pts (£5.0m, FUL)
5. José Malheiro de Sá, 134.43pts (£4.5m, WOL)
-------------------------
DEF:
1. Joško Gvardiol, 179.78pts (£6.0m, MCI)
2. Andrew Robertson, 170.45pts (£6.0m, LIV)
3. Pedro Porro, 169.20pts (£5.5m, TOT)
4. Virgil van Dijk, 149.01pts (£6.0m, LIV)
5. Rúben Gato Alves Dias, 144.86pts (£5.5m, MCI)
-------------------------
MID:
1. Mohamed Salah, 258.32pts (£12.5m, LIV)
2. Kevin De Bruyne, 240.15pts (£9.5m, MCI)
3. Son Heung-min, 217.20pts (£10.0m, TOT)
4. Cole Palmer, 214.03pts (£10.5m, CHE)
5. Bukayo Saka, 191.20pts (£10.0m, ARS)
-------------------------
FWD:
1. Erling Haaland, 269.66pts (£15.0m, MCI)
2. Alexander Isak, 204.69pts (£8.5m, NEW)
3. Kai Havertz, 168.13pts (£8.0m, ARS)
4. Rodrigo Muniz Carvalho, 167.28pts (£6.0m, FUL)
5. Ollie Watkins, 166.04pts (£9.0m, AVL)
-------------------------

The results are closer to the CPU results, but still not the same.
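
A side note on the `export XLA_FLAGS=...` line used for this run: it replaces whatever was already in `XLA_FLAGS`. A minimal sketch of appending a flag instead is shown below. `add_xla_flag` is a hypothetical helper, and `--some_existing_flag` is a placeholder rather than a real XLA option; only `--xla_gpu_deterministic_ops=true` comes from the thread. The variable would need to be set before JAX initialises its backend for it to have any effect.

```python
import os


def add_xla_flag(flag, env=None):
    # XLA_FLAGS holds space-separated flags; append the new one unless it
    # is already present, and keep whatever was set before.
    env = os.environ if env is None else env
    flags = env.get("XLA_FLAGS", "").split()
    if flag not in flags:
        flags.append(flag)
    env["XLA_FLAGS"] = " ".join(flags)
    return env["XLA_FLAGS"]


# Demonstration on a plain dict so the real environment is not modified.
demo_env = {"XLA_FLAGS": "--some_existing_flag"}  # placeholder flag
result = add_xla_flag("--xla_gpu_deterministic_ops=true", demo_env)
print(result)
```

Calling `add_xla_flag("--xla_gpu_deterministic_ops=true")` without the second argument would update `os.environ` for the current process and anything it launches afterwards.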