FenTechSolutions / CausalDiscoveryToolbox

Package for causal inference in graphs and in the pairwise settings. Tools for graph structure recovery and dependencies are included.
https://fentechsolutions.github.io/CausalDiscoveryToolbox/html/index.html
MIT License

SAM, Running in parallel #161

Open patfl84 opened 6 months ago

patfl84 commented 6 months ago

Hi,

I'm running SAM with nruns=8 on a small test dataset. Only a very small portion of my GPU memory is in use, but the processes are running serially rather than in parallel.

All other processes seem to stall while one process is executing on the GPU, even though more GPU memory is available. Then the next process goes to the GPU, and so on.
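
One way to confirm the serial behaviour independently of SAM would be to record start and end times for each run and count how many runs overlap in time. The following is only a minimal sketch using the standard library: `timed_run` and `count_overlaps` are hypothetical names, and the `time.sleep` call is a placeholder for the real work, not anything from cdt.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def timed_run(run_id, duration=0.2):
    """Stand-in for one run; returns (run_id, start, end) wall-clock times."""
    start = time.time()
    time.sleep(duration)  # placeholder for the actual training run (assumption)
    return run_id, start, time.time()


def count_overlaps(intervals):
    """Count pairs of runs whose (start, end) intervals overlap in time."""
    overlaps = 0
    for i, (_, s1, e1) in enumerate(intervals):
        for _, s2, e2 in intervals[i + 1:]:
            if s1 < e2 and s2 < e1:
                overlaps += 1
    return overlaps


if __name__ == "__main__":
    # Launch four placeholder runs in separate processes.
    with ProcessPoolExecutor(max_workers=4) as pool:
        intervals = list(pool.map(timed_run, range(4)))
    # Zero overlapping pairs would indicate the runs executed one after another.
    print("overlapping pairs:", count_overlaps(intervals))
```

If the count stays at zero when the real runs are timed this way, the processes are effectively being serialized.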

As you can see below, only the first process detects the GPU:

  0%|          | 0/4000 [00:00<?, ?it/s, disc=0.308, gen=-.431, regul_loss=0.032, tot=-8.59]Detecting 1 CUDA device(s).
  1%|          | 26/4000 [00:03<06:14, 10.60it/s, disc=0.0149, gen=-.915, regul_loss=0.032, tot=-18.3]No GPU automatically detected. Setting SETTINGS.GPU to 0, and SETTINGS.NJOBS to cpu_count.
  1%|▏         | 58/4000 [00:06<06:26, 10.19it/s, disc=0.00128, gen=-1.05, regul_loss=0.032, tot=-20.9]No GPU automatically detected. Setting SETTINGS.GPU to 0, and SETTINGS.NJOBS to cpu_count.
  2%|▏         | 90/4000 [00:09<06:02, 10.79it/s, disc=-.00934, gen=-1.01, regul_loss=0.032, tot=-20.1]No GPU automatically detected. Setting SETTINGS.GPU to 0, and SETTINGS.NJOBS to cpu_count.
  3%|▎         | 120/4000 [00:12<06:01, 10.74it/s, disc=-.0134, gen=-1, regul_loss=0.026, tot=-20]     No GPU automatically detected. Setting SETTINGS.GPU to 0, and SETTINGS.NJOBS to cpu_count.
  4%|▍         | 152/4000 [00:15<06:40,  9.60it/s, disc=-.0153, gen=-1, regul_loss=0.028, tot=-20]No GPU automatically detected. Setting SETTINGS.GPU to 0, and SETTINGS.NJOBS to cpu_count.
  5%|▍         | 182/4000 [00:18<05:54, 10.76it/s, disc=-.0189, gen=-1, regul_loss=0.03, tot=-20] No GPU automatically detected. Setting SETTINGS.GPU to 0, and SETTINGS.NJOBS to cpu_count.
  7%|▋         | 296/4000 [00:29<06:02, 10.23it/s, disc=-.0287, gen=-1, regul_loss=0.022, tot=-20.1]Process Process-9: