crowsonkb / k-diffusion

Karras et al. (2022) diffusion models for PyTorch
MIT License

k-diffusion triggering of torch.compile/torch.multiprocessing leaves multiple child processes #81

Closed: vladmandic closed this issue 9 months ago

vladmandic commented 10 months ago

torch.compile triggered here https://github.com/crowsonkb/k-diffusion/blob/f4a74f1ec906cb62916f58288ec73ef0330ba446/k_diffusion/models/image_transformer_v1.py#L89-L92

has a very bad side effect of triggering torch.multiprocessing since it executes on cpu. as a result, torch starts one child process per cpu core (on my system that's 32 child python processes).

the traceback on interrupt looks like:

Process ForkProcess-2:
Process ForkProcess-3:
Process ForkProcess-8:
Process ForkProcess-6:
Process ForkProcess-1:
...
KeyboardInterrupt
  File "/usr/lib/python3.11/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^

simply setting the K_DIFFUSION_USE_COMPILE=0 env variable disables compile and the issue is gone. but the default behavior is more than suspect, so i suggest revisiting it.
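
for anyone who wants to apply the workaround from inside python instead of the shell, a minimal sketch follows. it assumes the flag is read when k_diffusion is imported, so the variable has to be set first; the helper name disable_k_diffusion_compile is hypothetical and not part of the library.

```python
import os


def disable_k_diffusion_compile():
    """Set K_DIFFUSION_USE_COMPILE=0 for this process and return the value.

    Hypothetical helper, not part of k-diffusion. It only sets the
    environment variable named in this issue.
    """
    os.environ["K_DIFFUSION_USE_COMPILE"] = "0"
    return os.environ["K_DIFFUSION_USE_COMPILE"]


# Call this before importing k_diffusion (assumption: the flag is
# evaluated at import time, so setting it later may have no effect).
disable_k_diffusion_compile()
# import k_diffusion  # uncomment in a real script, after the line above
```

the shell equivalent is to prefix the launch command with K_DIFFUSION_USE_COMPILE=0.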

crowsonkb commented 9 months ago

I have fixed this in my next development branch by deferring the compiles until something actually tries to use the compiled kernel and will backport it soon. :)
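
A rough sketch of what deferring a compile until first use can look like is below. This is not the code from the development branch; the decorator name compile_on_first_call and the stand-in fake_compile are hypothetical, and in real use one would presumably pass torch.compile as compile_fn.

```python
import functools
import threading


def compile_on_first_call(compile_fn):
    """Decorator factory: apply compile_fn to a function lazily.

    Nothing is compiled when the module is imported. compile_fn runs
    once, the first time the wrapped function is actually called.
    """
    def decorator(fn):
        compiled = None
        lock = threading.Lock()

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            nonlocal compiled
            if compiled is None:
                with lock:
                    # Re-check inside the lock so only one thread compiles.
                    if compiled is None:
                        compiled = compile_fn(fn)
            return compiled(*args, **kwargs)

        return wrapper
    return decorator


# Stand-in for torch.compile so the sketch runs without PyTorch.
compile_calls = []


def fake_compile(fn):
    compile_calls.append(fn.__name__)
    return fn


@compile_on_first_call(fake_compile)
def scale(x, y):
    return x * y
```

With this pattern, importing the module has no side effects: compile_calls stays empty until scale is called, and later calls reuse the already compiled function.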

crowsonkb commented 9 months ago

Fixed in https://github.com/crowsonkb/k-diffusion/commit/8400fa935efc7b990c797ee01161078f7165cd29.