facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

importing `xformers.ops` implicitly initializes CUDA context #1030

Open · function2-llx opened this issue 5 months ago

function2-llx commented 5 months ago

Currently, importing `xformers.ops` implicitly initializes the CUDA context. This has the unpleasant side effect that we can no longer use the "fork" multiprocessing start method.
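
For reference, here is a minimal sketch of why the "fork" method breaks (this is standard PyTorch/CUDA behavior, not xformers-specific code): once a process owns a CUDA context, a child forked from it cannot use CUDA.

```python
import multiprocessing as mp

import torch

def worker() -> None:
    # In a child forked from a CUDA-initialized parent, PyTorch raises
    # "Cannot re-initialize CUDA in forked subprocess" on the first CUDA call.
    torch.zeros(1, device="cuda")

if __name__ == "__main__":
    torch.cuda.init()  # stands in for the implicit init caused by the import
    ctx = mp.get_context("fork")  # POSIX only
    p = ctx.Process(target=worker)
    p.start()
    p.join()
    print("child exit code:", p.exitcode)  # non-zero: the worker crashed
```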

The line of code that initializes the CUDA context is here:

https://github.com/facebookresearch/xformers/blob/f6637120b58c4b3626b18234f8c0c74c561b8d01/xformers/__init__.py#L52
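
A quick way to observe the behavior from a fresh interpreter (a repro sketch; `torch.cuda.is_initialized` is PyTorch's standard check):

```python
import torch

assert not torch.cuda.is_initialized()  # fresh process: no CUDA context yet

import xformers.ops  # noqa: F401

# The import above is what triggers CUDA initialization:
print(torch.cuda.is_initialized())  # prints True
```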

danthe3rd commented 5 months ago

Hi, thanks for reporting this issue. Unfortunately it may take more effort than changing just this line, as we also check for device capabilities in multiple places... @fmassa @bottler any ideas?

bottler commented 4 months ago

Fixing this would be good for cutting import times.

We need `_is_triton_available` to be called only when a public function is invoked, not at import time of the public modules. I think we could do that; a sketch of the idea follows.
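
Something along these lines (a sketch of the idea only, not the actual xformers code; `public_op` is a hypothetical entry point): move the probe behind a cached helper that runs on first use.

```python
import functools

@functools.lru_cache(maxsize=1)
def _is_triton_available() -> bool:
    # The probe runs (and is cached) on the first call, so merely importing
    # this module touches neither triton nor the CUDA context.
    try:
        import triton  # noqa: F401
    except ImportError:
        return False
    return True

def public_op(x):
    # Hypothetical public entry point: the capability check happens here,
    # on first use, rather than at import time.
    if _is_triton_available():
        return x  # would dispatch to a Triton-backed kernel
    return x  # would fall back to another backend
```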

bottler commented 4 months ago

It's possible that commit 737c2e6, which just landed, will fix this.

LucasLLC commented 3 months ago

Hello, confirming that this issue is still occurring; we're seeing it locally in xlformers as well.

bottler commented 2 months ago

Possibly this will be okay now, after https://github.com/facebookresearch/xformers/commit/be13e229b52d9d0bdf4422be931c67c492b8092f, if you set `XFORMERS_ENABLE_TRITON=1`?

function2-llx commented 2 months ago

> Possibly this will be okay now, after be13e22, if you set `XFORMERS_ENABLE_TRITON=1`?

Setting this environment variable works for me with xformers v0.0.27. Thanks!
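
For anyone landing here, this is the setup that works for me (the flag must be set before xformers is imported):

```python
import os

# Must be set before xformers is imported.
os.environ["XFORMERS_ENABLE_TRITON"] = "1"

import torch
import xformers.ops  # noqa: F401

# With the flag set, the import should no longer initialize CUDA,
# so the "fork" start method remains usable.
print(torch.cuda.is_initialized())  # expected: False
```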