Open function2-llx opened 5 months ago
Hi, thanks for reporting this issue. Unfortunately it may take more effort than changing just this line, as we check for device capabilities in multiple other places as well... @fmassa @bottler any idea?
Fixing this would be good for cutting import times.
We need `_is_triton_available` to be called only when a public function is invoked, not at import time of public modules. I think we could do that.
Hello, confirming this issue still occurs; we're seeing it locally in xlformers as well.
Possibly this will be okay now after https://github.com/facebookresearch/xformers/commit/be13e229b52d9d0bdf4422be931c67c492b8092f if you set XFORMERS_ENABLE_TRITON=1 ?
Setting this works for me with xformers v0.0.27. Thanks!
Currently, importing `xformers.ops` implicitly initializes the CUDA context. This has the unpleasant effect that we cannot use the "fork" multiprocessing start method. The line of code that initializes the CUDA context is:
https://github.com/facebookresearch/xformers/blob/f6637120b58c4b3626b18234f8c0c74c561b8d01/xformers/__init__.py#L52