facebookincubator / AITemplate

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Apache License 2.0

AttributeError: 'Model' object has no attribute '_allocated_ait_data' #78

Open mingqizhang opened 1 year ago

mingqizhang commented 1 year ago

I am using the latest CUDA Docker image on an A100. When I run python3 examples/05_stable_diffusion/compile.py --token xxx, the main errors are as follows:

57 errors detected in the compilation of "flash_attention_10.cu".
make: *** [Makefile:9: flash_attention_10.obj] Error 1
make: *** Waiting for unfinished jobs....

2022-11-11 03:11:49,781 INFO compiled the final .so file elapsed time: 0:00:08.439418
Traceback (most recent call last):
  File "examples/05_stable_diffusion/compile.py", line 373, in <module>
    compile_diffusers()
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "examples/05_stable_diffusion/compile.py", line 349, in compile_diffusers
    compile_clip(
  File "examples/05_stable_diffusion/compile.py", line 252, in compile_clip
    compile_model(Y, target, "./tmp", "CLIPTextModel", constants=params_ait)
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/compiler.py", line 260, in compile_model
    module = Model(
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 227, in __init__
    self.DLL = self._DLLWrapper(lib_path, num_runtimes, allocator_kind)
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 166, in __init__
    self.DLL = ctypes.cdll.LoadLibrary(lib_path)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 451, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: ./tmp/CLIPTextModel/test.so: cannot open shared object file: No such file or directory
Exception ignored in: <function Model.__del__ at 0x7ff88f4ef3a0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 257, in __del__
    self.close()
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 261, in close
    for ptr in list(self._allocated_ait_data):
AttributeError: 'Model' object has no attribute '_allocated_ait_data'

mikeiovine commented 1 year ago

The real root cause is that it can't find the compiled model:

OSError: ./tmp/CLIPTextModel/test.so: cannot open shared object file: No such file or directory

This likely indicates that there was a compile error. Can you share the full logs?

(Aside: in the next release, we should probably fix the exception handling in model.py so the OSError becomes more prominent when this happens...)
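To illustrate the aside above: the secondary AttributeError appears because the constructor raises before an attribute that the cleanup path reads has been set, so the destructor then fails on a half-constructed object. Below is a minimal sketch of one way to avoid that. SafeModel and its attribute names are hypothetical and are not AITemplate's actual Model class; the only library call used is ctypes.cdll.LoadLibrary.

```python
import ctypes


class SafeModel:
    """Hypothetical wrapper for illustration; not AITemplate's Model."""

    def __init__(self, lib_path):
        # Set every attribute that close() reads before anything can raise.
        self._allocated = []
        self._lib = None
        try:
            self._lib = ctypes.cdll.LoadLibrary(lib_path)
        except OSError as exc:
            # Re-raise with a message that points at the likely real cause,
            # keeping the original OSError chained as __cause__.
            raise RuntimeError(
                f"could not load {lib_path!r}; the build probably failed, "
                "check the compiler output"
            ) from exc

    def close(self):
        # getattr keeps close() safe even on a half-constructed object.
        for ptr in list(getattr(self, "_allocated", [])):
            self._allocated.remove(ptr)

    def __del__(self):
        self.close()
```

With this shape, a missing .so produces a single exception whose message names the library path, and the destructor no longer adds a second, misleading error.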

mingqizhang commented 1 year ago

@mikeiovine Hello, here are the full logs: CompileErrorLog.txt
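A long build log is easier to discuss once it is narrowed down to its first few error lines. The sketch below does that, assuming the log contains compiler diagnostics in the usual "file(line): error: ..." or "file:line:col: error: ..." shapes plus make's "Error N" summary lines. The function name first_errors and the regular expression are illustrative choices, not part of any tool mentioned in this thread.

```python
import re

# Matches diagnostics such as "a.cu(12): error: ..." or "a.cpp:12:3: error: ...",
# plus make's "Error N" summary lines. Summary lines like "57 errors detected"
# are intentionally not matched.
ERROR_RE = re.compile(r"\berror\b\s*:|\bError \d+\b")


def first_errors(log_text, limit=10):
    """Return up to `limit` lines of log_text that look like errors."""
    hits = []
    for line in log_text.splitlines():
        if ERROR_RE.search(line):
            hits.append(line.strip())
            if len(hits) == limit:
                break
    return hits
```

For example, reading the attached log into a string and printing first_errors(text) would show the earliest diagnostics, which are usually more informative than the final traceback.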

mikeiovine commented 1 year ago

Looks like something in cutlass is failing to compile. Can you share your compiler version, cutlass version, CUDA version, etc.?

mingqizhang commented 1 year ago

My gcc version is 9.4.0, GNU make is 4.2.1, cmake is 3.16.3, and CUDA is 11.6. The cutlass version is probably 2.10 (the copy in 3rdparty/cutlass/), and I am compiling inside Docker.
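For anyone else reporting the same failure, the toolchain versions listed above can be gathered in one step. This is a minimal sketch that assumes gcc, make, cmake and nvcc are on PATH and each accept --version. The helper names tool_version and collect_versions are made up for this example. The cutlass version is not covered, since it lives in the source tree rather than on PATH.

```python
import shutil
import subprocess


def tool_version(cmd):
    """Return the output of `cmd --version`, or None if the tool is unavailable."""
    if shutil.which(cmd) is None:
        return None
    try:
        out = subprocess.run(
            [cmd, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    text = (out.stdout or out.stderr).strip()
    return text or None


def collect_versions(tools=("gcc", "make", "cmake", "nvcc")):
    """Map each tool name to its version text (None when not found)."""
    return {name: tool_version(name) for name in tools}


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"== {name} ==")
        print(version if version is not None else "not found")
```

The full --version output is kept rather than only the first line, because nvcc prints its release number a few lines into its banner.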

ffahmed commented 1 year ago

I am getting exactly the same error as @mingqizhang. If I skip "compile_clip", the other two steps, "compile_unet" and "compile_vae", compile fine and generate test.so. Any update on this?

Purvak-L commented 1 year ago

I'm facing the same error as @mingqizhang & @ffahmed on A100.

sudharshankakumanu commented 1 year ago

@mikeiovine I see the same issue on an A10G.

ybai62868 commented 1 year ago

I hit the same issue on a 3060 and a 3090!