1. One error to fix
When installing apex, the build fails with four errors about "conversion from unsigned long to long". You need to edit:
(1) line 65 in apex_22.01_pp/csrc/mlp.cpp
auto reserved_space = at::empty({reserved_size}, inputs[0].type());
change to:
auto reserved_space = at::empty({static_cast<long>(reserved_size)}, inputs[0].type());
(2) line 138 in apex_22.01_pp/csrc/mlp.cpp
auto work_space = at::empty({work_size / sizeof(scalar_t)}, inputs[0].type());
change to:
auto work_space = at::empty({static_cast<long>(work_size / sizeof(scalar_t))}, inputs[0].type());
Alternatively, you can adjust the compile options so the narrowing conversion is not treated as an error.
2. One improvement to reduce CUDA memory
When launching owl_demo.py on a 16 GB GPU, I ran into a CUDA out-of-memory error. I then edited:
line 33 and 34 in interface.py:
model = model.to(device)
model = model.to(dtype)
change to:
model = model.to(dtype)
model = model.to(device)
After the demo starts, memory usage is about 14 GB, so it runs well on a 16 GB GPU.