1. One error to fix
When installing apex, the build fails with four errors about "conversion from unsigned long to long". You need to edit:
(1) line 65 in apex_22.01_pp/csrc/mlp.cpp
auto reserved_space = at::empty({reserved_size}, inputs[0].type());
change to:
auto reserved_space = at::empty({static_cast<long>(reserved_size)}, inputs[0].type());
(2) line 138 in apex_22.01_pp/csrc/mlp.cpp
auto work_space = at::empty({work_size / sizeof(scalar_t)}, inputs[0].type());
change to:
auto work_space = at::empty({static_cast<long>(work_size / sizeof(scalar_t))}, inputs[0].type());
Alternatively, you can adjust the compile options so the narrowing conversion is not treated as an error.
2. One improvement to reduce CUDA memory
When launching owl_demo.py on a 16 GB GPU, I ran into a CUDA out-of-memory error. I then edited:
line 33 and 34 in interface.py:
model = model.to(device)
model = model.to(dtype)
change to:
model = model.to(dtype)
model = model.to(device)
After the demo starts, memory usage is about 14 GB, so it runs well on a 16 GB GPU.