ROCm / bitsandbytes

8-bit CUDA functions for PyTorch
MIT License
34 stars 3 forks source link

TTran - bitsandbytes 0.43 PR #16

Closed TNTran92 closed 5 months ago

TNTran92 commented 6 months ago

Hello Matthews and the team,

This is my PR with all the changes.

To reproduce my result, you can use CMakeLists.txt. Note that CMakeList doesn't like the .hip extension, so you'll need to change kernels.hip-> kernels.hip.cpp, kernels.hiph -> kernels.hip.h and so on.

COMPUTE_BACKEND should be set to cpu first and then to hip. If done correctly, there will be 2 binaries in the bitsandbytes dir

  1. libbitsandbytes_cpu.so
  2. libbitsandbytes_hip_nohipblaslt.so
arlo-phoenix commented 6 months ago

Saw this PR in my feed and I worked on this before:

CMake works just fine with HIP afaik. I made CMake work with a preprocessor solution here a while ago. This is the intended way to use CMake with HIP, at least all projects I follow use it this way. The set_source_files_properties(${HIP_FILES} PROPERTIES LANGUAGE HIP) should not even be necessary and it should properly recognize .hip files under the Language HIP. I only had troubles with linking maybe that's what you mean with "doesn't like the .hip extension", don't know what version exactly but some shared libs are bugged or the llvm compiler broken in some 6.0.x version. Never figured out why though, you can hack those libs away for now (see my CMakeLists.txt).

TNTran92 commented 6 months ago

Thanks, let me try to compile this way.

On another note, let me try compiling this 0.42 branch and run some unit test. That should be better. Will let you all know when results are ready

TNTran92 commented 6 months ago

Saw this PR in my feed and I worked on this before:

CMake works just fine with HIP afaik. I made CMake work with a preprocessor solution here a while ago. This is the intended way to use CMake with HIP, at least all projects I follow use it this way. The set_source_files_properties(${HIP_FILES} PROPERTIES LANGUAGE HIP) should not even be necessary and it should properly recognize .hip files under the Language HIP. I only had troubles with linking maybe that's what you mean with "doesn't like the .hip extension", don't know what version exactly but some shared libs are bugged or the llvm compiler broken in some 6.0.x version. Never figured out why though, you can hack those libs away for now (see my CMakeLists.txt).

Just tried your CMakeLists and for some reason, I'm having issues with finding headers. I've explicitly told it where by adding target_include_directories(bitsandbytes PRIVATE ${CMAKE_SOURCE_DIR} ${CMAKE_SOURCE_DIR}/include /opt/rocm-6.0.2/include/hipblas but it's still yelling. ocm/include/hipblas is also same issue

Could it be because you're using 5.6 and I'm using 6.0?

arlo-phoenix commented 6 months ago

Could it be because you're using 5.6 and I'm using 6.0? Ignore the fork name, that's just legacy cause I named it that back then cause 5.6 was newly out and didn't think it would take this long to get ROCm upstream, I also use 6.0.x.

You should not need to target_include_directories this specific. It should be enough to cmake -D CMAKE_PREFIX_PATH=/opt/rocm-6.0.2 (ROCm libraries have full cmake support and based on that it should automatically find the .cmake files. I quickly copied over csrc from this and managed to compile on https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6/tree/rocm_enabled_rocm_fork.

But imo it's not worth it to continue working on this rn before discussing with AMD guys. Upstream currently finalizes device abstraction and then we should discuss this after someone writes an RFC. There are a lot of decisions to be made on how to integrate HIP in the python part already (quickly tried to get it working, but since it's not just a preprocessor solution you actually need a different HipBNBNativeLibrary and a lot of other stuff that needs to be discussed).

arlo-phoenix commented 6 months ago

Hi @amathews-amd @Lzy17 @pnunna93 I don't know if you followed upstream, but the tensor based device abstraction (https://github.com/TimDettmers/bitsandbytes/pull/898) was chosen and is now in the ironing out phase.

I suggest I write an initial RFC upstream in which we can discuss the ROCm Integration. I already have a rough plan/suggestion in mind. From what I saw the important parts (optimizers, 4 bit) are ready and you're just fixing inaccuracy issues now. Since if it's just that we can already start discussing/working on python integration, documentation and the CICD/github workflows.

TNTran92 commented 6 months ago

Could it be because you're using 5.6 and I'm using 6.0? Ignore the fork name, that's just legacy cause I named it that back then cause 5.6 was newly out and didn't think it would take this long to get ROCm upstream, I also use 6.0.x.

You should not need to target_include_directories this specific. It should be enough to cmake -D CMAKE_PREFIX_PATH=/opt/rocm-6.0.2 (ROCm libraries have full cmake support and based on that it should automatically find the .cmake files. I quickly copied over csrc from this and managed to compile on https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6/tree/rocm_enabled_rocm_fork.

But imo it's not worth it to continue working on this rn before discussing with AMD guys. Upstream currently finalizes device abstraction and then we should discuss this after someone writes an RFC. There are a lot of decisions to be made on how to integrate HIP in the python part already (quickly tried to get it working, but since it's not just a preprocessor solution you actually need a different HipBNBNativeLibrary and a lot of other stuff that needs to be discussed).

Tried commented out the include_target_library and it stilll worked. Thanks for pointing that out.

TNTran92 commented 5 months ago

Moving to https://github.com/ROCm/bitsandbytes/pull/18, closing this