bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

M1/M2 macOS Users #485

Open phdykd opened 1 year ago

phdykd commented 1 year ago

I am on an M2 Max macOS machine with a 12-core CPU, 38-core GPU, 96 GB of memory, and 2 TB of storage. This is the issue:

===================================BUG REPORT=================================== Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
CUDA SETUP: Loading binary /Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
dlopen(/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so, 0x0006): tried: '/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file), '/System/Volumes/Preboot/Cryptexes/OS/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (no such file), '/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

+++++++++++++++++++ ANACONDA CUDA PATHS ++++++++++++++++++++
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116_nocublaslt.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda115_nocublaslt.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda111_nocublaslt.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda120.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda115.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda121_nocublaslt.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda111.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda112.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda120_nocublaslt.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so
/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so

++++++++++++++++++ /usr/local CUDA PATHS +++++++++++++++++++

+++++++++++++++ WORKING DIRECTORY CUDA PATHS +++++++++++++++

++++++++++++++++++ LD_LIBRARY CUDA PATHS +++++++++++++++++++
Traceback (most recent call last):
  File "/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/__main__.py", line 95, in <module>
    generate_bug_report_information()
  File "/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/site-packages/bitsandbytes/__main__.py", line 66, in generate_bug_report_information
    lib_path = os.environ['LD_LIBRARY_PATH'].strip()
  File "/Users/phdykd/miniconda3/envs/Guanaco/lib/python3.10/os.py", line 680, in __getitem__
    raise KeyError(key) from None
KeyError: 'LD_LIBRARY_PATH'
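
(Aside: the final KeyError in this trace just comes from the diagnostic script indexing an environment variable that is never set by default on macOS. A tolerant lookup along the following lines would avoid that particular crash; this is only a sketch, not the actual patch that later landed in the repo.)

```python
import os

# The trace above ends in:
#     lib_path = os.environ['LD_LIBRARY_PATH'].strip()
# which raises KeyError when LD_LIBRARY_PATH is unset (the default on macOS).
# A tolerant lookup sidesteps the crash; sketch only, not the repo's real fix.
lib_path = os.environ.get('LD_LIBRARY_PATH', '').strip()
```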

NorahUseringithub commented 1 year ago

I have the same issue

AiswaryaSrinivas commented 1 year ago

I have the same issue.

phdykd commented 1 year ago

Is there anyone from the developer team who can fix this? Or should we return our M2 macOS computers to the store and purchase NVIDIA CUDA machines? :)

deep-pipeline commented 1 year ago

So, Apple Silicon M1/M2 (ARM) support is much desired by people who want to use software that depends on bitsandbytes. However, looking back over the issue logs, the (sole) maintainer of the repo evidently does not have it as a priority (which may simply be a fair reflection of the priorities of their situation, but is clearly a big problem for others, because the library has been incorporated into important ML workflows).

The code is not restrictively licensed (it's MIT) except for the portions which are PyTorch (which is BSD, and indeed PyTorch now supports Apple Silicon), so licensing doesn't look like an issue. Given the voluntary work which has happened to implement MPS support (Metal Performance Shaders, the on-board Apple GPU system) for the likes of llama.cpp, it's clearly not beyond the capacity of the community to self-organise to do the work necessary to provide a non-CUDA-dependent option. Indeed, a contribution was already offered by pull request, which was closed, unmerged and without comment (see #252).

I think it's fair to say that at present, solving bitsandbytes Apple Silicon support is in limbo. Even were an alternative implementation to be created, there is the matter of getting it adopted in upstream packages which currently reference bitsandbytes. BUT, maybe, if there is simply no resolution or progress possible through this repository, given the number of ML developers working on Apple Silicon machines, the pressure may build to create a new repo 'openbitsandbytes' as a formal fork, get basic support for Apple Silicon into that (tracking any critical changes to bitsandbytes), and potentially start to vocally lobby all current codebases to switch away from this repo on the basis that it has been a barrier to collective development. @rickardp any thoughts?

rickardp commented 1 year ago

I tried to ask the maintainer of this repository to clarify whether M1 was explicitly out of scope after all the PRs to support Windows/Apple were closed, but I never received a response. I am positive about creating a fork, as long as there's a decent-sized community around to support it. I won't be able to sort this out myself given the spare time I've got. It seems there are a few people now; maybe enough critical mass to actually get this done?

I've got the basic machinery set up in PR #257 for building for Windows/Linux/Mac on x64/arm64, both CUDA and non-CUDA, everything on GitHub Actions. That was kind of a starting point for me, as I want everything built reproducibly on pipelines. Also, the MPS context is set up, so I think functions can be implemented now. I took inspiration from other PRs that were never merged and switched the build system to CMake, as the current Makefile-based one is not really maintainable for all these platforms.

The next step, I would say, to get MPS support solid is to have good unit test coverage on expected input/output. Tests exist, but I am not 100% sure they cover all the needed functionality to the degree that the MPS version can be implemented and we can trust it to be correct. I am not saying it is not; I'm just not sure it is.
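
To make that concrete, this is the kind of device-agnostic round-trip test I have in mind. It is a sketch only: the blocksize and tolerance are placeholder values, and it assumes the blockwise quantize/dequantize pair in bitsandbytes.functional is among the functions being ported.

```python
import pytest
import torch
import bitsandbytes.functional as F

@pytest.mark.parametrize("device", ["cpu", "mps"])
def test_blockwise_quant_roundtrip(device):
    # Skip gracefully on machines/build agents without an MPS backend.
    if device == "mps" and not torch.backends.mps.is_available():
        pytest.skip("MPS not available")
    x = torch.randn(4096, device=device)
    q, state = F.quantize_blockwise(x)       # 8-bit blockwise absmax quantization
    y = F.dequantize_blockwise(q, state)     # should approximately invert it
    assert y.device == x.device
    assert torch.allclose(x, y, atol=0.05)   # tolerance is a placeholder
```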

One task that would also be nice to get a grip on is having a prioritized list of which functions to implement. Different use cases might use different parts of the library. I looked at a few (mostly LLM) codebases, but it would be good to have a proper discussion on where to start and what we could collaborate on to get something actually working for a real use case. Then we can get down to the actual porting, I think (which didn't seem that hard, actually, when I attempted one of the quantization functions; there is just a lot of code to go through, and I wasn't able to verify the correctness of the code).

I think we should strive for a compatibility fork if we decide to fork, not just an Apple Silicon fork, so I'd rather keep all the CUDA implementations in the same fork; the same goes for Windows compatibility. One problem with CUDA is that I think it requires real, actual NVIDIA hardware these days (I remember being able to run CUDA in CPU mode, but it seems to have gone away from the CUDA SDK), so keeping test coverage in pipelines is going to be a problem (Azure-based NVIDIA agents are kind of out of range for an open source project). MPS can be run on the Apple build agents (at least it seems so). The reason I mention this is that the set of platforms will be disjoint (Apple is never going to support CUDA in any foreseeable future; MPS is Apple only), so contributors/maintainers will typically only be able to test locally on one platform, and there will probably be a real risk of releasing non-working code. Not sure how much of an issue this will be, just something to think about.

@phdykd If you are looking for a quick fix to run a codebase that depends on bitsandbytes, you are out of luck. No code in this repository will run at all on Apple platforms. The code is written specifically for CUDA. What we are discussing here is the best way forward to make this library portable, something that will require a non-trivial amount of work. Your best option is probably to look for a different implementation (or different hardware).

Edit: @deep-pipeline, just thought of something: would it make sense to investigate donating this effort to PyTorch?

eusip commented 1 year ago

I would be interested in collaborating on a fork. I am planning to get the same spec'd Mac as the OP in a couple months so I have some incentive to get B&B running on MPS. @rickardp are you in the Huggingface Discord? Perhaps we can discuss planning in further detail over there.

phdykd commented 1 year ago

I am interested in collaborating on a fork.

matthewdouglas commented 1 year ago

I'm also interested in helping where I can. I don't have a Mac machine, but I would like to have native builds for Windows. I'm also interested in aarch64 for e.g. AWS Graviton instances, like g5g. So far I've been able to build libbitsandbytes for Windows with MSVC against CUDA 12.0, but haven't tested it much yet.

eusip commented 1 year ago

I recently picked up an M1 machine so I can definitely do testing. For me the first issue is getting a better understanding of MPS. I will do my own preliminary research in the next couple days but if anyone has any recommended resources for learning more about MPS feel free to share.

@phdykd, @deep-pipeline, @rickardp, @matthewdouglas, are you available on Discord or Slack?

matthewdouglas commented 1 year ago

@eusip I'm on the Huggingface Discord, where my username is TheUltraInstinct.

phdykd commented 1 year ago

I am on the Huggingface Discord too. My username is phdykd. I have a pretty good M2 computer. Who is taking the lead to start? Thanks

eusip commented 1 year ago

I have a couple of ideas in mind for getting around this issue.

The quick and very dirty way is developing some code based on Core ML that integrates into Transformers and bitsandbytes. It would do a post-training quantization of the PyTorch model, convert it to Core ML, and make that model available via the generate method in the Transformers library. Very tacky, but it would work (a rough sketch follows at the end of this comment).

The other way is the long and hard way: effectively, try to swap out the CUDA calculations in the bitsandbytes library for Metal calculations using metal-cpp.

Update: Here is a link to the PR discussing how the llama.cpp team is implementing Metal calculations. Here is a link to their implementation of k-quantization.

I think the best solution to this issue is simply to integrate the llama.cpp library. Essentially develop a Python binding. 4-bit quantization is already available.
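
For the Core ML route above, a rough sketch of what the post-training weight quantization step could look like. The toy model, tensor name, and the 8-bit setting are placeholders of mine; it assumes coremltools is installed and is not tied to bitsandbytes at all.

```python
import torch
import coremltools as ct
from coremltools.models.neural_network import quantization_utils

# Toy stand-in for a real PyTorch model (placeholder).
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
example = torch.randn(1, 64)
traced = torch.jit.trace(model, example)

# Convert to the classic neuralnetwork format so the post-training
# weight quantization utility can be applied to the converted model.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=(1, 64))],
    convert_to="neuralnetwork",
)
quantized = quantization_utils.quantize_weights(mlmodel, nbits=8)
quantized.save("model_int8.mlmodel")
```

The obvious downside is that nothing which imports bitsandbytes would benefit from this, which is why I call it quick and dirty.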

dosier commented 1 year ago

Hey another M1 user here, I'd be interested in contributing too :)

marcothedeveloper123 commented 1 year ago

Happy to contribute. I can dedicate an M2 Max (12-core CPU, 38-core GPU, 96 GB). My username on Discord is mbotta.

UserHIJ commented 1 year ago

python3 -m bitsandbytes

===================================BUG REPORT=================================== Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /Users/redacted/Library/Python/3.9/lib/python/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/Users/redacted/Library/Python/3.9/lib/python/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
CUDA SETUP: Loading binary /Users/redacted/Library/Python/3.9/lib/python/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
dlopen(/Users/redacted/Library/Python/3.9/lib/python/site-packages/bitsandbytes/libbitsandbytes_cpu.so, 0x0006): tried: '/Users/redacted/Library/Python/3.9/lib/python/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file), '/System/Volumes/Preboot/Cryptexes/OS/Users/redacted/Library/Python/3.9/lib/python/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (no such file), '/Users/redacted/Library/Python/3.9/lib/python/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

+++++++++++++++++++ ANACONDA CUDA PATHS ++++++++++++++++++++

++++++++++++++++++ /usr/local CUDA PATHS +++++++++++++++++++

+++++++++++++++ WORKING DIRECTORY CUDA PATHS +++++++++++++++

++++++++++++++++++ LD_LIBRARY CUDA PATHS +++++++++++++++++++
Traceback (most recent call last):
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/redacted/Library/Python/3.9/lib/python/site-packages/bitsandbytes/__main__.py", line 95, in <module>
    generate_bug_report_information()
  File "/Users/redacted/Library/Python/3.9/lib/python/site-packages/bitsandbytes/__main__.py", line 66, in generate_bug_report_information
    lib_path = os.environ['LD_LIBRARY_PATH'].strip()
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/os.py", line 679, in __getitem__
    raise KeyError(key) from None
KeyError: 'LD_LIBRARY_PATH'

ambahk commented 1 year ago

@rickardp can you provide some instructions on building your fork on macOS please? Thanks. https://github.com/rickardp/bitsandbytes

ambahk commented 1 year ago

OK, here's what I did to get pytest running:

Environment setup (after cloning @rickardp's repo and cd-ing into the directory):

conda create -n bnb python=3.10.6
conda activate bnb
pip install -r requirements.txt

Build and install the packages

cmake .
make
python setup.py install

Run the tests

pytest tests

Some adjustments I made before the tests could exit normally:

  1. Added numpy and scipy as dependencies in requirements.txt
  2. Some adjustments to bitsandbytes/utils.py. See the commit here: https://github.com/ambahk/bitsandbytes/commit/2fefd2f7666022a292f04a5cb6d68232b5179241

Results: As of now, pytest finishes with 3 passed, 2246 skipped in 20.61s. Gonna look at the skipped cases as the next step.

BTW, is there a better channel for "porting bitsandbytes to macOS"?

rickardp commented 1 year ago

Sorry, been crazy busy with other stuff and I haven't had time to look at this. I joined the Huggingface Discord, my username is trick_ka

As for building my branch, currently I skip all the tests requiring CUDA. What I've tried to do is to set up a foundation on which relevant CUDA functionality can be ported, so I focused my effort on making this library along with its build system portable and not assume CUDA everywhere. The skipped tests are entirely by design, as they require CUDA to work. This is where a coordinated effort is required.

Everything also builds on GitHub Actions, and the packages are published from the actions. I am using conda myself to separate this project from others. I think the requirements.txt should be self-contained, but I might have missed something that was part of my base install (PyTorch + MPS). @ambahk it is just odd that the GitHub Actions runs did not require those two packages. Where were the dependencies needed?

rickardp commented 1 year ago

> I think the best solution to this issue is simply to integrate the llama.cpp library. Essentially develop a Python binding. 4-bit quantization is already available.

Just a comment from my side on this. I think the bitsandbytes library is widely used by a large variety of projects. I don't mind taking the shortcuts needed to get the work done, but I think there is value in keeping the interface of this library, to avoid having to create PRs in X other projects (which might be rejected due to the additional complexity and the fact that the vast majority of users seem to be fine with CUDA anyway). So I vote for stealing as much as we can, licenses permitting, but keeping the API contract.

At least for me, I can run llama.cpp and it's super cool, but my goal is to play around with what is out there on my laptop, and with the new stuff that appears, most of which is built on PyTorch (at least that has been my experience so far).
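
To make "keep the API contract" concrete, here is a minimal sketch. The device routing and the pure-PyTorch fallback are hypothetical stand-ins of mine, not the library's real kernels; only the public function name and rough signature mirror bitsandbytes.functional.quantize_blockwise.

```python
import torch

def _reference_quantize_blockwise(A: torch.Tensor, blocksize: int = 4096):
    # Pure-PyTorch stand-in: per-block absmax scaling to int8.
    # Illustrative only; not bitsandbytes' actual kernel or return contract.
    flat = A.flatten()
    pad = (-flat.numel()) % blocksize
    flat = torch.nn.functional.pad(flat, (0, pad))
    blocks = flat.view(-1, blocksize)
    absmax = blocks.abs().amax(dim=1, keepdim=True).clamp_min(1e-8)
    q = torch.round(blocks / absmax * 127).to(torch.int8)
    return q, absmax

def quantize_blockwise(A: torch.Tensor, blocksize: int = 4096):
    # Public entry point keeps the existing name/signature; per-device
    # routing is the compatibility idea being discussed here.
    if A.device.type == "cuda":
        raise NotImplementedError("existing CUDA kernel would be called here")
    return _reference_quantize_blockwise(A, blocksize)  # CPU / MPS fallback

q, absmax = quantize_blockwise(torch.randn(10_000))
print(q.shape, absmax.shape)  # torch.Size([3, 4096]) torch.Size([3, 1])
```

Downstream projects would keep calling the same entry points; only the backend underneath changes, which to me is the whole point of a compatibility fork.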

ambahk commented 1 year ago

The missing dependencies were required by tests/test_functional.py. Maybe the Github actions weren't running the tests.

deep-pipeline commented 1 year ago

Just popped back here after much busyness/distraction elsewhere and delighted to see the beginnings of collective momentum.

Many thanks @rickardp for the detailed, positive reply, and great to see input and expressions of support from @phdykd @eusip @matthewdouglas @dosier @marcothedeveloper123, plus the suggestion of the Huggingface Discord as a place to chat. Also thanks @ambahk for the good point re: instructions, and for the post describing what you did with testing etc.

Without going searching the Huggingface Discord to find folk, I get the feeling there is no disagreement with @rickardp that having a drop-in, interface-compatible library would be most useful (and least breaking) for existing projects which use bitsandbytes.

At least initially, that probably means keeping the same name and presumably users having to follow instructions to install the cross-platform bitsandbytes manually from wherever the bitsandbytes-compatibility base repo lives. I can't help but think at some point, for reputation/security/clarity and because platform differences and enhancements may come up, it might be worth establishing a different-but-similar descriptive name, like openbitsandbytes or bitsandbytescrossplatform etc., but maybe that makes sense only after some degree of publicly visible, stable function of a compatibility repo.

On that point, where that repo lives needs to be decided.

I think https://github.com/rickardp/bitsandbytes is the obvious place to have the new base repo, but equally I am conscious that the 'Issues' section is not enabled there, and that Rickard might or might not want the 'noise' and direct personal pressure that come from hosting the base repo. So maybe it needs to be a shared repo with clear leadership input from @rickardp but not necessarily all the hassle (with the ability to approve PRs perhaps sitting with more people, so long as a project plan is being followed?).

Anyway, that's my 'project management' collective nudge - I'm more an end-user on this front, so I'm afraid I'm unlikely to be contributing PRs other than possibly relating to some documentation, and I'm stretched too thin to be active on Discord - though am keen to follow things on GitHub (which is where group activity may naturally gather support and helpful input)

...so I do advise you all to coalesce around where the GitHub repo 'base of operations' is, make the readme file on that repo clearly different, and turn Issues on there.

Best, M.

rickardp commented 1 year ago

Hi @deep-pipeline, thanks for the update. Likewise, these couple of weeks have been crazy busy for me. I'm happy to start off with my repo (https://github.com/rickardp/bitsandbytes) for collaboration.

If someone comes up with a good name for it, I can create a new organization and move it in, to better indicate that it's not one of my personal projects but a community effort.

In the meantime, I've invited the ones who expressed interest in contributing code recently as collaborators.

I have enabled issues and discussions in the repo. If preferred over Discord, we can use GitHub discussions. I don't have a strong opinion myself. Feel free to add a discussion topic here https://github.com/rickardp/bitsandbytes/discussions/new/choose

TimDettmers commented 1 year ago

The main error that you posted (about LD_LIBRARY_PATH) has been fixed, but the main issue here is Apple Silicon support. I currently have no plans to implement this, but I would be happy to support anyone who wants to port bitsandbytes to Apple Silicon.

BalaajiSri commented 1 year ago

> I have a couple of ideas in mind for getting around this issue.
>
> The quick and very dirty way is developing some code based on Core ML that integrates into Transformers and bitsandbytes. It would do a post-training quantization of the PyTorch model, convert it to Core ML, and make that model available via the generate method in the Transformers library. Very tacky, but it would work.
>
> The other way is the long and hard way: effectively, try to swap out the CUDA calculations in the bitsandbytes library for Metal calculations using metal-cpp.
>
> Update: Here is a link to the PR discussing how the llama.cpp team is implementing Metal calculations. Here is a link to their implementation of k-quantization.
>
> I think the best solution to this issue is simply to integrate the llama.cpp library. Essentially develop a Python binding. 4-bit quantization is already available.

Interested in contributing too xD

rickardp commented 1 year ago

> The main error that you posted (about LD_LIBRARY_PATH) has been fixed, but the main issue here is Apple Silicon support. I currently have no plans to implement this, but I would be happy to support anyone who wants to port bitsandbytes to Apple Silicon.

Hi @TimDettmers! As a maintainer, do you have any input on the effort and discussion so far? My proposal was:

I fixed some of the “plumbing” on my branch, specifically:

Would this be something you would find acceptable? If you approve of this, we could baseline on a portable build/test system, and then the community could work incrementally by adding MPS kernels and possibly also CPU kernels (I would actually think it would be useful for this library to be able to run on CPU only).

Or would you rather have one PR where it is usable straight off the bat? (Then the community effort could go on in my fork, or somewhere else.)

(I am myself more of a software engineer than a data scientist, so I can help out with the software engineering parts of the problem; for one, this means I want a simple unit test to tell me if my kernel is working or not, rather than a higher-level metric. Though I do know a fair share of PyTorch and GPU development, so I can help out with the porting where there is a clear spec.)

Also I think the community could help out with API documentation as a way of getting the spec and expected outcome.

nashid commented 1 year ago

Wondering if there is any update on this issue.

vatdut8994 commented 1 year ago

Any updates? The latest modifications to the files were 5 months ago.

lazydreamerbliss commented 1 year ago

Would like to know if there are updates for M1/M2? Thanks

Datamance commented 1 year ago

I'm in a machine learning class and would rather train quantized models on my laptop than have to suffer with the free tier of google colab. Would love to get a status update on this!

Maverobot commented 11 months ago

I would appreciate updates on this topic.

github-actions[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

TimDettmers commented 10 months ago

Thank you for your patience. We are currently discussing how to integrate accelerators other than NVIDIA GPUs, such as Apple Silicon, AMD, Intel, and other devices. We first need to define this integration before we can move on with the Apple integration. See this PR for more info/discussion: https://github.com/TimDettmers/bitsandbytes/pull/898