bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

LLM.int8() Refactoring: Part 1 #1401

Open matthewdouglas opened 4 weeks ago

matthewdouglas commented 4 weeks ago

This PR is the initial phase of a set of changes aimed at improving the LLM.int8() implementation.

Still in draft at the moment, but since there's a lot here, I'm ready to have eyes on it. @TimDettmers @Titus-von-Koeller

Primary Purpose

Enhancements

Deprecations

The following functions from bitsandbytes are deprecated:

mm_cublas
bmm_cublas
matmul_cublas

The following functions from bitsandbytes.functional are deprecated (a rough migration sketch follows the list):

_mul
arange
dequant_min_max
dequantize_no_absmax
extract_outliers
get_special_format_str
get_tensor_stream (moved to internal API)
get_transform_buffer
get_transform_func
mm_dequant (replacement: int8_mm_dequant)
igemmlt (replacement: int8_linear_matmul)
nvidia_transform
post_call
pre_call
transform
quantize_no_absmax
vectorwise_dequant
vectorwise_quant (~replacement: int8_vectorwise_quant)
vectorwise_mm_dequant (~replacement: int8_mm_dequant)
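
For migration, here's a rough sketch of how the deprecated igemmlt / mm_dequant pair maps onto the new int8_* functions. The exact signatures and return layouts below are assumptions based on the replacement names and may still change while this is in draft:

```python
import torch
import bitsandbytes.functional as F

A = torch.randn(16, 64, dtype=torch.float16, device="cuda")
B = torch.randn(32, 64, dtype=torch.float16, device="cuda")

# Row-wise int8 quantization (assumed to return the quantized tensor,
# per-row absmax statistics, and outlier columns when a threshold is set).
A_i8, A_stats, _ = F.int8_vectorwise_quant(A)
B_i8, B_stats, _ = F.int8_vectorwise_quant(B)

# Replaces igemmlt: int8 x int8 matmul accumulating into int32.
C_i32 = F.int8_linear_matmul(A_i8, B_i8)

# Replaces mm_dequant: rescale the int32 accumulator back to float16.
C = F.int8_mm_dequant(C_i32, A_stats, B_stats)
```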

Further testing and benchmarking are coming; at the moment, unit tests are passing.

Next steps

github-actions[bot] commented 4 weeks ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Titus-von-Koeller commented 3 weeks ago

cc @akx, as this is a high-impact change that we're currently reviewing for release as soon as we can. Feel free to chime in if it's of interest to you; we would really appreciate your feedback.

Titus-von-Koeller commented 2 weeks ago

Hey @matthewdouglas,

Thanks again for the insightful two-hour pairing session – it was great to walk through your code together. I’m impressed by your thoughtful review and the careful attention to detail in this work. There was a lot of complexity to handle pragmatically and I love the incremental refactoring approach that you took. The performance improvements are also really impressive. Great work!


Here's the feedback I collected during our talk:

  1. Organize Test Scripts
    Consider moving the script-like test parts under bitsandbytes/scripts/8-bit. Adding a reference to them in the main implementation would help guide developers to these “eval” scripts during future refactoring.

  2. Clarify absmax Logic
    In get_row_absmax, please add an explanation of why taking the absmax over rows only is sufficient (see the row-wise sketch after this list).

  3. Commentary in MatMul8bitLt
    You mentioned needing a comment in MatMul8bitLt – could you clarify the specific addition required here?

  4. Documenting Public Functions
    Ensure all public functions have clear, detailed docstrings and verify their proper rendering in the documentation.

  5. Deterministic Test Inputs
    It makes a lot of sense to use hard-coded test inputs to improve consistency over the prior approach of randomization (see the deterministic-input sketch after this list). Please make sure this holds for all 8-bit related tests before concluding this PR. A follow-up PR applying the same approach to other tests would help address ongoing flakiness and would be highly appreciated.

  6. Profiling Code Placement
    Please commit your profiling code to the main repo in a reasonable location, and/or move the more experimental/supplementary code to the workbench repo for future team reference (a minimal profiling harness is sketched after this list).

  7. Benchmark Transparency for Users
    Adding benchmark results to the documentation would greatly benefit users, especially in a “deep-dive” section. Please clearly highlight performance comparisons with 16-bit, underscoring the benefits at large context and batch sizes, where the overhead remains constant. H100 benchmarks could add value but are likely low priority. Focus on the takeaways from the performance work, giving users accessible insights from your mental model so they “know what we know”.

  8. Publicity-Worthy Performance Metrics
    Do we have any benchmark metrics from this refactor that might serve as release highlights?
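
On point 2, for reference, a minimal sketch of the row-wise absmax idea as I understand it. This is illustrative code, not the PR's implementation: each activation row is scaled independently, so a per-row max of |x| is enough to map values into [-127, 127], while the column dimension is covered by the weight-side statistics.

```python
import torch

def row_absmax_quantize(A: torch.Tensor):
    # One absmax per row suffices because each row is scaled independently;
    # columns are handled by the weight-side statistics (my reading of the
    # discussion, not the PR's exact code). Clamp guards all-zero rows.
    row_stats = A.abs().amax(dim=1, keepdim=True).float().clamp_(min=1e-12)
    A_i8 = torch.round(A.float() * (127.0 / row_stats)).to(torch.int8)
    return A_i8, row_stats.squeeze(1)
```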
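
On point 5, a sketch of what a hard-coded input could look like (names, shapes, and values here are purely illustrative):

```python
import torch

def make_test_input():
    # Deterministic input with a known spread of values instead of
    # torch.randn(...), so every run sees identical data and failures
    # reproduce exactly.
    A = torch.linspace(-1.0, 1.0, steps=16 * 32).reshape(16, 32).half()
    A[:, 3] = 10.0  # deliberate outlier column to exercise the threshold path
    return A
```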
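
And on points 6 and 7, a minimal example of the kind of profiling harness that could live alongside the eval scripts (shapes and module choice are illustrative):

```python
import torch
from torch.profiler import profile, ProfilerActivity
import bitsandbytes as bnb

# Illustrative int8 linear layer and batch; tune shapes for real benchmarks.
linear = bnb.nn.Linear8bitLt(4096, 4096, has_fp16_weights=False).cuda()
x = torch.randn(8, 512, 4096, dtype=torch.float16, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        linear(x)
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```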


Big thanks also to @akx for making time to review this work! We really appreciate your proactive contributions and helpful insights 🤗 Thanks ❤️