ROCm / bitsandbytes

8-bit CUDA functions for PyTorch
MIT License

fix kEstimateQuantiles kernel #7

Closed by pnunna93 9 months ago

pnunna93 commented 9 months ago

This PR adds a synchronization point to the kEstimateQuantiles kernel. Without it, the kernel has a race condition and fails on Instinct GPUs.
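For readers unfamiliar with this class of bug, here is a minimal sketch of how a missing block-level barrier can cause a race. It is not the kEstimateQuantiles source or the diff from this PR: the kernel `reverse_block`, the constant `kThreads`, and the host driver are hypothetical names made up for illustration. The assumed pattern is that each thread writes one shared-memory slot and then reads a slot written by a different thread, so a `__syncthreads()` is needed between the two phases.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // hypothetical block size for this sketch

// Hypothetical kernel (not from bitsandbytes): reverses one block-sized
// array by staging it in shared memory.
__global__ void reverse_block(const float* in, float* out) {
    __shared__ float tile[kThreads];
    const int t = threadIdx.x;

    // Phase 1: each thread writes its own slot.
    tile[t] = in[t];

    // Barrier: every write to tile must be visible before any thread reads a
    // slot owned by another thread. Removing this line creates a race, because
    // the reader may run before the writer of tile[kThreads - 1 - t].
    __syncthreads();

    // Phase 2: each thread reads a slot written by a different thread.
    out[t] = tile[kThreads - 1 - t];
}

int main() {
    float h_in[kThreads], h_out[kThreads];
    for (int i = 0; i < kThreads; ++i) h_in[i] = static_cast<float>(i);

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    reverse_block<<<1, kThreads>>>(d_in, d_out);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // Count elements that do not match the expected reversed order.
    int mismatches = 0;
    for (int i = 0; i < kThreads; ++i) {
        if (h_out[i] != h_in[kThreads - 1 - i]) ++mismatches;
    }
    std::printf("mismatches: %d\n", mismatches);

    cudaFree(d_in);
    cudaFree(d_out);
    return mismatches == 0 ? 0 : 1;
}
```

A race like this may or may not show up on a given device, since the outcome depends on how the hardware schedules the threads of a block. That is consistent with a kernel that fails on some GPUs only, though the exact cause in kEstimateQuantiles is not stated here.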

The fix enables these tests: test_estimate_quantiles, test_quantile_quantization, and test_kbit_quantile_estimation.