-
![image](https://github.com/microsoft/BitBLAS/assets/35176826/49a914bb-6be4-420a-af17-944df08d2d8d)
```
torch==2.3.0+cu121
torchaudio==2.3.0+cu121
torchvision==0.18.0+cu121
nvidia…
-
MC used to host an archive of CP/M software. It was later moved to the SIMTEL20 TWENEX system.
There was also a mailing list, CPM at MIT-MC.
-
The code is out; it's quite simple and short.
Opening this so I can track how to add this to ao and make sure it works well with torch.compile(). This will likely need Blackwell to perform decently…
-
Hey folks,
I'd love to try doing a fast quantization of a [LLaVA model](https://llava-vl.github.io/) or perhaps [MOE-LLaVA](https://github.com/PKU-YuanGroup/MoE-LLaVA). Will ExLlama work out of the…
-
The paper mentions that it's aimed at big models with billions of parameters, not 15 million.
-
Can you add an example of BitNet from Microsoft: https://github.com/kyegomez/BitNet ?
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar feature requests.
### Description
1.58 bit quantization i…
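For context, the core of 1.58-bit quantization is mapping each weight to a ternary value in {-1, 0, +1} with a per-tensor scale. A minimal sketch of the absmean-style scheme is below; the function names are illustrative assumptions, not part of any library discussed in this thread:

```python
def absmean_ternary_quantize(weights):
    """Quantize a flat list of float weights to ternary values.

    Returns (quantized, scale), where each quantized[i] is in {-1, 0, 1}
    and weights[i] is approximated by quantized[i] * scale.
    """
    n = len(weights)
    # Per-tensor scale: mean of absolute values (guard against zero).
    scale = sum(abs(w) for w in weights) / n or 1e-8
    # Scale, round to the nearest integer, then clamp to the ternary set.
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from ternary values."""
    return [q * scale for q in quantized]
```

Real implementations apply this per tensor (or per group) on GPU and pack the ternary values into low-bit storage; this sketch only shows the numerics.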
-
When running benchmark_inference_latency for bitnet, I got this exception:
```
self.bitblas_matmul = self._get_or_create_bitblas_operator(matmul_config, ENABLE_TUNING)
^^…
-
Llama 3.1 breaks on vLLM with an error on --rope-scaling. The Dockerfile.cpu needs to be updated to use the new IPEX patch and, if necessary, upgraded transformers.
[https://github.com/vllm-project/v…