fpgaminer / GPTQ-triton

GPTQ inference Triton kernel
Apache License 2.0
284 stars 23 forks

Get C++ exception when trying to load model #14

Closed: vedantroy closed this issue 1 year ago

vedantroy commented 1 year ago

I'm trying to load the 13B quantized model (which I quantized using the script in this repository). But I get the following error:

Loading model ...
QuantLinear Warmup: Found 3 unique KN values.
FusedMLP Warmup: Found 1 unique K values.
Warming up autotune cache ...
  0%|                                                                                     | 0/12 [00:00<?, ?it/s]
python3: /project/lib/Analysis/Allocation.cpp:42: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(const mlir::Attribute&, const mlir::Attribute&): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
./generate.bash: line 2:  3996 Aborted                 (core dumped)
fpgaminer commented 1 year ago

This seems to be a bug in Triton (https://github.com/openai/triton/issues/1298). Do you have Triton installed at HEAD or 2.0.0?

vedantroy commented 1 year ago

> This seems to be a bug in Triton (openai/triton#1298). Do you have Triton installed at HEAD or 2.0.0?

I can check. Which version should I have installed?

vedantroy commented 1 year ago
pip list | grep triton

gives

triton                   2.0.0
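Instead of grepping `pip list`, the installed version can also be queried programmatically. A minimal sketch using the standard library's `importlib.metadata`; the `installed_version` helper name is hypothetical, not part of this repository:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of *package*, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Example: installed_version("triton") would return "2.0.0" in the
# environment shown above, or None if Triton is not installed.
```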
fpgaminer commented 1 year ago

Hmm, I guess try HEAD to see if they've fixed it. Otherwise, please let me know what GPU you're testing on so I can try to reproduce.

vedantroy commented 1 year ago

Works with triton 2.1.0
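Since this thread found the assertion gone in Triton 2.1.0, a caller could fail fast with a clear message instead of hitting the C++ abort. A minimal sketch, assuming a pip-managed environment; the helper names (`version_tuple`, `check_triton`) are hypothetical, not part of this repository:

```python
import re
from importlib.metadata import PackageNotFoundError, version

# 2.1.0 is the version this thread reports as working; treat it as the floor.
MIN_TRITON = (2, 1, 0)


def version_tuple(v: str) -> tuple:
    """Parse the leading numeric components of a version string,
    ignoring local suffixes like '+cu118'."""
    return tuple(int(x) for x in re.findall(r"\d+", v)[:3])


def check_triton(minimum: tuple = MIN_TRITON) -> str:
    """Raise a readable error if Triton is missing or older than *minimum*."""
    try:
        v = version("triton")
    except PackageNotFoundError:
        raise RuntimeError("Triton is not installed") from None
    if version_tuple(v) < minimum:
        floor = ".".join(map(str, minimum))
        raise RuntimeError(
            f"Triton {v} hits a known mma -> mma layout assertion; "
            f"upgrade with: pip install -U 'triton>={floor}'"
        )
    return v
```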