ELS-RD / kernl

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
http://www.kernl.ai
Apache License 2.0
1.53k stars, 95 forks

feat: add support for int8 quantization on linear layers #299

Open pommedeterresautee opened 1 year ago

pommedeterresautee commented 1 year ago

Fix #288

Tests pass:


================================================== warnings summary ==================================================
test/test_model_optimization.py: 1 warning
test/test_torchdynamo.py: 79 warnings
  /home/geantvert/.local/share/virtualenvs/kernl/lib/python3.9/site-packages/torch/cuda/graphs.py:79: UserWarning: The CUDA Graph is empty. This ususally means that the graph was attempted to be captured on wrong device or stream. (Triggered internally at ../aten/src/ATen/cuda/CUDAGraph.cpp:191.)
    super().capture_end()

test/debugger/test_memory.py::test_load_is_in_different_memory
  /mnt/workspace/kernl/test/debugger/test_memory.py:58: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
    assert t.storage().data_ptr() != a.storage().data_ptr()

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================= 2885 passed, 3 skipped, 81 warnings in 10684.37s (2:58:04) =============================