-
## ❓ Questions and Help
I'm trying to implement an in-place operator using Pallas and wrap it as a Torch custom op. However, I found it difficult to make it work with `torch.compile`. More specifi…
-
### Description
Hello,
I'm encountering an issue where converting negative float16 values to int8 in a Pallas kernel incorrectly yields zero instead of a negative int8 …
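For reference, outside Pallas the expected semantics are well defined; a NumPy sketch of what the cast should do (truncation toward zero, so negative inputs stay negative):

```python
import numpy as np

# Casting negative float16 values to int8 truncates toward zero,
# so -1.5 -> -1 and -2.0 -> -2; none of these should become 0.
vals = np.array([-1.5, -2.0, -127.0], dtype=np.float16)
out = vals.astype(np.int8)
```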
-
I could not find an interface for autotuning (as in Triton: https://triton-lang.org/main/python-api/generated/triton.autotune.html) in Pallas. Is there currently a way of doing this?
If not, is the…
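In the absence of a built-in API, a hand-rolled autotuner is just a timing loop over candidate configurations; a hypothetical sketch where `run_kernel` stands in for compiling and launching a Pallas kernel with a given block size:

```python
import time

def autotune(run_kernel, configs, warmup=2, reps=5):
    """Time each candidate config and return the fastest one."""
    best_cfg, best_t = None, float("inf")
    for cfg in configs:
        for _ in range(warmup):      # discard compilation/caching effects
            run_kernel(cfg)
        t0 = time.perf_counter()
        for _ in range(reps):
            run_kernel(cfg)
        t = (time.perf_counter() - t0) / reps
        if t < best_t:
            best_cfg, best_t = cfg, t
    return best_cfg

# toy stand-in kernel whose cost grows with the "block size"
best = autotune(lambda c: sum(range(c)), [256, 1024, 4096])
```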
-
Should match people's intuitions better. Maybe also change `start` to `boot` to keep with the four-character commands, but also maybe not.
-
After the commit https://github.com/openxla/xla/commit/2354d4a95b232d0676e4fb4a55db97404f4bf8ab, the Pallas/Triton kernel below starts returning wrong results on H100.
JAX repro:
```python
impo…
-
### 🚀 The feature, motivation and pitch
I would like to serve smaller models (e.g. facebook/opt-125m) using vLLM on TPU. I can't do this currently because the Pallas backend has the limitation `NotImp…
-
### 🚀 The feature, motivation and pitch
Trying to run a Gemma 2 model on vLLM TPU fails with a "not implemented for Pallas backend" error.
But searching the Pallas kernels, they do have support for logit s…
-
### Description
Hi. I am extending the Pallas paged attention kernel for an MQA (multi-query attention) case. When I run my kernel, I encountered the following error, which suggests it is an internal error and I should …
-
## 🐛 Bug
I am attempting to implement custom Pallas kernels locally on a CPU for use with a TPU, following the official example [here](https://github.com/pytorch/xla/blob/master/docs…