AI-Hypercomputer / jetstream-pytorch
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Apache License 2.0 · 41 stars · 15 forks
Issues
#200 · Support passing custom sampling function. · wang2yn84 · opened 15 hours ago · 0 comments
#199 · Enable jax compilation flags for jpt · vivianrwu · closed 3 weeks ago · 0 comments
#198 · Add jax compilation cache config · vivianrwu · closed 3 weeks ago · 0 comments
#197 · Add model warmup flag into cli · vivianrwu · closed 3 weeks ago · 0 comments
#196 · Fix: correct quantization name filtering · tengomucho · closed 1 month ago · 0 comments
#195 · Add per request sampling support. · wang2yn84 · opened 1 month ago · 0 comments
#194 · feat: add quantize exclude layer flag · tengomucho · closed 1 month ago · 2 comments
#193 · fix: correct error message · tengomucho · opened 1 month ago · 0 comments
#192 · add local tokenizer option for automated testing without hf token · sixiang-google · closed 1 month ago · 1 comment
#191 · Add an option to not quantize embedding layer when doing quantization. · qihqi · closed 1 month ago · 0 comments
#190 · Delete convert_checkpoints and helper classmethods. · qihqi · closed 1 month ago · 0 comments
#189 · Fix ray recompilation and accuracy · sixiang-google · closed 1 month ago · 0 comments
#188 · Make jpt the default cli - remove other entry point scripts · qihqi · closed 1 month ago · 2 comments
#187 · Add model warmup and jax compilation cache flags · vivianrwu · closed 2 months ago · 0 comments
#186 · Fix too many positional arguments lint error · FanhaiLu1 · closed 2 months ago · 0 comments
#185 · [Feature Request] Per request sampling params · qihqi · opened 2 months ago · 3 comments
#184 · Switch to NP from Jax to improve attention manager performance · FanhaiLu1 · closed 2 months ago · 1 comment
#183 · Make sure the server does not crash if the input is too long · qihqi · opened 2 months ago · 0 comments
#182 · [RFC] Formalizing commandline arguments. · qihqi · opened 2 months ago · 0 comments
#181 · Add offline perf ci · qihqi · closed 2 months ago · 6 comments
#180 · Support End To End PagedAttention in JetStream · FanhaiLu1 · closed 2 months ago · 0 comments
#179 · Pa decode checkin 1 · FanhaiLu1 · closed 2 months ago · 0 comments
#178 · Update README for new CLI · qihqi · closed 2 months ago · 0 comments
#177 · Update Jetstream, add optional sampler args. · qihqi · closed 3 months ago · 0 comments
#176 · Add gemma support in better cli · qihqi · closed 3 months ago · 0 comments
#175 · Use kwargs to simplify the call sites a bit · yixinshi · closed 3 months ago · 0 comments
#174 · Add mixtral support to new CLI · qihqi · closed 3 months ago · 0 comments
#173 · Issues with prefill & generate · qihqi · opened 3 months ago · 0 comments
#172 · Fix the performance regression with ragged attention on for llama2 7b. · wang2yn84 · closed 3 months ago · 2 comments
#171 · Replace repeat kv with proper GQA handling. · wang2yn84 · closed 3 months ago · 3 comments
#170 · fix ray engine crashes on multihost · sixiang-google · closed 3 months ago · 0 comments
#169 · Error Running `run_ray_serve_interleave` with Llama3 8B · ryanaoleary · opened 3 months ago · 0 comments
#168 · Add a script to measure speed of basic ops · qihqi · closed 3 months ago · 0 comments
#167 · Add page attention manager and kvcache manager · FanhaiLu1 · closed 3 months ago · 0 comments
#166 · Add page attention manager and kvcache manager · FanhaiLu1 · closed 3 months ago · 0 comments
#165 · Fix TPU head resource name for v4 and v5e · richardsliu · closed 4 months ago · 0 comments
#164 · Fix Ray engine crash on multihost · richardsliu · closed 4 months ago · 0 comments
#163 · Fixed exhausted bug between head and workers · FanhaiLu1 · closed 4 months ago · 0 comments
#162 · Handle v5e-8 in run_ray_serve_interleave · richardsliu · closed 4 months ago · 0 comments
#161 · Update Ray version in Dockerfile and add v5 configs · richardsliu · closed 4 months ago · 0 comments
#160 · Add newest llama-3 benchmarks · qihqi · closed 4 months ago · 0 comments
#159 · V5e8 ray · FanhaiLu1 · closed 4 months ago · 0 comments
#158 · Return np instead of jax array for prefill result tokens · FanhaiLu1 · closed 4 months ago · 0 comments
#157 · Correct typo enbedding -> embedding · tengomucho · closed 4 months ago · 1 comment
#156 · commit act quant for conditional ffn · qihqi · opened 4 months ago · 0 comments
#155 · Stacked cache mixtral. · wang2yn84 · closed 4 months ago · 0 comments
#154 · Stacked cache for MLPerf · wang2yn84 · closed 4 months ago · 0 comments
#153 · Add mlperf benchmark for offline for mixtral · qihqi · closed 4 months ago · 2 comments
#152 · Set accumulate type to bf16 in activation quant · lsy323 · closed 4 months ago · 1 comment
#151 · Optimize cache update. · wang2yn84 · closed 3 months ago · 7 comments