AI-Hypercomputer / jetstream-pytorch
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Apache License 2.0 · 41 stars · 15 forks
Issues
#200 · Support passing custom sampling function. · wang2yn84 · opened 15 hours ago · 0 comments
#199 · Enable jax compilation flags for jpt · vivianrwu · closed 3 weeks ago · 0 comments
#198 · Add jax compilation cache config · vivianrwu · closed 3 weeks ago · 0 comments
#197 · Add model warmup flag into cli · vivianrwu · closed 3 weeks ago · 0 comments
#196 · Fix: correct quantization name filtering · tengomucho · closed 1 month ago · 0 comments
#195 · Add per request sampling support. · wang2yn84 · opened 1 month ago · 0 comments
#194 · feat: add quantize exclude layer flag · tengomucho · closed 1 month ago · 2 comments
#193 · fix: correct error message · tengomucho · opened 1 month ago · 0 comments
#192 · add local tokenizer option for automated testing without hf token · sixiang-google · closed 1 month ago · 1 comment
#191 · Add an option to not quantize embedding layer when doing quantization. · qihqi · closed 1 month ago · 0 comments
#190 · Delete convert_checkpoints and helper classmethods. · qihqi · closed 1 month ago · 0 comments
#189 · Fix ray recompilation and accuracy · sixiang-google · closed 1 month ago · 0 comments
#188 · Make jpt the default cli - remove other entry point scripts · qihqi · closed 1 month ago · 2 comments
#187 · Add model warmup and jax compilation cache flags · vivianrwu · closed 2 months ago · 0 comments
#186 · Fix too many positional arguments lint error · FanhaiLu1 · closed 2 months ago · 0 comments
#185 · [Feature Request] Per request sampling params · qihqi · opened 2 months ago · 3 comments
#184 · Switch to NP from Jax to improve attention manager performance · FanhaiLu1 · closed 2 months ago · 1 comment
#183 · Make sure the server does not crash if the input is too long · qihqi · opened 2 months ago · 0 comments
#182 · [RFC] Formalizing commandline arguments. · qihqi · opened 2 months ago · 0 comments
#181 · Add offline perf ci · qihqi · closed 2 months ago · 6 comments
#180 · Support End To End PagedAttention in JetStream · FanhaiLu1 · closed 2 months ago · 0 comments
#179 · Pa decode checkin 1 · FanhaiLu1 · closed 2 months ago · 0 comments
#178 · Update README for new CLI · qihqi · closed 2 months ago · 0 comments
#177 · Update Jetstream, add optional sampler args. · qihqi · closed 3 months ago · 0 comments
#176 · Add gemma support in better cli · qihqi · closed 3 months ago · 0 comments
#175 · Use kwargs to simplify the call sites a bit · yixinshi · closed 3 months ago · 0 comments
#174 · Add mixtral support to new CLI · qihqi · closed 3 months ago · 0 comments
#173 · Issues with prefill & generate · qihqi · opened 3 months ago · 0 comments
#172 · Fix the performance regression with ragged attention on for llama2 7b. · wang2yn84 · closed 3 months ago · 2 comments
#171 · Replace repeat kv with proper GQA handling. · wang2yn84 · closed 3 months ago · 3 comments
#170 · fix ray engine crashes on multihost · sixiang-google · closed 3 months ago · 0 comments
#169 · Error Running `run_ray_serve_interleave` with Llama3 8B · ryanaoleary · opened 3 months ago · 0 comments
#168 · Add a script to measure speed of basic ops · qihqi · closed 3 months ago · 0 comments
#167 · Add page attention manager and kvcache manager · FanhaiLu1 · closed 3 months ago · 0 comments
#166 · Add page attention manager and kvcache manager · FanhaiLu1 · closed 3 months ago · 0 comments
#165 · Fix TPU head resource name for v4 and v5e · richardsliu · closed 4 months ago · 0 comments
#164 · Fix Ray engine crash on multihost · richardsliu · closed 4 months ago · 0 comments
#163 · Fixed exhausted bug between head and workers · FanhaiLu1 · closed 4 months ago · 0 comments
#162 · Handle v5e-8 in run_ray_serve_interleave · richardsliu · closed 4 months ago · 0 comments
#161 · Update Ray version in Dockerfile and add v5 configs · richardsliu · closed 4 months ago · 0 comments
#160 · Add newest llama-3 benchmarks · qihqi · closed 4 months ago · 0 comments
#159 · V5e8 ray · FanhaiLu1 · closed 4 months ago · 0 comments
#158 · Return np instead of jax array for prefill result tokens · FanhaiLu1 · closed 4 months ago · 0 comments
#157 · Correct typo enbedding -> embedding · tengomucho · closed 4 months ago · 1 comment
#156 · commit act quant for conditional ffn · qihqi · opened 4 months ago · 0 comments
#155 · Stacked cache mixtral. · wang2yn84 · closed 4 months ago · 0 comments
#154 · Stacked cache for MLPerf · wang2yn84 · closed 4 months ago · 0 comments
#153 · Add mlperf benchmark for offline for mixtral · qihqi · closed 4 months ago · 2 comments
#152 · Set accumulate type to bf16 in activation quant · lsy323 · closed 4 months ago · 1 comment
#151 · Optimize cache update. · wang2yn84 · closed 3 months ago · 7 comments