b4rtaz distributed-llama issues

b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.

MIT License

1.02k stars 68 forks

issues

[New Feature] Add new route for dllama api for embedding models

#96 testing0mon21 opened 1 day ago
1
refactor.

#95 b4rtaz closed 3 days ago
0
Support for GGUF files?

#94 ravor-org opened 4 days ago
1
Hugging Face models without tokenizer.model file

#93 EntusiastaIApy closed 3 days ago
2
Exception: max_seq_len is required, please update params.json with convert-llama.py on Meta-Llama-3-8B-Instruct

#92 unclemusclez closed 4 days ago
1
feat: vulkan.

#91 b4rtaz closed 3 days ago
2
feat: accelerator structure.

#90 b4rtaz closed 2 weeks ago
0
What about mobile phones?

#89 dcale opened 2 weeks ago
4
fix: windows wsa startup.

#88 b4rtaz closed 4 weeks ago
0
what(): Cannot create socket

#87 Slaghton opened 4 weeks ago
1
dllama-api invokes "what(): Invalid tokenizer file "

#86 unclemusclez closed 1 month ago
2
feat: update readme, add model.

#85 b4rtaz closed 1 month ago
0
feat: optional weights float type argument.

#84 b4rtaz closed 1 month ago
0
feat: tokenizer v1.

#83 b4rtaz closed 1 month ago
0
dllama-api hosted on 127.0.0.1

#82 unclemusclez opened 1 month ago
2
float-type f32 will not start

#81 unclemusclez opened 1 month ago
2
master and worker started but with problems

#80 fabgat opened 1 month ago
8
support multi nvidia jetson agx orin?

#79 WangFengtu1996 opened 1 month ago
3
convert into .bin

#78 fabgat closed 1 month ago
2
Request: Community Discord?

#77 unclemusclez closed 1 month ago
1
feat: add to tokenizer chat configuration.

#76 b4rtaz closed 1 month ago
5
feat: naive cache.

#75 b4rtaz closed 1 month ago
0
fix: windows fseek.

#74 b4rtaz closed 1 month ago
0
Add additional chat templates to dllama-api

#73 DifferentialityDevelopment closed 1 month ago
8
chore: refactor http request a bit.

#72 b4rtaz closed 1 month ago
0
[Feature Suggest] Config File alternative to Command Line Arguments

#71 DifferentialityDevelopment closed 1 month ago
2
Support nSlices > nKvHeads

#70 b4rtaz opened 1 month ago
0
[Feature Suggest] From All-Reduce to Ring-All-Reduce

#69 zhengpeirong opened 1 month ago
1
Support for other models (ollama models)

#68 testing0mon21 opened 1 month ago
3
[Setup] Multiple Apple Silicon Macs: Questions

#67 s04 opened 1 month ago
1
chore: dllama-api tiny clean up.

#66 b4rtaz closed 1 month ago
0
fix: chunked stream, close stream without econnreset.

#65 b4rtaz closed 1 month ago
0
feat: speed up synchronization of mlp.

#64 b4rtaz closed 1 month ago
1
feat: windows support

#63 DifferentialityDevelopment closed 1 month ago
20
feat: convert-hf.py

#62 b4rtaz closed 1 month ago
0
fix: use non-blocking sockets.

#61 b4rtaz closed 1 month ago
0
(Crashing on Low Memory SBC) main invoked oom-killer: gfp_mask=0x1100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0

#59 unclemusclez closed 1 month ago
51
network utilization

#58 zhengpeirong opened 1 month ago
3
feat: use avx2 to speedup dotProduct

#57 b4rtaz closed 1 month ago
0
feat: use avx2 to speedup matmulF32

#56 b4rtaz closed 1 month ago
0
How To Add Supported Model

#55 hyperbolic-c opened 1 month ago
2
Use AVX2 to speedup matmulQ40

#54 DifferentialityDevelopment closed 1 month ago
3
Use AVX2 to speedup matmulQ40

#53 DifferentialityDevelopment closed 1 month ago
2
Add safe tensor support to convert-llama.py

#52 DifferentialityDevelopment closed 1 month ago
10
fix: convert-llama.py supports different max_seq_len.

#51 b4rtaz closed 1 month ago
0
Vulkan Acceleration

#50 DifferentialityDevelopment opened 1 month ago
35
chore: update macbeth.sh

#49 eltociear closed 1 month ago
2
terminate called after throwing an instance of 'ReadSocketException'

#48 unclemusclez opened 1 month ago
35
API Server

#47 DifferentialityDevelopment closed 1 month ago
3
feat: splitting multihead attention into all nodes.

#46 b4rtaz closed 1 month ago
5