bigcode-project / transformers: Issues
Apache License 2.0 · 26 stars · 8 forks
#30 · add embed and residual dropout · RaymondLi0 · closed 9 months ago · 0 comments
#29 · For visibility: conversion scripts from fast-llm · RaymondLi0 · opened 10 months ago · 0 comments
#28 · Starcoder2 model · jlamypoirier · opened 10 months ago · 0 comments
#27 · log tensors · RaymondLi0 · opened 10 months ago · 0 comments
#26 · change KV splitting based on Megatron-LM · suiyoubi · closed 10 months ago · 0 comments
#25 · For visibility: Gqa megatron rope · RaymondLi0 · opened 11 months ago · 0 comments
#24 · Move megatron conversion script and add rope arguments · loubnabnl · opened 1 year ago · 4 comments
#23 · Make modeling compatible with Nanotron + few optims · NouamaneTazi · closed 9 months ago · 3 comments
#22 · For visibility: conversion scripts for fast-llm · RaymondLi0 · closed 10 months ago · 0 comments
#21 · Conversion of MegatronLM checkpoint to HF transformer checkpoint fails. (ALIBI used during training) · gagangayari · opened 1 year ago · 0 comments
#20 · Simplified kv caching · jlamypoirier · opened 1 year ago · 0 comments
#19 · Add flash attention · jlamypoirier · opened 1 year ago · 0 comments
#18 · Flash attention experiments · jlamypoirier · opened 1 year ago · 0 comments
#17 · Add back experimental features · jlamypoirier · closed 1 year ago · 0 comments
#16 · Diff from Huggingface main · jlamypoirier · opened 1 year ago · 0 comments
#15 · Transformers can no longer load santacoder-fast-inference model · beale201 · opened 1 year ago · 0 comments
#14 · Add gpu optimizations to base model · jlamypoirier · closed 1 year ago · 0 comments
#13 · More optimizations · jlamypoirier · closed 1 year ago · 0 comments
#12 · Running Santcoder-fast-inference throws UserWarning: FALLBACK path has been taken inside · murthyrudra · opened 1 year ago · 1 comment
#11 · add test to ensure mqa and mha have the same behaviour · minimario · closed 1 year ago · 0 comments
#10 · Upcasting, scaling, masking and fused kernels to match Megatron-LM · jlamypoirier · closed 1 year ago · 0 comments
#9 · Add santacoder model · jlamypoirier · closed 1 year ago · 1 comment
#8 · Megatron conversion script · jlamypoirier · closed 1 year ago · 0 comments
#7 · Fast inference · jlamypoirier · closed 1 year ago · 0 comments
#6 · Fork the model into GPTBigCode · jlamypoirier · closed 1 year ago · 1 comment
#5 · Fast inference · jlamypoirier · closed 1 year ago · 0 comments
#4 · Multi-query attention · jlamypoirier · closed 1 year ago · 3 comments
#3 · Just to see the diff · Muennighoff · opened 1 year ago · 4 comments
#2 · add: 2 variants of multi query implementation; printing some details · bigximik · closed 1 year ago · 0 comments
#1 · Benchmark multi-query attention in HF transformers · harm-devries · closed 2 years ago · 1 comment