bigcode-project / transformers: Issues
Apache License 2.0 · 26 stars · 8 forks
#30 · add embed and residual dropout · RaymondLi0 · closed 9 months ago · 0 comments
#29 · For visibility: conversion scripts from fast-llm · RaymondLi0 · opened 10 months ago · 0 comments
#28 · Starcoder2 model · jlamypoirier · opened 10 months ago · 0 comments
#27 · log tensors · RaymondLi0 · opened 10 months ago · 0 comments
#26 · change KV splitting based on Megatron-LM · suiyoubi · closed 10 months ago · 0 comments
#25 · For visibility: Gqa megatron rope · RaymondLi0 · opened 11 months ago · 0 comments
#24 · Move megatron conversion script and add rope arguments · loubnabnl · opened 1 year ago · 4 comments
#23 · Make modeling compatible with Nanotron + few optims · NouamaneTazi · closed 9 months ago · 3 comments
#22 · For visibility: conversion scripts for fast-llm · RaymondLi0 · closed 10 months ago · 0 comments
#21 · Conversion of MegatronLM checkpoint to HF transformer checkpoint fails. (ALIBI used during training) · gagangayari · opened 1 year ago · 0 comments
#20 · Simplified kv caching · jlamypoirier · opened 1 year ago · 0 comments
#19 · Add flash attention · jlamypoirier · opened 1 year ago · 0 comments
#18 · Flash attention experiments · jlamypoirier · opened 1 year ago · 0 comments
#17 · Add back experimental features · jlamypoirier · closed 1 year ago · 0 comments
#16 · Diff from Huggingface main · jlamypoirier · opened 1 year ago · 0 comments
#15 · Transformers can no longer load santacoder-fast-inference model · beale201 · opened 1 year ago · 0 comments
#14 · Add gpu optimizations to base model · jlamypoirier · closed 1 year ago · 0 comments
#13 · More optimizations · jlamypoirier · closed 1 year ago · 0 comments
#12 · Running Santcoder-fast-inference throws UserWarning: FALLBACK path has been taken inside · murthyrudra · opened 1 year ago · 1 comment
#11 · add test to ensure mqa and mha have the same behaviour · minimario · closed 1 year ago · 0 comments
#10 · Upcasting, scaling, masking and fused kernels to match Megatron-LM · jlamypoirier · closed 1 year ago · 0 comments
#9 · Add santacoder model · jlamypoirier · closed 1 year ago · 1 comment
#8 · Megatron conversion script · jlamypoirier · closed 1 year ago · 0 comments
#7 · Fast inference · jlamypoirier · closed 1 year ago · 0 comments
#6 · Fork the model into GPTBigCode · jlamypoirier · closed 1 year ago · 1 comment
#5 · Fast inference · jlamypoirier · closed 1 year ago · 0 comments
#4 · Multi-query attention · jlamypoirier · closed 1 year ago · 3 comments
#3 · Just to see the diff · Muennighoff · opened 1 year ago · 4 comments
#2 · add: 2 variants of multi query implementation; printing some details · bigximik · closed 1 year ago · 0 comments
#1 · Benchmark multi-query attention in HF transformers · harm-devries · closed 2 years ago · 1 comment