HomebrewNLP / Olmax
HomebrewNLP in JAX flavour for maintainable TPU-Training
BSD 2-Clause "Simplified" License · 46 stars · 6 forks
Issues
All issues were opened by ClashLuke.

| # | Title | State | Age | Comments |
|---|-------|-------|-----|----------|
| #106 | feat(optimizer): add svd fisher | open | 1 year ago | 0 |
| #105 | Arch2 | open | 1 year ago | 1 |
| #104 | fix(optimizer): run all ops in fp64 | closed | 1 year ago | 0 |
| #103 | Remove dead code | closed | 1 year ago | 3 |
| #102 | Fp32 | closed | 1 year ago | 0 |
| #101 | Better shampoo splitting | closed | 1 year ago | 0 |
| #100 | Don't decay mixer | closed | 1 year ago | 0 |
| #99 | Adam square | closed | 1 year ago | 1 |
| #98 | No flatten depth | closed | 1 year ago | 1 |
| #97 | Improve Shampoo | closed | 1 year ago | 0 |
| #96 | Flatten Conv for Shampoo, FP64 inverse | closed | 1 year ago | 1 |
| #95 | fix(shampoo): don't debias stat | closed | 1 year ago | 0 |
| #94 | fix(optimizer): debias in correct direction | closed | 1 year ago | 1 |
| #93 | Mean teacher | closed | 1 year ago | 1 |
| #92 | More normv2 | closed | 1 year ago | 0 |
| #91 | Looped pool | closed | 1 year ago | 2 |
| #90 | Tests for ctx.parameters | closed | 1 year ago | 1 |
| #89 | feat(model): single branch revnet | closed | 1 year ago | 0 |
| #88 | Weight-Tie MoE | closed | 2 years ago | 1 |
| #87 | Dense2 | closed | 2 years ago | 1 |
| #86 | Moe tree | closed | 2 years ago | 2 |
| #85 | Moe2 | closed | 2 years ago | 4 |
| #84 | log stats + fix first checkpoint/resume | closed | 2 years ago | 2 |
| #83 | Cleanup backend | closed | 2 years ago | 0 |
| #82 | Looks linear | closed | 2 years ago | 1 |
| #81 | Multiple forward per backward | open | 2 years ago | 0 |
| #80 | Staged batchsize training | open | 2 years ago | 0 |
| #79 | Compact Loss | open | 2 years ago | 0 |
| #78 | Causality Test | open | 2 years ago | 1 |
| #77 | LpNorm + ScaleNorm | closed | 2 years ago | 1 |
| #76 | Fix checkpoint | closed | 2 years ago | 0 |
| #75 | Hierarchical mixer | closed | 2 years ago | 1 |
| #74 | test(model): no qrnn | closed | 2 years ago | 1 |
| #73 | Scan | closed | 2 years ago | 1 |
| #72 | L1 LayerNorm | closed | 2 years ago | 1 |
| #71 | Scale | closed | 2 years ago | 19 |
| #70 | Square LR-Schedule | open | 2 years ago | 0 |
| #69 | SM3 instead of Adam in Adam#Shampoo | closed | 2 years ago | 0 |
| #68 | add managed training, use tpucare for sweep | closed | 2 years ago | 0 |
| #67 | high-performance multi-gpu video2tfrecord | closed | 2 years ago | 1 |
| #66 | reduce epsilon yet improve stability | closed | 2 years ago | 0 |
| #65 | perf(optimizer/shampoo): remove multi-preconditioning | closed | 2 years ago | 1 |
| #64 | feat(optimizer): use normalized rmsprop | closed | 2 years ago | 1 |
| #63 | faet(model): use more stable l2norm | closed | 2 years ago | 1 |
| #62 | perf(optimizer): use adam+shampoo | closed | 2 years ago | 1 |
| #61 | Learning-rate schedule as beta schedule | closed | 2 years ago | 1 |
| #60 | MuP Normalization | closed | 2 years ago | 1 |
| #59 | MESA/SAM | closed | 2 years ago | 0 |
| #58 | Self-Convolution | closed | 2 years ago | 2 |
| #57 | Add configurable layer scales | closed | 2 years ago | 1 |