AnswerDotAI / bert24 · Apache License 2.0 · 66 stars · 4 forks

Issues (sorted newest first)

#128 Rescaling LR for non-constant schedulers · warner-benjamin · closed 3 weeks ago · 0 comments
#127 Add reset_time configuration option · warner-benjamin · closed 3 weeks ago · 0 comments
#126 Allow empty split for train or eval only datasets · warner-benjamin · closed 3 weeks ago · 0 comments
#125 Add optimizer configuration to eval · ohmeow · closed 1 day ago · 1 comment
#124 Modify the LambdaLR scheduler when overriding the LR & WD · warner-benjamin · closed 1 month ago · 0 comments
#123 Allow setting a new LR & WD on resumption · warner-benjamin · closed 1 month ago · 0 comments
#122 Remove callbacks from restart_override · warner-benjamin · closed 1 month ago · 0 comments
#121 Add restart override flag to allow changing schedule, callbacks, and microbatch_size · warner-benjamin · closed 1 month ago · 0 comments
#120 Live eval support for pre-training runs · rbiswasfc · closed 4 weeks ago · 0 comments
#119 Initialize model from Checkpoint · warner-benjamin · closed 2 months ago · 0 comments
#118 add 1-sqrt scheduler for decay experiments · staghado · closed 2 months ago · 0 comments
#117 Added code to strip padding from packed sequences. · fladhak · closed 3 weeks ago · 1 comment
#116 Strip padding from packed sequences · fladhak · closed 2 months ago · 0 comments
#115 allow skipping an eval dataset · warner-benjamin · closed 2 months ago · 0 comments
#114 Small benchmark improvements · warner-benjamin · closed 2 months ago · 0 comments
#113 Add missing tokens to bos & eos · warner-benjamin · closed 2 months ago · 0 comments
#112 composer 0.24.1 · warner-benjamin · closed 2 months ago · 0 comments
#111 Sequence packing · fladhak · closed 2 months ago · 2 comments
#110 Distributed Sampling Dataset · warner-benjamin · closed 2 months ago · 4 comments
#109 Update installation instruction to change conda channel_priority · fladhak · closed 2 months ago · 0 comments
#108 Add support for gradient norm logging · warner-benjamin · closed 2 months ago · 3 comments
#107 Update env and install instructions for FA3 & PyTorch 2.4 · warner-benjamin · closed 2 months ago · 0 comments
#106 Update Env · warner-benjamin · closed 2 months ago · 0 comments
#105 Fixing the deletion option in the decompression file · NohTow · closed 2 months ago · 0 comments
#104 Flash Attention 3 Support · warner-benjamin · closed 2 months ago · 1 comment
#103 Masked prediction · warner-benjamin · closed 2 months ago · 3 comments
#102 PyTorch 2.4 with dynamic shape DDP Compile · warner-benjamin · closed 2 months ago · 3 comments
#101 Add support for Gemma-2 style Tanh Softcapping · warner-benjamin · opened 3 months ago · 0 comments
#100 Different RoPE settings for local attention layers · warner-benjamin · closed 3 months ago · 1 comment
#99 Unpad embeddings & model support for sequence packing · warner-benjamin · closed 3 months ago · 1 comment
#98 Add support for mds without predefined attention mask · ohallstrom · closed 3 months ago · 1 comment
#97 Allow to disable training metrics · ohallstrom · closed 2 months ago · 3 comments
#96 Weight init with non-default tokenizer bug fix · ohallstrom · closed 3 months ago · 9 comments
#95 Add support for local attention · warner-benjamin · closed 3 months ago · 1 comment
#94 Update Eval Scripts · warner-benjamin · closed 3 months ago · 4 comments
#93 Use wandb_entity for eval logging · warner-benjamin · closed 3 months ago · 0 comments
#92 Add token classification eval with CoNLL 2003 · tylerjthomas9 · opened 4 months ago · 1 comment
#91 add tokenizer converter script · bclavie · opened 4 months ago · 0 comments
#90 Surface recompute_metric_loss option · warner-benjamin · closed 4 months ago · 0 comments
#89 Adding allow_embedding_resizing to mosaic_bert · NohTow · closed 4 months ago · 0 comments
#88 Adding MLMMLU & Ultrafeedback jobs · rbiswasfc · closed 3 months ago · 4 comments
#87 Add Z-Loss support, use flash_attn CrossEntropy, don't recalculate CrossEntropy metric if unneeded · warner-benjamin · closed 4 months ago · 1 comment
#86 Surface cache_limit option for streaming dataloaders · warner-benjamin · closed 4 months ago · 0 comments
#85 Add dataset without streaming · ohallstrom · closed 3 months ago · 1 comment
#84 Fix initialization · warner-benjamin · closed 4 months ago · 0 comments
#83 simple benchmarking script · warner-benjamin · closed 4 months ago · 1 comment
#82 WIP: Auto eval config · bclavie · closed 4 months ago · 0 comments
#81 Add support for loading pre-tokenized data · staghado · closed 4 months ago · 6 comments
#80 Add support for FA2 deterministic mode · warner-benjamin · closed 4 months ago · 0 comments
#79 Add missing bias config options to attention Linear layers · warner-benjamin · closed 4 months ago · 1 comment
(older issues continue on the next page)