AnswerDotAI / bert24 · Apache License 2.0 · 66 stars · 4 forks

Issues (sorted newest first)

#128 Rescaling LR for non-constant schedulers · warner-benjamin · closed 3 weeks ago · 0 comments
#127 Add reset_time configuration option · warner-benjamin · closed 3 weeks ago · 0 comments
#126 Allow empty split for train or eval only datasets · warner-benjamin · closed 3 weeks ago · 0 comments
#125 Add optimizer configuration to eval · ohmeow · closed 1 day ago · 1 comment
#124 Modify the LambdaLR scheduler when overriding the LR & WD · warner-benjamin · closed 1 month ago · 0 comments
#123 Allow setting a new LR & WD on resumption · warner-benjamin · closed 1 month ago · 0 comments
#122 Remove callbacks from restart_override · warner-benjamin · closed 1 month ago · 0 comments
#121 Add restart override flag to allow changing schedule, callbacks, and microbatch_size · warner-benjamin · closed 1 month ago · 0 comments
#120 Live eval support for pre-training runs · rbiswasfc · closed 4 weeks ago · 0 comments
#119 Initialize model from Checkpoint · warner-benjamin · closed 2 months ago · 0 comments
#118 add 1-sqrt scheduler for decay experiments · staghado · closed 2 months ago · 0 comments
#117 Added code to strip padding from packed sequences. · fladhak · closed 3 weeks ago · 1 comment
#116 Strip padding from packed sequences · fladhak · closed 2 months ago · 0 comments
#115 allow skipping an eval dataset · warner-benjamin · closed 2 months ago · 0 comments
#114 Small benchmark improvements · warner-benjamin · closed 2 months ago · 0 comments
#113 Add missing tokens to bos & eos · warner-benjamin · closed 2 months ago · 0 comments
#112 composer 0.24.1 · warner-benjamin · closed 2 months ago · 0 comments
#111 Sequence packing · fladhak · closed 2 months ago · 2 comments
#110 Distributed Sampling Dataset · warner-benjamin · closed 2 months ago · 4 comments
#109 Update installation instruction to change conda channel_priority · fladhak · closed 2 months ago · 0 comments
#108 Add support for gradient norm logging · warner-benjamin · closed 2 months ago · 3 comments
#107 Update env and install instructions for FA3 & PyTorch 2.4 · warner-benjamin · closed 2 months ago · 0 comments
#106 Update Env · warner-benjamin · closed 2 months ago · 0 comments
#105 Fixing the deletion option in the decompression file · NohTow · closed 2 months ago · 0 comments
#104 Flash Attention 3 Support · warner-benjamin · closed 2 months ago · 1 comment
#103 Masked prediction · warner-benjamin · closed 2 months ago · 3 comments
#102 PyTorch 2.4 with dynamic shape DDP Compile · warner-benjamin · closed 2 months ago · 3 comments
#101 Add support for Gemma-2 style Tanh Softcapping · warner-benjamin · opened 3 months ago · 0 comments
#100 Different RoPE settings for local attention layers · warner-benjamin · closed 3 months ago · 1 comment
#99 Unpad embeddings & model support for sequence packing · warner-benjamin · closed 3 months ago · 1 comment
#98 Add support for mds without predefined attention mask · ohallstrom · closed 3 months ago · 1 comment
#97 Allow to disable training metrics · ohallstrom · closed 2 months ago · 3 comments
#96 Weight init with non-default tokenizer bug fix · ohallstrom · closed 3 months ago · 9 comments
#95 Add support for local attention · warner-benjamin · closed 3 months ago · 1 comment
#94 Update Eval Scripts · warner-benjamin · closed 3 months ago · 4 comments
#93 Use wandb_entity for eval logging · warner-benjamin · closed 3 months ago · 0 comments
#92 Add token classification eval with CoNLL 2003 · tylerjthomas9 · opened 4 months ago · 1 comment
#91 add tokenizer converter script · bclavie · opened 4 months ago · 0 comments
#90 Surface recompute_metric_loss option · warner-benjamin · closed 4 months ago · 0 comments
#89 Adding allow_embedding_resizing to mosaic_bert · NohTow · closed 4 months ago · 0 comments
#88 Adding MLMMLU & Ultrafeedback jobs · rbiswasfc · closed 3 months ago · 4 comments
#87 Add Z-Loss support, use flash_attn CrossEntropy, don't recalculate CrossEntropy metric if unneeded · warner-benjamin · closed 4 months ago · 1 comment
#86 Surface cache_limit option for streaming dataloaders · warner-benjamin · closed 4 months ago · 0 comments
#85 Add dataset without streaming · ohallstrom · closed 3 months ago · 1 comment
#84 Fix initialization · warner-benjamin · closed 4 months ago · 0 comments
#83 simple benchmarking script · warner-benjamin · closed 4 months ago · 1 comment
#82 WIP: Auto eval config · bclavie · closed 4 months ago · 0 comments
#81 Add support for loading pre-tokenized data · staghado · closed 4 months ago · 6 comments
#80 Add support for FA2 deterministic mode · warner-benjamin · closed 4 months ago · 0 comments
#79 Add missing bias config options to attention Linear layers · warner-benjamin · closed 4 months ago · 1 comment
(older issues continue on the next page)