AnswerDotAI/bert24
Apache License 2.0
25 stars
3 forks
issues
Add support for FA2 deterministic mode
#80
warner-benjamin
opened
1 day ago
0
Add missing bias config options to attention Linear layers
#79
warner-benjamin
closed
1 day ago
1
Adding the function to compute the actual number of non padding tokens
#78
NohTow
closed
1 day ago
2
global_eval_batch_size bug fix
#77
warner-benjamin
closed
4 days ago
0
Add custom trainer with support for batch rampup
#76
ohallstrom
opened
5 days ago
1
Add support for ablation evaluation
#75
rbiswasfc
closed
2 days ago
2
Add support for device_eval_microbatch_size
#74
warner-benjamin
closed
4 days ago
2
Fix eval global and device bs
#73
warner-benjamin
closed
5 days ago
0
Current progress of dataset creation
#72
orionw
closed
4 days ago
3
Add ScheduledGarbageCollector callback
#71
warner-benjamin
closed
5 days ago
1
Add cosinvsqrt schedule to main
#70
warner-benjamin
closed
6 days ago
0
Fix eval flexbert
#69
NohTow
closed
1 week ago
3
Fix different_first_layer tests
#68
warner-benjamin
closed
1 week ago
0
Add Parameter Count
#67
warner-benjamin
closed
1 week ago
0
Add CosineInverseSqrtScheduler
#66
warner-benjamin
closed
1 week ago
1
Add EurLex to evaluation suite
#65
rbiswasfc
closed
1 week ago
3
Set initial layers independently from the rest of the model
#64
warner-benjamin
closed
1 week ago
1
Add Triton RMSNorm kernel and low precision RMSNorm
#63
warner-benjamin
closed
2 weeks ago
1
Add support for TritonRMS Norm
#62
ohallstrom
closed
2 weeks ago
0
Add OLMo Model Initialization Options
#61
warner-benjamin
closed
2 weeks ago
1
Fix GLUE tests
#60
jackcook
closed
2 weeks ago
3
Default pretraining gradient clipping & bias/norm weight decay filtering
#59
warner-benjamin
closed
2 weeks ago
2
Enable Mosaic Dataset Streaming
#58
orionw
closed
3 weeks ago
0
Add weight decay filtering and StableAdamW
#57
warner-benjamin
closed
3 weeks ago
0
Support final_norm in tests
#56
warner-benjamin
closed
3 weeks ago
2
Rotary Embedding for unpadded sequences
#55
staghado
closed
3 weeks ago
8
[WIP] Fixing some issues with the use of act/norm/pooling config parameters and consistency with MosaicBERT
#54
NohTow
closed
3 weeks ago
2
Also get length of the data statistics
#53
orionw
closed
3 weeks ago
0
Model: doing an Albert24?
#52
sileod
closed
4 weeks ago
1
[DATA] Tasksource fine-tuning
#51
sileod
closed
4 weeks ago
1
Correct norm placement when using pre layer norm attention
#50
ohallstrom
closed
4 weeks ago
3
Fix betas issue and add adamw
#49
staghado
closed
1 month ago
1
Allow resizing the model embeddings layer to fit the tokenizer
#48
NohTow
closed
1 month ago
0
Data: Replace CommonCrawl Dolma Subsets with (upcoming) Fineweb-*
#47
bclavie
opened
1 month ago
1
Training (Comparison): Init off PileT5 weights
#46
bclavie
opened
1 month ago
0
DATA: Synthetic Augmentations (phi-like)
#45
bclavie
opened
1 month ago
0
DATA/Training: Rho-like token selection
#44
bclavie
opened
1 month ago
0
EVAL: Long-context evals
#43
bclavie
opened
1 month ago
1
EVAL: Legal Entailment
#42
bclavie
opened
1 month ago
0
Setting intermediate size back to its original definition
#41
NohTow
closed
1 month ago
1
Add flag for applying attn_mask to SDPA
#40
warner-benjamin
closed
1 month ago
3
Adds code to generate data mixtures with specified proportions.
#39
griff4692
closed
1 month ago
0
`prefetch_factor` in `convert_dataset.py` should be `None` when `num_workers` is not `> 0`
#38
ReinforcedKnowledge
closed
1 month ago
0
Migrate ColBERT to conda environment to leverage faiss-gpu
#37
bclavie
opened
1 month ago
1
Add RoPE with FlexBert blocks
#36
staghado
closed
1 month ago
2
Attention fixes
#35
warner-benjamin
closed
1 month ago
1
Superglue part 1
#34
iacolippo
closed
3 weeks ago
14
Parallel attention with flexbert modules
#33
NohTow
closed
1 month ago
2
Add FlexBERT, a modular and hackable BERT implementation
#32
warner-benjamin
closed
1 month ago
4
Add warmup stable decay lr schedule
#31
ohallstrom
closed
1 month ago
1