databricks / megablocks
Apache License 2.0 · 1.11k stars · 154 forks
Issues (newest first)
#118 Routing · by alexliap, opened 1 day ago · 1 comment
#117 Illegal memory access on non-0 cuda devices from `histogram` · by phillip-kravtsov, opened 3 days ago · 0 comments
#116 bump · by vchiley, opened 1 month ago · 0 comments
#115 Cloning input `x` in `megablocks.layers.glu.SparseGLU` leads to different SDD outputs · by cmsflash, closed 2 weeks ago · 2 comments
#114 Can we change self.blocking in dmoe.py from 128 to 64? · by seanM29, opened 1 month ago · 2 comments
#113 _LOAD_BALANCING_LOSS returns empty list sometimes · by exnx, opened 1 month ago · 1 comment
#112 bump and pin versions · by vchiley, closed 1 month ago · 0 comments
#111 Fix AMP for memory optimized options · by mvpatel2000, closed 1 month ago · 0 comments
#110 Bad throughput with GLU · by Muennighoff, opened 1 month ago · 1 comment
#109 Add Shared Expert · by vchiley, closed 1 month ago · 0 comments
#108 Fix for `ffn_hidden_size` of 128, and better error message for incompatible ffn sizes. · by snarayan21, closed 1 month ago · 0 comments
#107 1-expert worse than dense model · by Muennighoff, opened 1 month ago · 0 comments
#106 OSError: Stale file handle with dMoE · by Muennighoff, opened 2 months ago · 3 comments
#105 Add a fine-tune script for JetMoE · by shamanez, opened 2 months ago · 2 comments
#104 ScatterMoE feature · by ehartford, opened 2 months ago · 5 comments
#103 Implement Mixture of Depth and Experts (MoDE) · by casper-hansen, opened 2 months ago · 2 comments
#102 Sum missing axis arg in kernels.py · by jambo6, closed 2 months ago · 4 comments
#101 Import dmoe model into other training script? · by andrewnc, opened 2 months ago · 3 comments
#100 Computation distribution with expert parallelism · by opherlieber, closed 2 months ago · 1 comment
#99 SFT Script and Hyperparameters used for DBRX-Instruct · by alpayariyak, opened 3 months ago · 5 comments
#98 Update README.md · by dakinggg, closed 3 months ago · 0 comments
#97 support amd/rocm · by ehartford, opened 3 months ago · 3 comments
#96 Remove turbo · by dblalock, closed 4 months ago · 0 comments
#95 AMP + BF16 failing · by jramapuram, opened 5 months ago · 2 comments
#94 Unsharding scripts for megablocks models · by mayank31398, opened 5 months ago · 0 comments
#93 the wrong loss func was chosen at evaluation · by peterjc123, opened 5 months ago · 2 comments
#92 Seeking a good multi-node training config · by rpand002, opened 5 months ago · 3 comments
#91 selective router precision · by 152334H, opened 5 months ago · 1 comment
#90 Does this framework support SFT? · by banksy23, opened 5 months ago · 2 comments
#89 Updt triton pin · by vchiley, closed 5 months ago · 1 comment
#88 RuntimeError: Triton Error [CUDA]: invalid argument · by noob-ctrl, opened 5 months ago · 15 comments
#87 Fix `moe_normalize_expert_weights` when `top_k=1` · by 152334H, closed 5 months ago · 3 comments
#86 Gradient scale size for expert gradient · by fanshiqing, closed 5 months ago · 4 comments
#85 different load_balancing_loss with different pipeline_parallel_size · by bozheng-hit, opened 5 months ago · 8 comments
#84 How to integrate to transformers-based mixtral · by nxphi47, opened 5 months ago · 1 comment
#83 ParallelDroplessMLP initialises self.mlp twice · by 152334H, opened 5 months ago · 6 comments
#82 save loading_balancing_loss properly · by gouchangjiang, closed 5 months ago · 2 comments
#81 Why the second matrix of the mlp layer has the same shape of the first one? · by gouchangjiang, opened 6 months ago · 1 comment
#80 [BUG] Optimizer Weights Not Reloaded When Training with bf16 Pretrained Weights · by RookieHong, opened 6 months ago · 1 comment
#79 fix the abnormal `CAPACITY_FACTOR` value · by jordgedu, opened 6 months ago · 3 comments
#78 Error from pip about missing torch module · by michaelwhitford, closed 5 months ago · 4 comments
#77 Efficiency of torch mlp · by imoneoi, closed 6 months ago · 2 comments
#76 Fix default to be sparse · by mvpatel2000, closed 6 months ago · 0 comments
#75 Add dmlp registry args · by j316chuck, closed 6 months ago · 0 comments
#74 Refactor dtesnor · by mvpatel2000, closed 6 months ago · 0 comments
#73 Dtensor to all paths · by mvpatel2000, closed 6 months ago · 0 comments
#72 Mem opt glu bkwd · by mvpatel2000, closed 6 months ago · 0 comments
#71 Add cast to tensor for DTensor inputs for groupedmlp · by eracah, closed 6 months ago · 0 comments
#70 Change router weight norm from in-place · by sashaDoubov, closed 6 months ago · 0 comments
#69 Skip updating load balancing loss on eval · by sedrick-keh-tri, closed 6 months ago · 2 comments