databricks/megablocks (Apache License 2.0) · 1.17k stars · 169 forks
Issues
| # | Title | Author | State | Age | Comments |
|---|---|---|---|---|---|
| #102 | Sum missing axis arg in kernels.py | jambo6 | closed | 5 months ago | 4 |
| #101 | Import dmoe model into other training script? | andrewnc | opened | 6 months ago | 3 |
| #100 | Computation distribution with expert parallelism | opherlieber | closed | 6 months ago | 1 |
| #99 | SFT Script and Hyperparameters used for DBRX-Instruct | alpayariyak | opened | 6 months ago | 5 |
| #98 | Update README.md | dakinggg | closed | 6 months ago | 0 |
| #97 | support amd/rocm | ehartford | opened | 6 months ago | 3 |
| #96 | Remove turbo | dblalock | closed | 7 months ago | 0 |
| #95 | AMP + BF16 failing | jramapuram | opened | 8 months ago | 4 |
| #94 | Unsharding scripts for megablocks models | mayank31398 | opened | 8 months ago | 0 |
| #93 | the wrong loss func was chosen at evaluation | peterjc123 | opened | 8 months ago | 2 |
| #92 | Seeking a good multi-node training config | rpand002 | opened | 8 months ago | 3 |
| #91 | selective router precision | 152334H | opened | 8 months ago | 1 |
| #90 | Does this framework support SFT? | banksy23 | opened | 8 months ago | 2 |
| #89 | Updt triton pin | vchiley | closed | 8 months ago | 1 |
| #88 | RuntimeError: Triton Error [CUDA]: invalid argument | noob-ctrl | opened | 8 months ago | 15 |
| #87 | Fix `moe_normalize_expert_weights` when `top_k=1` | 152334H | closed | 8 months ago | 3 |
| #86 | Gradient scale size for expert gradient | fanshiqing | closed | 8 months ago | 4 |
| #85 | different load_balancing_loss with different pipeline_parallel_size | bozheng-hit | opened | 8 months ago | 8 |
| #84 | How to integrate to transformers-based mixtral | nxphi47 | opened | 9 months ago | 1 |
| #83 | ParallelDroplessMLP initialises self.mlp twice | 152334H | opened | 9 months ago | 6 |
| #82 | save loading_balancing_loss properly | gouchangjiang | closed | 8 months ago | 2 |
| #81 | Why the second matrix of the mlp layer has the same shape of the first one? | gouchangjiang | opened | 9 months ago | 1 |
| #80 | [BUG] Optimizer Weights Not Reloaded When Training with bf16 Pretrained Weights | RookieHong | opened | 9 months ago | 1 |
| #79 | fix the abnormal ‘CAPACITY_FACTOR’ value | jordgedu | opened | 9 months ago | 3 |
| #78 | Error from pip about missing torch module | michaelwhitford | closed | 8 months ago | 4 |
| #77 | Efficiency of torch mlp | imoneoi | closed | 9 months ago | 2 |
| #76 | Fix default to be sparse | mvpatel2000 | closed | 9 months ago | 0 |
| #75 | Add dmlp registry args | j316chuck | closed | 9 months ago | 0 |
| #74 | Refactor dtesnor | mvpatel2000 | closed | 9 months ago | 0 |
| #73 | Dtensor to all paths | mvpatel2000 | closed | 9 months ago | 0 |
| #72 | Mem opt glu bkwd | mvpatel2000 | closed | 9 months ago | 0 |
| #71 | Add cast to tensor for DTensor inputs for groupedmlp | eracah | closed | 9 months ago | 0 |
| #70 | Change router weight norm from in-place | sashaDoubov | closed | 9 months ago | 0 |
| #69 | Skip updating load balancing loss on eval | sedrick-keh-tri | closed | 9 months ago | 2 |
| #68 | Script for Full Fine-Tuning of Mixtral | alpayariyak | opened | 9 months ago | 1 |
| #67 | Docker issues with PyPI installation | sedrick-keh-tri | opened | 9 months ago | 3 |
| #66 | add mem optimized grouped glu | vchiley | closed | 9 months ago | 0 |
| #65 | enable custom activation functions | vchiley | closed | 9 months ago | 4 |
| #64 | How do you use routing balancing loss under pipeline parallelism | szhengac | closed | 9 months ago | 5 |
| #63 | Update README.md | eltociear | closed | 9 months ago | 1 |
| #62 | Has anyone encountered this CUDA error? | bozheng-hit | closed | 9 months ago | 15 |
| #61 | Question on offsets in figures 5 | DaehanKim | closed | 9 months ago | 2 |
| #60 | More customizable norm for expert weights | snarayan21 | closed | 9 months ago | 0 |
| #59 | About the Multi-node Script | XingyuXie | closed | 9 months ago | 4 |
| #58 | enable arg enabled normalization of routing weights | vchiley | closed | 9 months ago | 0 |
| #57 | [integrating megablocks with open_lm] Question about megablocks + FSDP | kernelmachine | closed | 9 months ago | 9 |
| #56 | Update setup.py to support multiple device capabilities | simon-mo | closed | 9 months ago | 6 |
| #55 | Update Megatron-LM scripts and integration for latest Docker container. | tgale96 | closed | 9 months ago | 0 |
| #54 | Remove errant "*" in README | tgale96 | closed | 9 months ago | 0 |
| #53 | Fix * in README | tgale96 | closed | 9 months ago | 0 |