databricks/megablocks (Apache License 2.0) · 1.17k stars · 169 forks
Issues
| # | Title | Author | State | Age | Comments |
|---|---|---|---|---|---|
| #102 | Sum missing axis arg in kernels.py | jambo6 | closed | 5 months ago | 4 |
| #101 | Import dmoe model into other training script? | andrewnc | opened | 6 months ago | 3 |
| #100 | Computation distribution with expert parallelism | opherlieber | closed | 6 months ago | 1 |
| #99 | SFT Script and Hyperparameters used for DBRX-Instruct | alpayariyak | opened | 6 months ago | 5 |
| #98 | Update README.md | dakinggg | closed | 6 months ago | 0 |
| #97 | support amd/rocm | ehartford | opened | 6 months ago | 3 |
| #96 | Remove turbo | dblalock | closed | 7 months ago | 0 |
| #95 | AMP + BF16 failing | jramapuram | opened | 8 months ago | 4 |
| #94 | Unsharding scripts for megablocks models | mayank31398 | opened | 8 months ago | 0 |
| #93 | the wrong loss func was chosen at evaluation | peterjc123 | opened | 8 months ago | 2 |
| #92 | Seeking a good multi-node training config | rpand002 | opened | 8 months ago | 3 |
| #91 | selective router precision | 152334H | opened | 8 months ago | 1 |
| #90 | Does this framework support SFT? | banksy23 | opened | 8 months ago | 2 |
| #89 | Updt triton pin | vchiley | closed | 8 months ago | 1 |
| #88 | RuntimeError: Triton Error [CUDA]: invalid argument | noob-ctrl | opened | 8 months ago | 15 |
| #87 | Fix `moe_normalize_expert_weights` when `top_k=1` | 152334H | closed | 8 months ago | 3 |
| #86 | Gradient scale size for expert gradient | fanshiqing | closed | 8 months ago | 4 |
| #85 | different load_balancing_loss with different pipeline_parallel_size | bozheng-hit | opened | 8 months ago | 8 |
| #84 | How to integrate to transformers-based mixtral | nxphi47 | opened | 9 months ago | 1 |
| #83 | ParallelDroplessMLP initialises self.mlp twice | 152334H | opened | 9 months ago | 6 |
| #82 | save loading_balancing_loss properly | gouchangjiang | closed | 8 months ago | 2 |
| #81 | Why the second matrix of the mlp layer has the same shape of the first one? | gouchangjiang | opened | 9 months ago | 1 |
| #80 | [BUG] Optimizer Weights Not Reloaded When Training with bf16 Pretrained Weights | RookieHong | opened | 9 months ago | 1 |
| #79 | fix the abnormal ‘CAPACITY_FACTOR’ value | jordgedu | opened | 9 months ago | 3 |
| #78 | Error from pip about missing torch module | michaelwhitford | closed | 8 months ago | 4 |
| #77 | Efficiency of torch mlp | imoneoi | closed | 9 months ago | 2 |
| #76 | Fix default to be sparse | mvpatel2000 | closed | 9 months ago | 0 |
| #75 | Add dmlp registry args | j316chuck | closed | 9 months ago | 0 |
| #74 | Refactor dtesnor | mvpatel2000 | closed | 9 months ago | 0 |
| #73 | Dtensor to all paths | mvpatel2000 | closed | 9 months ago | 0 |
| #72 | Mem opt glu bkwd | mvpatel2000 | closed | 9 months ago | 0 |
| #71 | Add cast to tensor for DTensor inputs for groupedmlp | eracah | closed | 9 months ago | 0 |
| #70 | Change router weight norm from in-place | sashaDoubov | closed | 9 months ago | 0 |
| #69 | Skip updating load balancing loss on eval | sedrick-keh-tri | closed | 9 months ago | 2 |
| #68 | Script for Full Fine-Tuning of Mixtral | alpayariyak | opened | 9 months ago | 1 |
| #67 | Docker issues with PyPI installation | sedrick-keh-tri | opened | 9 months ago | 3 |
| #66 | add mem optimized grouped glu | vchiley | closed | 9 months ago | 0 |
| #65 | enable custom activation functions | vchiley | closed | 9 months ago | 4 |
| #64 | How do you use routing balancing loss under pipeline parallelism | szhengac | closed | 9 months ago | 5 |
| #63 | Update README.md | eltociear | closed | 9 months ago | 1 |
| #62 | Has anyone encountered this CUDA error? | bozheng-hit | closed | 9 months ago | 15 |
| #61 | Question on offsets in figures 5 | DaehanKim | closed | 9 months ago | 2 |
| #60 | More customizable norm for expert weights | snarayan21 | closed | 9 months ago | 0 |
| #59 | About the Multi-node Script | XingyuXie | closed | 9 months ago | 4 |
| #58 | enable arg enabled normalization of routing weights | vchiley | closed | 9 months ago | 0 |
| #57 | [integrating megablocks with open_lm] Question about megablocks + FSDP | kernelmachine | closed | 9 months ago | 9 |
| #56 | Update setup.py to support multiple device capabilities | simon-mo | closed | 9 months ago | 6 |
| #55 | Update Megatron-LM scripts and integration for latest Docker container. | tgale96 | closed | 9 months ago | 0 |
| #54 | Remove errant "*" in README | tgale96 | closed | 9 months ago | 0 |
| #53 | Fix * in README | tgale96 | closed | 9 months ago | 0 |