databricks / megablocks
Apache License 2.0 · 1.11k stars · 154 forks
Issues (newest first)
#118 Routing · by alexliap, opened 1 day ago · 1 comment
#117 Illegal memory access on non-0 cuda devices from `histogram` · by phillip-kravtsov, opened 3 days ago · 0 comments
#116 bump · by vchiley, opened 1 month ago · 0 comments
#115 Cloning input `x` in `megablocks.layers.glu.SparseGLU` leads to different SDD outputs · by cmsflash, closed 2 weeks ago · 2 comments
#114 Can we change self.blocking in dmoe.py from 128 to 64? · by seanM29, opened 1 month ago · 2 comments
#113 _LOAD_BALANCING_LOSS returns empty list sometimes · by exnx, opened 1 month ago · 1 comment
#112 bump and pin versions · by vchiley, closed 1 month ago · 0 comments
#111 Fix AMP for memory optimized options · by mvpatel2000, closed 1 month ago · 0 comments
#110 Bad throughput with GLU · by Muennighoff, opened 1 month ago · 1 comment
#109 Add Shared Expert · by vchiley, closed 1 month ago · 0 comments
#108 Fix for `ffn_hidden_size` of 128, and better error message for incompatible ffn sizes. · by snarayan21, closed 1 month ago · 0 comments
#107 1-expert worse than dense model · by Muennighoff, opened 1 month ago · 0 comments
#106 OSError: Stale file handle with dMoE · by Muennighoff, opened 2 months ago · 3 comments
#105 Add a fine-tune script for JetMoE · by shamanez, opened 2 months ago · 2 comments
#104 ScatterMoE feature · by ehartford, opened 2 months ago · 5 comments
#103 Implement Mixture of Depth and Experts (MoDE) · by casper-hansen, opened 2 months ago · 2 comments
#102 Sum missing axis arg in kernels.py · by jambo6, closed 2 months ago · 4 comments
#101 Import dmoe model into other training script? · by andrewnc, opened 2 months ago · 3 comments
#100 Computation distribution with expert parallelism · by opherlieber, closed 2 months ago · 1 comment
#99 SFT Script and Hyperparameters used for DBRX-Instruct · by alpayariyak, opened 3 months ago · 5 comments
#98 Update README.md · by dakinggg, closed 3 months ago · 0 comments
#97 support amd/rocm · by ehartford, opened 3 months ago · 3 comments
#96 Remove turbo · by dblalock, closed 4 months ago · 0 comments
#95 AMP + BF16 failing · by jramapuram, opened 5 months ago · 2 comments
#94 Unsharding scripts for megablocks models · by mayank31398, opened 5 months ago · 0 comments
#93 the wrong loss func was chosen at evaluation · by peterjc123, opened 5 months ago · 2 comments
#92 Seeking a good multi-node training config · by rpand002, opened 5 months ago · 3 comments
#91 selective router precision · by 152334H, opened 5 months ago · 1 comment
#90 Does this framework support SFT? · by banksy23, opened 5 months ago · 2 comments
#89 Updt triton pin · by vchiley, closed 5 months ago · 1 comment
#88 RuntimeError: Triton Error [CUDA]: invalid argument · by noob-ctrl, opened 5 months ago · 15 comments
#87 Fix `moe_normalize_expert_weights` when `top_k=1` · by 152334H, closed 5 months ago · 3 comments
#86 Gradient scale size for expert gradient · by fanshiqing, closed 5 months ago · 4 comments
#85 different load_balancing_loss with different pipeline_parallel_size · by bozheng-hit, opened 5 months ago · 8 comments
#84 How to integrate to transformers-based mixtral · by nxphi47, opened 5 months ago · 1 comment
#83 ParallelDroplessMLP initialises self.mlp twice · by 152334H, opened 5 months ago · 6 comments
#82 save loading_balancing_loss properly · by gouchangjiang, closed 5 months ago · 2 comments
#81 Why the second matrix of the mlp layer has the same shape of the first one? · by gouchangjiang, opened 6 months ago · 1 comment
#80 [BUG] Optimizer Weights Not Reloaded When Training with bf16 Pretrained Weights · by RookieHong, opened 6 months ago · 1 comment
#79 fix the abnormal `CAPACITY_FACTOR` value · by jordgedu, opened 6 months ago · 3 comments
#78 Error from pip about missing torch module · by michaelwhitford, closed 5 months ago · 4 comments
#77 Efficiency of torch mlp · by imoneoi, closed 6 months ago · 2 comments
#76 Fix default to be sparse · by mvpatel2000, closed 6 months ago · 0 comments
#75 Add dmlp registry args · by j316chuck, closed 6 months ago · 0 comments
#74 Refactor dtesnor · by mvpatel2000, closed 6 months ago · 0 comments
#73 Dtensor to all paths · by mvpatel2000, closed 6 months ago · 0 comments
#72 Mem opt glu bkwd · by mvpatel2000, closed 6 months ago · 0 comments
#71 Add cast to tensor for DTensor inputs for groupedmlp · by eracah, closed 6 months ago · 0 comments
#70 Change router weight norm from in-place · by sashaDoubov, closed 6 months ago · 0 comments
#69 Skip updating load balancing loss on eval · by sedrick-keh-tri, closed 6 months ago · 2 comments