-
> note from @davelandry: I'll be updating this list as we find more variables
- [x] [poverty](https://datausa.io/map/?level=county&key=income_below_poverty,income_below_poverty_moe,pop_poverty_status…
-
iteration 1000/ 20000 | consumed samples: 512000 | elapsed time per iteration (ms): 336.3 | learning rate: 1.495E-04 | global batch size: 512 | load balancing loss: 9.743530E-02 | lm lo…
-
# Batch_input and elapsed time per iteration slow down during model training
![微信图片编辑_20240629150957](https://github.com/EleutherAI/gpt-neox/assets/140717408/dae875c7-c01f-47e0-8767-aa8fe53cd476)
…
-
## 🐛 Bug
I trained a MoE model on multiple nodes and am now attempting to calculate eval losses on a new dataset using this model. I am using the [example script from the MoE branch](https://github…
-
## Problem
In a Mixture of Experts (MoE) LLM, the gating network outputs a categorical distribution over $n$ values (chosen from $n_{max}$), which is then used to form a convex combination of the $n$…
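For reference, a minimal sketch of what such top-$n$ gating can look like (illustrative only; `top_n_gating`, the expert list, and the tensor shapes are assumptions, not code taken from any of the projects discussed here):
```python
import torch
import torch.nn.functional as F

# Illustrative sketch of top-n softmax gating (assumed names and shapes).
# n_max experts exist; the gate keeps the n most probable per token and
# mixes their outputs with the renormalized probabilities, i.e. a convex
# combination (non-negative weights summing to 1).
def top_n_gating(hidden, gate_weight, experts, top_n):
    # hidden: [tokens, d_model]; gate_weight: [d_model, n_max]
    logits = hidden @ gate_weight                      # [tokens, n_max]
    probs = F.softmax(logits, dim=-1)                  # categorical over n_max experts
    top_p, top_idx = probs.topk(top_n, dim=-1)         # keep the n largest
    top_p = top_p / top_p.sum(dim=-1, keepdim=True)    # renormalize -> weights sum to 1

    out = torch.zeros_like(hidden)
    for slot in range(top_n):
        idx = top_idx[:, slot]                         # chosen expert per token, this slot
        weight = top_p[:, slot].unsqueeze(-1)          # its mixing weight
        for e, expert in enumerate(experts):
            mask = idx == e
            if mask.any():
                out[mask] += weight[mask] * expert(hidden[mask])
    return out

# Toy usage: 8 experts (n_max), top-2 routing.
experts = [torch.nn.Linear(16, 16) for _ in range(8)]
y = top_n_gating(torch.randn(4, 16), torch.randn(16, 8), experts, top_n=2)
```
The renormalized top-$n$ probabilities are non-negative and sum to 1, which is what makes the mixture of expert outputs a convex combination.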
-
**Describe the bug**
I get an AttributeError when trying to convert the llama3-8B model from HF format to mcore format; the error is below:
`AttributeError: 'Tokenizer' object has no attribute 'vocab_size'`…
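The error just means the tokenizer object handed to the converter exposes no `vocab_size` attribute. As a rough sketch of the kind of wrapper that would satisfy it, assuming a Hugging Face tokenizer underneath (`HFTokenizerWrapper` is an invented name, not the class the conversion script actually uses):
```python
from transformers import AutoTokenizer

# Hypothetical shim (assumed name): exposes the vocab_size attribute the
# conversion script expects by delegating to the underlying HF tokenizer.
class HFTokenizerWrapper:
    def __init__(self, model_name_or_path):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

    @property
    def vocab_size(self):
        # len() also counts tokens added after loading; the bare
        # tokenizer.vocab_size attribute would not.
        return len(self.tokenizer)
```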
-
**Describe the bug**
I'm trying to use the Llama2 model saved with `--use-dist-ckpt` after SFT (Supervised Fine-Tuning) to train a reward model. The reward model does not require the original checkpo…
-
**Describe the bug**
Hi, authors. My code seems to hang when `skip_remainder_batch=False`.
**To Reproduce**
Steps to reproduce the behavior:
```
git clone https://github.com/microsoft/tutel --b…
-
Hi,
I was trying out MoE with the [MoE Example](https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples/MoE) in Megatron-DeepSpeed (Microsoft) and saw these comments:
```
## Model parallelism…