-
> note from @davelandry: I'll be updating this list as we find more variables
- [x] [poverty](https://datausa.io/map/?level=county&key=income_below_poverty,income_below_poverty_moe,pop_poverty_status…
-
iteration 1000/ 20000 | consumed samples: 512000 | elapsed time per iteration (ms): 336.3 | learning rate: 1.495E-04 | global batch size: 512 | load balancing loss: 9.743530E-02 | lm lo…
-
# Batch_input and elapsed time per iteration slow down during model training
![微信图片编辑_20240629150957](https://github.com/EleutherAI/gpt-neox/assets/140717408/dae875c7-c01f-47e0-8767-aa8fe53cd476)
…
-
## 🐛 Bug
I trained a MoE model on multiple nodes and am now attempting to calculate eval losses on a new dataset using this model. I am using the [example script from the MoE branch](https://github…
-
## Problem
In a Mixture of Experts (MoE) LLM, the gating network outputs a categorical distribution over $n$ values (chosen from $n_{max}$), which is then used to form a convex combination of the $n$…
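For reference, a minimal sketch of what such top-$n$ gating can look like (illustrative only; `top_n_gating`, the expert list, and the tensor shapes are assumptions, not code taken from any of the projects discussed here):
```python
import torch
import torch.nn.functional as F

# Illustrative sketch of top-n softmax gating (assumed names and shapes).
# n_max experts exist; the gate keeps the n most probable per token and
# mixes their outputs with the renormalized probabilities, i.e. a convex
# combination (non-negative weights summing to 1).
def top_n_gating(hidden, gate_weight, experts, top_n):
    # hidden: [tokens, d_model]; gate_weight: [d_model, n_max]
    logits = hidden @ gate_weight                      # [tokens, n_max]
    probs = F.softmax(logits, dim=-1)                  # categorical over n_max experts
    top_p, top_idx = probs.topk(top_n, dim=-1)         # keep the n largest
    top_p = top_p / top_p.sum(dim=-1, keepdim=True)    # renormalize -> weights sum to 1

    out = torch.zeros_like(hidden)
    for slot in range(top_n):
        idx = top_idx[:, slot]                         # chosen expert per token, this slot
        weight = top_p[:, slot].unsqueeze(-1)          # its mixing weight
        for e, expert in enumerate(experts):
            mask = idx == e
            if mask.any():
                out[mask] += weight[mask] * expert(hidden[mask])
    return out

# Toy usage: 8 experts (n_max), top-2 routing.
experts = [torch.nn.Linear(16, 16) for _ in range(8)]
y = top_n_gating(torch.randn(4, 16), torch.randn(16, 8), experts, top_n=2)
```
The renormalized top-$n$ probabilities are non-negative and sum to 1, which is what makes the mixture of expert outputs a convex combination.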
-
**Describe the bug**
I get an AttributeError when trying to convert the llama3-8B model from HF format to mcore format; the error is below:
`AttributeError: 'Tokenizer' object has no attribute 'vocab_size'`…
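The error just means the tokenizer object handed to the converter exposes no `vocab_size` attribute. As a rough sketch of the kind of wrapper that would satisfy it, assuming a Hugging Face tokenizer underneath (`HFTokenizerWrapper` is an invented name, not the class the conversion script actually uses):
```python
from transformers import AutoTokenizer

# Hypothetical shim (assumed name): exposes the vocab_size attribute the
# conversion script expects by delegating to the underlying HF tokenizer.
class HFTokenizerWrapper:
    def __init__(self, model_name_or_path):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

    @property
    def vocab_size(self):
        # len() also counts tokens added after loading; the bare
        # tokenizer.vocab_size attribute would not.
        return len(self.tokenizer)
```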
-
**Describe the bug**
I'm trying to use the Llama2 model saved with `--use-dist-ckpt` after SFT (Supervised Fine-Tuning) to train a reward model. The reward model does not require the original checkpo…
-
**Describe the bug**
Hi, authors. My code seems to hang when `skip_remainder_batch=False`.
**To Reproduce**
Steps to reproduce the behavior:
```
git clone https://github.com/microsoft/tutel --b…
-
Hi,
I was trying out MoE with the [MoE Example](https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples/MoE) in Megatron-DeepSpeed (Microsoft) and saw these comments:
```
## Model parallelism…