-
### System Info
CPU Architecture: x86_64
GPU: NVIDIA A100-SXM4-40GB
TensorRT-LLM version: 0.14.0.dev2024091700
### Information
- [x] The official example scri…
-
## ❓ Questions and Help
Does torch_xla SPMD support expert parallelism?
If the model is an MoE model, how should it be computed in XLA? (A sketch follows below.)
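For reference, a minimal sketch of expert parallelism under torch_xla SPMD, assuming a recent torch_xla that ships the `torch_xla.distributed.spmd` module: stack the per-expert weights and shard them along the expert axis of a 1-D device mesh. The `[E, D, F]` weight layout, the capacity of 128, and all variable names are illustrative assumptions, not an official recipe.

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs  # assumes a recent torch_xla

xr.use_spmd()  # enable SPMD execution mode

num_devices = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(num_devices), (num_devices,), ('expert',))

E, D, F = 8, 1024, 4096  # experts, model dim, FFN dim (assumed sizes)
dev = xm.xla_device()
w1 = torch.randn(E, D, F, device=dev)
w2 = torch.randn(E, F, D, device=dev)

# Shard each stacked weight along the expert axis; replicate the rest.
xs.mark_sharding(w1, mesh, ('expert', None, None))
xs.mark_sharding(w2, mesh, ('expert', None, None))

# Dispatched tokens x: [E, capacity, D], sharded the same way, so each
# device only computes the tokens routed to its local experts.
x = torch.randn(E, 128, D, device=dev)
xs.mark_sharding(x, mesh, ('expert', None, None))

h = torch.einsum('ecd,edf->ecf', x, w1).relu()
y = torch.einsum('ecf,efd->ecd', h, w2)
```

The router/dispatch step that builds `x` is omitted here; XLA's GSPMD partitioner propagates the sharding through the einsums.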
-
While MoE training typically uses a fixed capacity to distribute tokens evenly across all experts, my understanding is that inference involves activating experts based on predicted relevance via a sof…
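For context, a minimal sketch of the top-k softmax gating typically used at inference time (with no fixed capacity); the function name, tensor shapes, and `k=2` are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def route(hidden, router_weight, k=2):
    # hidden: [tokens, d_model]; router_weight: [d_model, num_experts]
    logits = hidden @ router_weight                # [tokens, num_experts]
    probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(k, dim=-1)   # [tokens, k]
    # Renormalize so the selected experts' weights sum to 1 per token.
    topk_probs = topk_probs / topk_probs.sum(-1, keepdim=True)
    return topk_idx, topk_probs
```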
-
https://tari.moe/2024/pwnedlabs-aws-free
This lab platform is quite good: it comes with official write-ups, covers defense as well as offense, and suits both beginners and more advanced learners. Each lab has its own scenario with a real-world description, and some topics from conferences such as Black Hat have been turned into labs as well. Overall, I think the quality is very high.
-
**Describe the bug**
When converting from the old COA Tools to the new version, sprites that were merged into slot objects become unparented from the armature.
Objects also can't be edited, showing a…
-
### Your current environment
PyTorch version: 2.4.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (U…
-
**Describe the bug**
When using the ZeRO optimizer to train an MoE model, the gradient of the expert weights is **ep_size times larger than** the true gradient (a sketch of the implied correction follows below).
**Related issue & PR**
Issue [#5618] ha…
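For illustration only, a minimal sketch of the correction the report implies: after backward, divide expert-parameter gradients by `ep_size` so the subsequent averaging does not over-count them. Treating DeepSpeed's `allreduce = False` tag as the expert marker is an assumption; the actual fix is whatever the referenced issue and PR land.

```python
def rescale_expert_grads(model, ep_size: int):
    # DeepSpeed marks MoE expert parameters with `allreduce = False`
    # (assumed here); their grads come out ep_size times too large.
    for p in model.parameters():
        is_expert = hasattr(p, "allreduce") and not p.allreduce
        if is_expert and p.grad is not None:
            p.grad.div_(ep_size)
```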
-
- [ ] [Qwen-1.5-8x7B : r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/comments/1atw4ud/qwen158x7b/)
**DESCRIPTION:** "Qwen-1.5-8x7B
New Model
Someone creat…
-
And if so, under what circumstances, and using which method (logit or hidden-state distillation)?
I'm assuming MoE-to-dense and dense-to-MoE won't work with logit-based distillation, but I'm not sure abo…
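For context, logit-based distillation is usually a KL divergence between temperature-softened teacher and student token distributions, which is why it presupposes a shared vocabulary between the two models; a minimal sketch, with the temperature and reduction as assumptions:

```python
import torch.nn.functional as F

def logit_distill_loss(student_logits, teacher_logits, T=2.0):
    # logits: [batch, seq, vocab]; T softens both distributions.
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    # T^2 rescales gradients to be comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * (T * T)
```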
-
It looks like a file is missing.
Unrecognized model in D:\LIUGEGE\ComfyUI\models\Joy_caption_alpha\text_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, a…
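If the file really lacks the key, one hedged workaround is to add `model_type` to the local `config.json`; the value `"llama"` is an assumption (Joy Caption's text model is reportedly Llama-based) and should be verified against the upstream repository:

```python
import json
from pathlib import Path

cfg_path = Path(r"D:\LIUGEGE\ComfyUI\models\Joy_caption_alpha\text_model\config.json")
cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
cfg.setdefault("model_type", "llama")  # assumed value; confirm before use
cfg_path.write_text(json.dumps(cfg, indent=2), encoding="utf-8")
```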