-
### System Info
CPU Architecture: x86_64
GPU: NVIDIA A100-SXM4-40GB
TensorRT-LLM version: 0.14.0.dev2024091700
### Information
- [x] The official example scri…
-
## ❓ Questions and Help
Does torch_xla SPMD support expert parallelism?
If the model is an MoE model, how should it be computed in XLA? (A sketch follows below.)
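For reference, a minimal sketch of expert parallelism under torch_xla SPMD, assuming a recent torch_xla that ships the `torch_xla.distributed.spmd` module: stack the per-expert weights and shard them along the expert axis of a 1-D device mesh. The `[E, D, F]` weight layout, the capacity of 128, and all variable names are illustrative assumptions, not an official recipe.

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs  # assumes a recent torch_xla

xr.use_spmd()  # enable SPMD execution mode

num_devices = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(num_devices), (num_devices,), ('expert',))

E, D, F = 8, 1024, 4096  # experts, model dim, FFN dim (assumed sizes)
dev = xm.xla_device()
w1 = torch.randn(E, D, F, device=dev)
w2 = torch.randn(E, F, D, device=dev)

# Shard each stacked weight along the expert axis; replicate the rest.
xs.mark_sharding(w1, mesh, ('expert', None, None))
xs.mark_sharding(w2, mesh, ('expert', None, None))

# Dispatched tokens x: [E, capacity, D], sharded the same way, so each
# device only computes the tokens routed to its local experts.
x = torch.randn(E, 128, D, device=dev)
xs.mark_sharding(x, mesh, ('expert', None, None))

h = torch.einsum('ecd,edf->ecf', x, w1).relu()
y = torch.einsum('ecf,efd->ecd', h, w2)
```

The router/dispatch step that builds `x` is omitted here; XLA's GSPMD partitioner propagates the sharding through the einsums.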
-
While MoE training typically uses a fixed capacity to distribute tokens evenly across all experts, my understanding is that inference involves activating experts based on predicted relevance via a sof…
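For context, a minimal sketch of the top-k softmax gating typically used at inference time (with no fixed capacity); the function name, tensor shapes, and `k=2` are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def route(hidden, router_weight, k=2):
    # hidden: [tokens, d_model]; router_weight: [d_model, num_experts]
    logits = hidden @ router_weight                # [tokens, num_experts]
    probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(k, dim=-1)   # [tokens, k]
    # Renormalize so the selected experts' weights sum to 1 per token.
    topk_probs = topk_probs / topk_probs.sum(-1, keepdim=True)
    return topk_idx, topk_probs
```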
-
https://tari.moe/2024/pwnedlabs-aws-free
This lab platform is quite good: it comes with official write-ups, covers defense as well as offense, and suits both beginners and more advanced learners. Each lab has its own scenario with a real-world description, and some topics from conferences such as Black Hat have been turned into labs as well. Overall, I think the quality is very high.
-
**Describe the bug**
When converting from the old COA Tools to the new version, sprites that were merged into slot objects become unparented from the armature.
Objects also can't be edited, showing a…
-
### Your current environment
PyTorch version: 2.4.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (U…
-
**Describe the bug**
When using the ZeRO optimizer to train an MoE model, the gradient of the expert weights is **ep_size times larger than** the true gradient (a sketch of the implied correction follows below).
**Related issue & PR**
Issue [#5618] ha…
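For illustration only, a minimal sketch of the correction the report implies: after backward, divide expert-parameter gradients by `ep_size` so the subsequent averaging does not over-count them. Treating DeepSpeed's `allreduce = False` tag as the expert marker is an assumption; the actual fix is whatever the referenced issue and PR land.

```python
def rescale_expert_grads(model, ep_size: int):
    # DeepSpeed marks MoE expert parameters with `allreduce = False`
    # (assumed here); their grads come out ep_size times too large.
    for p in model.parameters():
        is_expert = hasattr(p, "allreduce") and not p.allreduce
        if is_expert and p.grad is not None:
            p.grad.div_(ep_size)
```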
-
- [ ] [Qwen-1.5-8x7B : r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/comments/1atw4ud/qwen158x7b/)
**DESCRIPTION:** "Qwen-1.5-8x7B
New Model
Someone creat…
-
And if so, under what circumstances, and using which method (logit or hidden-state distillation)?
I'm assuming MoE-to-dense and dense-to-MoE won't work with logit-based distillation, but I'm not sure abo…
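For context, logit-based distillation is usually a KL divergence between temperature-softened teacher and student token distributions, which is why it presupposes a shared vocabulary between the two models; a minimal sketch, with the temperature and reduction as assumptions:

```python
import torch.nn.functional as F

def logit_distill_loss(student_logits, teacher_logits, T=2.0):
    # logits: [batch, seq, vocab]; T softens both distributions.
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    # T^2 rescales gradients to be comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * (T * T)
```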
-
It looks like a file is missing.
Unrecognized model in D:\LIUGEGE\ComfyUI\models\Joy_caption_alpha\text_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, a…
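If the file really lacks the key, one hedged workaround is to add `model_type` to the local `config.json`; the value `"llama"` is an assumption (Joy Caption's text model is reportedly Llama-based) and should be verified against the upstream repository:

```python
import json
from pathlib import Path

cfg_path = Path(r"D:\LIUGEGE\ComfyUI\models\Joy_caption_alpha\text_model\config.json")
cfg = json.loads(cfg_path.read_text(encoding="utf-8"))
cfg.setdefault("model_type", "llama")  # assumed value; confirm before use
cfg_path.write_text(json.dumps(cfg, indent=2), encoding="utf-8")
```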