-
The option to change screenshot format appears for me when saving images but disappears when copying to clipboard.
-
NCCL timed out while using ZeRO stage 3 (zero3). How can I solve this problem?
I inherited the large model Mixtral 8x7B and utilized the Llama architecture, augmenting it with multi-modal capa…
-
This program demonstrates a possible bug in the tester library with
array lists constructed using constructor blocks. The first test
does not use constructor blocks and works as expected. The second…
-
Hi,
First off, thanks for all the work you guys have put into this.
I am trying to run DeepSeek-Coder-V2-Instruct-0724-GGUF Q4_K_M with reasonable performance but cannot figure it out. When I use…
-
When using the HTML files provided by Mokuro (not to be confused with the separate Mokuro *Reader* app) and enabling Anki integration in the advanced settings, I can successfully create new cards by p…
-
**Describe the regression**
In the forks of Megatron-LM used by gpt-neox and megatron-deepspeed, MoEs are obtaining lower loss than they are in Megatron-LM with the same configuration.
**To Reprod…
-
# Per-Parameter-Sharding FSDP
## Motivation
As we looked toward next-generation training, we found limitations in our existing FSDP, mainly from the _flat parameter_ construct. To address these, w…
-
# Progress
- [x] Implement TPU executor that works on a single TPU chip (without tensor parallelism) #5292
- [x] Support single-host tensor parallel inference #5871
- [x] Support multi-host ten…
-
ZN-M2 8bb4873 5 days ago
The build succeeded, but after flashing, the device keeps rebooting and cannot boot normally.
-
### Your current environment
```
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.…