-
**Describe the bug**
When training an MoE model with the ZeRO optimizer, the gradient of the expert weights is **ep_size times larger than** the true gradient.
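
The excerpt does not say where the extra factor comes from. As a purely arithmetic sketch, one way a factor of exactly `ep_size` could appear is if the expert gradient were averaged over the expert data-parallel replica group (`dp_size // ep_size` ranks) rather than over all `dp_size` ranks. The names `dp_size`, `moe_dp_size`, `per_rank_contrib`, and `reduce_expert_grad` are hypothetical, and the numbers are made up for illustration; this is not code from the affected library.

```python
def reduce_expert_grad(per_rank_contrib, divisor):
    """Sum the per-rank gradient contributions and divide by the given group size."""
    return sum(per_rank_contrib) / divisor

# Assumed toy setup: 8 data-parallel ranks, experts sharded over 4 of them.
dp_size = 8
ep_size = 4
moe_dp_size = dp_size // ep_size  # number of replicas of each expert

# Made-up scalar contribution of each rank's data to one expert weight.
per_rank_contrib = [float(rank + 1) for rank in range(dp_size)]

# Gradient of the loss averaged over all data-parallel ranks.
true_grad = reduce_expert_grad(per_rank_contrib, dp_size)

# Same sum, but divided only by the replica-group size (the assumed mistake).
inflated_grad = reduce_expert_grad(per_rank_contrib, moe_dp_size)

ratio = inflated_grad / true_grad
print(f"true={true_grad}, inflated={inflated_grad}, ratio={ratio}")
```

Under these assumptions the ratio equals `ep_size` regardless of the individual contributions, because only the divisor differs.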
**Related issue & pr**
Issue [#5618] ha…
-
### 🐛 Describe the bug
## Installation
Install PyTorch
```
pip install torch -U --index-url https://download.pytorch.org/whl/nightly/cu121
```
Install TorchRec
https://github.com/pytorch/to…
-
Recent Mistral models, including Mistral 7B v0.3 Instruct, ship a consolidated.safetensors file whose weight key names differ from what vLLM expects. Also there are keys like layernorm and po…
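
A minimal sketch of how a key-name mismatch of this kind could be bridged with a rename table applied before loading. The rules in `RENAME_RULES` and the helper `remap_keys` are hypothetical and only illustrate the pattern; they are not taken from vLLM or from the checkpoint, and the actual key names would have to be read from the file itself.

```python
import re

# Hypothetical rename rules (pattern -> replacement), for illustration only.
RENAME_RULES = [
    (re.compile(r"^layers\.(\d+)\.attention\.wq\.weight$"),
     r"model.layers.\1.self_attn.q_proj.weight"),
    (re.compile(r"^layers\.(\d+)\.attention\.wk\.weight$"),
     r"model.layers.\1.self_attn.k_proj.weight"),
    (re.compile(r"^norm\.weight$"), "model.norm.weight"),
]

def remap_keys(keys):
    """Return (renamed, unmatched): a dict old -> new, and the keys no rule covers."""
    renamed, unmatched = {}, []
    for key in keys:
        for pattern, replacement in RENAME_RULES:
            if pattern.match(key):
                renamed[key] = pattern.sub(replacement, key)
                break
        else:
            # No rule matched; report the key instead of silently dropping it.
            unmatched.append(key)
    return renamed, unmatched

renamed, unmatched = remap_keys(
    ["layers.3.attention.wq.weight", "norm.weight", "foo.bar"]
)
print(renamed)
print(unmatched)
```

Returning the unmatched keys separately makes it easy to see which names still need a rule or should be skipped.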
-
### 🐛 Describe the bug
Hello, I am working on a project where I need to use multiple consecutive instances of DistributedDataParallel (DDP) within the same torch.distributed environment. In my scen…
-
Hi, I was wondering if you will release the weights for the model. I want to use them for my research to observe their performance on my dataset before trying few-shot methods and training other model…
-
I am getting
```
Traceback (most recent call last):
File "predict.py", line 219, in
predictor.setup(model_base=None, model_name="nextgpt-v1.5-7b", model_path="./checkpoints/nextgpt-v1.5-7…
-
### Search before asking
- [X] I have searched the Inference [issues](https://github.com/roboflow/inference/issues) and found no similar feature requests.
### Description
from inference import get…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
[WARNING|logging.py:328] 2024-10-30 18:47:58,798 >> `Qwen2VLRotaryEmbedding` can now be fully parameter…
-
## Description
I ran this command:
```
spineps sample -ignore_bids_filter -ignore_inference_compatibility -i raw/sub-247876_acq-sagittal_T2w.nii.gz -model_semantic t2w_segmentor_2.0 -model_instan…
-
### System Info
## System Specifications
2024-11-10T21:20:44.880890Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.80.1
Commit sha: 97f7…