-
## 🚀 Feature
`CombinedStreamingDataset` allows you to combine multiple `StreamingDataset`s with a sampling ratio -- but it assumes that the `batch_size` is the same for each dataset.
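For context, a minimal sketch of the current single-`batch_size` usage, assuming the `litdata` package; the dataset paths and weights are placeholders:

```python
# A sketch of the existing API, not a fix: paths and weights are illustrative.
from litdata import CombinedStreamingDataset, StreamingDataset, StreamingDataLoader

ds_a = StreamingDataset("s3://bucket/dataset_a")
ds_b = StreamingDataset("s3://bucket/dataset_b")

# `weights` controls the sampling ratio between the two datasets, but the
# loader below takes only one batch_size shared by both -- the limitation
# this request is about.
combined = CombinedStreamingDataset(datasets=[ds_a, ds_b], weights=[0.7, 0.3])
loader = StreamingDataLoader(combined, batch_size=8)
```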
###…
-
Hi, I have the following setup:
- Transformer model with N layers scanned over input
- fully sharded data parallel sharding
- asynchronous communications (latency-hiding scheduler, pipelined all-gather…
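For reproduction purposes, a minimal sketch of that setup in JAX; the shapes, names, and the single-matmul "layer" are illustrative, and the sharded dimension is assumed divisible by the device count:

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

n_layers, d_model = 4, 8
mesh = Mesh(np.array(jax.devices()), axis_names=("fsdp",))

# Stack per-layer weights so one lax.scan applies all N layers in order.
stacked_w = jnp.ones((n_layers, d_model, d_model))

# FSDP-style sharding: split each layer's weight over the `fsdp` axis.
sharded_w = jax.device_put(stacked_w, NamedSharding(mesh, P(None, "fsdp")))

def layer(x, w):
    # A transformer layer reduced to a single matmul for brevity.
    return x @ w, None

@jax.jit
def forward(x, ws):
    # Under jit, XLA's latency-hiding scheduler can overlap the all-gather
    # of each layer's shards with the previous layer's compute.
    y, _ = jax.lax.scan(layer, x, ws)
    return y

print(forward(jnp.ones((d_model,)), sharded_w).shape)
```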
-
**Describe the bug**
```json
{
    "name": "Python: debug_cl",
    "type": "debugpy",
    "request": "launch",
    "program": "swift/cli/main.py",
    …
```
-
I am sorry if I missed any existing functionality or documentation on this topic, but I could not find anything.
**Is your feature request related to a problem? Please describe.**
SupervisedTrain…
-
Hi,
I'm experiencing an issue with `clip_grad_norm_` and loss values while training Mamba2. After training for some time, the gradient norm starts to rapidly increase to infinity. If training continu…
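For anyone reproducing this, a minimal sketch of logging the pre-clip gradient norm; the model here is a plain linear stand-in, not Mamba2:

```python
import torch

model = torch.nn.Linear(16, 16)  # stand-in for the actual model
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

loss = model(torch.randn(4, 16)).pow(2).mean()
loss.backward()

# clip_grad_norm_ returns the total norm computed *before* clipping,
# so logging it shows exactly when the norm starts to diverge.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
if not torch.isfinite(total_norm):
    print(f"pre-clip grad norm is {total_norm}; training is diverging")
opt.step()
opt.zero_grad()
```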
-
WARNING train_db.py:109: gradient_accumulation_steps is 3. accelerate does not support gradient_accumulation_steps when training multiple models (U-Net a…
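For reference, a hedged sketch of the single-model accumulation path that accelerate does support; the model, data, and step counts are placeholders:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=3)
model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
model, opt = accelerator.prepare(model, opt)

for _ in range(6):
    x = torch.randn(4, 8, device=accelerator.device)
    # accumulate() defers the sync/step bookkeeping to every 3rd call for
    # this single model; tracking that across two models (e.g. U-Net plus
    # text encoder) is what triggers the warning above.
    with accelerator.accumulate(model):
        loss = model(x).mean()
        accelerator.backward(loss)
        opt.step()
        opt.zero_grad()
```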
-
DeepSpeed has support for several dtypes now (e.g., fp32, fp16, bf16). However, it's becoming less clear which parts of training use which dtypes at what time. For example, in #1801 we added supp…
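As a hedged illustration of where the ambiguity lives, the top-level dtype switch sits in the `fp16`/`bf16` sections of the DeepSpeed config; the values below are illustrative:

```python
# Only the top-level dtype switch is visible here; master weights and
# optimizer states may still be held in fp32 internally, which is exactly
# the "what uses what dtype, when" question the issue raises.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": False},
    "bf16": {"enabled": True},
}
```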
-
Hello,
I'd like to understand the effect of the gradient_accumulation_every parameter.
From reviewing the piece of code below, it appears that not all batches are used for training.
F…
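A minimal sketch of the loop pattern in question (the dataloader and step counts are stand-ins): each optimizer step draws `gradient_accumulate_every` fresh batches, so any batches beyond `num_steps * gradient_accumulate_every` are indeed never consumed.

```python
import torch

model = torch.nn.Linear(8, 1)
opt = torch.optim.Adam(model.parameters())
dl = iter(torch.randn(100, 4, 8))  # 100 pseudo-batches of shape (4, 8)

num_steps, gradient_accumulate_every = 10, 2

for _ in range(num_steps):
    for _ in range(gradient_accumulate_every):
        x = next(dl)  # every inner iteration pulls a *new* batch
        (model(x).mean() / gradient_accumulate_every).backward()
    opt.step()
    opt.zero_grad()
# Only num_steps * gradient_accumulate_every = 20 of the 100 batches were used.
```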
-
We are trying to use a Longformer and a BERT model for multi-label classification of different documents.
When we use the BERT model (BertForSequenceClassification) with max length 512 (batch size 8…
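A hedged sketch of the multi-label setup with transformers; the checkpoint and `num_labels` are illustrative, and swapping in `allenai/longformer-base-4096` raises the length limit to 4096:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=5,
    problem_type="multi_label_classification",  # uses BCEWithLogitsLoss
)

enc = tok("a long document ...", truncation=True, max_length=512,
          return_tensors="pt")
labels = torch.tensor([[1.0, 0.0, 1.0, 0.0, 0.0]])  # float multi-hot targets
out = model(**enc, labels=labels)
print(out.loss, out.logits.shape)
```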
-
When running the Chapter 05 example with the latest numpy version (2.x), the following error occurs.
---
ValueError Trace…
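Until the example is updated, a minimal guard sketch; the version check mirrors pinning `numpy<2` at install time:

```python
import numpy as np

# The chapter's code predates NumPy 2.x; fail fast with a clear message
# instead of the ValueError above.
if int(np.__version__.split(".")[0]) >= 2:
    raise RuntimeError(
        f"numpy {np.__version__} detected; install numpy<2 to run this example"
    )
```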