-
Thank you for the detailed Llama 3.1 report, which is very inspiring. I read the report and have a question about the training infrastructure. In Section 3.3.2, titled _Parallelism for Model Scaling_.…
-
### ❓ The question
Do you know of a Slurm script for `configs/official/OLMo-7B.yaml`?
I am looking for a multi-node Slurm script.
-
# 🐞 Bug
With the same batch size but a different number of micro-batches, the training results are inconsistent.
I have fixed the random seed.
I set `chunks` to 2 or 4.
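
To make the setup concrete, here is a minimal plain-Python sketch (not the reproduction code below) of how one fixed batch is split into `chunks` micro-batches. The helper `split_into_micro_batches` and the toy per-sample losses are assumptions for illustration only. With equal-sized micro-batches and a mean-reduced loss, the size-weighted average over micro-batches should match the full-batch value in exact arithmetic, which is the consistency I expected.

```python
import math


def split_into_micro_batches(batch, chunks):
    """Split a list into at most `chunks` contiguous micro-batches (hypothetical helper)."""
    size = max(1, math.ceil(len(batch) / chunks))
    return [batch[i:i + size] for i in range(0, len(batch), size)]


def mean(values):
    return sum(values) / len(values)


# Toy per-sample losses standing in for a real model's outputs (assumed data).
per_sample_loss = [float(i) for i in range(16)]
full_batch_loss = mean(per_sample_loss)

losses_by_chunks = {}
for chunks in (2, 4):
    micro_batches = split_into_micro_batches(per_sample_loss, chunks)
    # Weight each micro-batch mean by its size so it is comparable to the full-batch mean.
    weighted = sum(mean(mb) * len(mb) for mb in micro_batches) / len(per_sample_loss)
    losses_by_chunks[chunks] = weighted
```

In this toy case both settings reproduce `full_batch_loss`; a real model may still differ slightly because of floating-point summation order or layers whose behavior depends on the micro-batch contents.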
## Code that reproduces
```python
imp…
```
-
Hi,
I've been experimenting with GPipe and was wondering whether it is possible to run different micro-batches on different GPUs. For example, if there are 16 micro-batches, is it possible to run 8 micro…
-
For the model I am training, I rely on a custom [Sampler](https://pytorch.org/docs/stable/data.html#torch.utils.data.Sampler) that returns variable batch sizes. My task at hand is translation, …
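
For illustration, a sampler of this kind might look like the sketch below. This is not my actual sampler: the class name `TokenBudgetBatchSampler`, the `max_tokens` budget, and the example sentence lengths are all hypothetical. It groups sentence indices so that each batch stays under a token budget, which is one common reason batch sizes vary in translation.

```python
class TokenBudgetBatchSampler:
    """Yield lists of indices whose total token count fits a budget (hypothetical sketch)."""

    def __init__(self, lengths, max_tokens):
        self.lengths = lengths        # token count of each sentence
        self.max_tokens = max_tokens  # assumed per-batch token budget

    def __iter__(self):
        # Sort indices by length so that similar-length sentences share a batch.
        order = sorted(range(len(self.lengths)), key=lambda i: self.lengths[i])
        batch, batch_tokens = [], 0
        for idx in order:
            n = self.lengths[idx]
            # Start a new batch when adding this sentence would exceed the budget.
            if batch and batch_tokens + n > self.max_tokens:
                yield batch
                batch, batch_tokens = [], 0
            batch.append(idx)
            batch_tokens += n
        if batch:
            yield batch


lengths = [5, 12, 7, 30, 3, 9]  # made-up sentence lengths
batches = list(TokenBudgetBatchSampler(lengths, max_tokens=20))
```

An object like this should be usable as the `batch_sampler` argument of `torch.utils.data.DataLoader`, which takes an iterable of index lists. Note that a sentence longer than the budget still gets a batch of its own.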
-
### What is the problem the feature request solves?
Query:
```sql
select ss_sold_date_sk, ss_sold_time_sk, ss_quantity, d_year, d_moy, d_dom
from date_dim join store_sales on d_date_sk = ss_so…
```
-
### Contact Details
_No response_
### Description
**Description:**
When I select "Add a Sketch," the planes are displayed, but it is impossible to select one. There is no hover effect, and clickin…
-
https://github.com/microsoft/DeepSpeedExamples/blob/957ae3141946daf9a6bc5731e261032a13a82f05/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py#L367
After training for just one epoch, I got…
-
**Describe the bug**
A user reported a crash with 24.01.01 and SFT (while things work fine with 24.01):
```
File "/opt/NeMo-Aligner/examples/nlp/gpt/train_gpt_sft.py", line 215, in main
in…
```
-
The output is as follows:
--load_model models/RWKV-5-1B5-one-state-slim-novel-tuned.pth --data_file ./finetune/json2binidx_tool/data/training staff_text_document --ctx_len 1024 --epoch_steps 800 --epoch_count 20 --epoch_…