-
I ran the training script in a multi-node environment: training/step1_supervised_finetuning/training_scripts/multi_node/run_66b.sh
However, the nodes do not seem to launch successfully, and a warnin…
-
### 🐛 Describe the bug
When doing multi-GPU processing with DistributedSampler, the shuffle=True setting results in some files in the dataset not being used on any of the GPUs. I found this whe…
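
One way to check a report like this is to collect the indices each rank draws and compare their union against the full dataset. The sketch below is a simplified, pure-Python imitation of a shuffle-pad-stride sharding scheme, not PyTorch's actual `DistributedSampler` code; `shard_indices` and `unused_indices` are hypothetical helper names, and it assumes the dataset has at least as many items as there are ranks. It illustrates one possible cause: if every rank shuffles with the same seed, the shards cover the whole dataset, whereas ranks that shuffle with different seeds can leave some items unseen.

```python
import random


def shard_indices(n_items, num_replicas, rank, seed=0, epoch=0, shuffle=True):
    """Indices one rank would draw under a shuffle-pad-stride scheme (illustrative only)."""
    indices = list(range(n_items))
    if shuffle:
        # Every rank must build the same permutation for the shards to be disjoint.
        random.Random(seed + epoch).shuffle(indices)
    # Pad by repeating leading items so each rank gets the same number of samples.
    per_rank = -(-n_items // num_replicas)  # ceiling division
    total = per_rank * num_replicas
    indices += indices[: total - n_items]
    # Rank r takes every num_replicas-th index, starting at offset r.
    return indices[rank:total:num_replicas]


def unused_indices(n_items, num_replicas, seeds):
    """Dataset indices that no rank draws, given one shuffle seed per rank."""
    seen = set()
    for rank in range(num_replicas):
        seen.update(shard_indices(n_items, num_replicas, rank, seed=seeds[rank]))
    return sorted(set(range(n_items)) - seen)


# Same seed on every rank: the union of the shards is the whole dataset.
shared = unused_indices(10, 4, seeds=[0, 0, 0, 0])
# A different seed on each rank: coverage is no longer guaranteed.
mixed = unused_indices(10, 4, seeds=[0, 1, 2, 3])
print("unused with a shared seed:", shared)
print("unused with per-rank seeds:", mixed)
```

Running the same union check against the real sampler (one instance per rank, same dataset) would show whether the missing files come from the sampler itself or from how it is seeded and configured in the training script.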
-
```py
#gnanavel.mutharasu@gmail.com
import os
import random
import argparse
import datetime
import numpy as np
import pandas as pd
from skimage import io
from PIL import ImageFile
from log…
```
-
Dear Authors,
How long does this method need to train?
-
Hi,
I've tried everything and the cursor is still not displayed with the latest Windows 10 update...
It recognizes the monitor and works well, but I can't see the cursor even though I've set the a…
-
Hi,
I'm trying to run the new `Sockeye-3` on multiple nodes with multiple GPUs and it fails. I opened a [ticket](https://github.com/awslabs/sockeye/issues/1039) with sockeye and their hypothesis is that…
-
code:
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
import pyreft
from huggingface_hub import login
login(token="***")
model_n…
-
@jik876 I see batch_size=16 in the config, but I want to clarify: the batch size was 16 per GPU, right? And you used 2 V100s for training with this batch size?
-
Hello,
Is there a way to perform inference on a single image or a directory of images using the provided weights?
-
Hi, I faced the following error when trying to run multi-stream models:
using dlc image augmentation pipeline
Error executing job with overrides: []
Traceback (most recent call last):
File "/…