-
Hello,
The training script reports the model FLOPs utilization (MFU), which is quite low on my card (around 7-8%).
Does anyone know what value is to be expected? I don't have an A100 readily available to t…
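
For context on what the reported number means, here is a minimal sketch of how an MFU figure is commonly derived: achieved FLOP/s divided by the hardware's peak FLOP/s. The function name and every number below are illustrative assumptions, not values taken from the training script or from any specific GPU.

```python
def model_flops_utilization(flops_per_step: float,
                            step_seconds: float,
                            peak_flops_per_second: float) -> float:
    """Return achieved FLOP/s as a fraction of the hardware peak.

    All three inputs are supplied by the caller; this sketch does not
    estimate the model's FLOPs itself.
    """
    if step_seconds <= 0 or peak_flops_per_second <= 0:
        raise ValueError("step_seconds and peak_flops_per_second must be positive")
    achieved_flops_per_second = flops_per_step / step_seconds
    return achieved_flops_per_second / peak_flops_per_second


# Illustrative (made-up) numbers: 2 TFLOPs of work per step, 0.5 s per step,
# on a card with an assumed peak of 50 TFLOP/s.
mfu = model_flops_utilization(
    flops_per_step=2.0e12,
    step_seconds=0.5,
    peak_flops_per_second=50.0e12,
)
print(f"MFU: {mfu:.1%}")
```

Note that if the peak used in the denominator is hardcoded for a different card than the one actually in use, the reported percentage is not directly comparable across GPUs.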
-
### Issue Content:
**Description:**
It usually happens after LoRA training has been running for some time.
I encountered a `subprocess.CalledProcessError` when running the `train_network.py` script using the …
-
Using the default FeatureExtractor settings for the HuggingFace port of YOLOS, I am consistently running into CUDA OOM errors on a 16GB V100 (even with a training batch size of 1).
I would like to …
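
As a rough illustration of why input resolution matters so much here, the sketch below estimates the token count of a ViT-style model and the size of its attention score matrices. The patch size of 16, the 100 extra detection tokens, the 6 attention heads, and the 800 x 1333 input size are all assumptions chosen for illustration; they are not confirmed by this issue.

```python
def vit_sequence_length(height: int, width: int,
                        patch_size: int = 16, extra_tokens: int = 100) -> int:
    """Number of tokens for an image split into non-overlapping patches.

    Uses floor division, i.e. assumes partial patches at the border are dropped.
    patch_size and extra_tokens are assumed defaults, not verified values.
    """
    patches = (height // patch_size) * (width // patch_size)
    return patches + extra_tokens


def attention_scores_bytes(seq_len: int, num_heads: int,
                           bytes_per_value: int = 4, batch_size: int = 1) -> int:
    """Bytes needed for one layer's attention score matrices (seq_len x seq_len per head)."""
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value


# Hypothetical comparison of a large and a small input size.
for h, w in [(800, 1333), (512, 512)]:
    n = vit_sequence_length(h, w)
    mib = attention_scores_bytes(n, num_heads=6) / 2**20
    print(f"{h}x{w}: {n} tokens, ~{mib:.0f} MiB of attention scores per layer (fp32)")
```

Because the attention cost grows with the square of the token count, reducing the resize target tends to lower memory use far more than lowering the batch size from an already small value.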
-
QGIS version: 3.22.5-Białowieża
QGIS code revision: c2723178
Qt version: 5.15.2
Python version: 3.9.5
GDAL version: 3.4.1
GEOS version: 3.10.2-CAPI-1.16.0
PROJ version: Rel. 8.2.1, January 1st, …
-
steps to reproduce
1) start a runpod container with the pytorch 2.01 template and lots of disk space
2) run your sample command on a properly formatted dataset:
python -m llamatune.train \
--m…
-
The environment I am using is:
pytorch 1.4.0
transformers 2.8.0
I followed the training command in the documentation at https://github.com/thunlp/OpenMatch/blob/master/docs/experiments-msmarco.md:
```
CUDA_VISIBLE_DEVICES=0 \
python train.py\
-task r…
-
Hi team,
I was fine-tuning an LLM with Ludwig on an **NVIDIA A100** instance.
I get the error message - **Encounted `nan` values in tensor. Will be removed.", UserWarning)** My loss and perplexi…
-
```
07/31/2023 19:11:36 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
{'prediction_type', '…
-
Hi,
I want to reproduce your results using the code you provided, but I am stuck at the fine-tuning step. No matter how much I reduce the batch size and input image size, it still reports CUDA out of memory.…
-
### System Info
```Shell
accelerate 0.20.3
python 3.10
numpy 1.24.3
torch 2.0.1
accelerate config:
compute_environment: LOCAL_MACHINE
deepspeed_config:
deepspeed_multinode_launcher: stand…