-
### System Info
GPU: `A10`
Base Image: `FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04`
TensorRT-LLM:
- `0.12.0` : It's working, but I can't use it because of a version mismatch in TRT and trt-llm-back…
-
https://freefilesync.org/manual.php?topic=schedule-batch-jobs
sftp://xrlab.z.science.ru.nl:12009
-
![image](https://github.com/user-attachments/assets/c4a7fdd9-9fa3-4ae4-af9a-5ea76f770b39)
I only got 71.4127% with the Tiny model.
Here's my command:
```
python3 train.py --dataset-path='…
-
### 🐛 Describe the bug
Test Environment:
- Hardware: A100 80GB GPU
- Model: Llama3-8b
- **Parameters: temperature = 0, max_tokens = 1024, max_num_seqs = 256, seed = 1**
- I make OpenAI-Compatilb…
-
I have built OpenMPI 3.0.1 with GCC 4.8.5 on our system. In order to be batch system aware, OpenMPI is linked against some batch system libraries:
```
[spackapps@eu-develop-01 lib]$ ldd libopen-rt…
-
Hi. Thanks for your insightful work. Could you share the hyperparameters you used to train SD1.5 on the LAION dataset?
- Which LAION dataset did you use: laion2b, aesthetic, or something else?
- How many GP…
-
Due to https://github.com/resque/resque-scheduler/pull/767, resque-scheduler 4.9+ now wraps all enqueues within a Redis transaction. This means that Redis commands in `before_enqueue` hooks (inadverte…
-
This is the script I used for fine-tuning.
```
export HF_DATASETS_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
export PDSH_RCMD_TYPE=ssh
# NCCL setting
export GLOO_SOCKET_IFNAME=bond0
export NCCL_SO…
-
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[13], [line 3](vscode-notebook-cell:?ex…
-
We should extract the common logic from kube-batch/volcano and let users pass in a client of their own choice.