-
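During single-machine, dual-GPU training, the run aborts abnormally at unpredictable times. Investigation showed a memory leak during training.
Code used: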
```py
import paddle
import paddle.nn as nn
import paddle.optimizer as opt
import paddle.distributed as dist
import paddlex as pdx
from paddlex impor…
-
**Describe the bug**
@ivirshup reported that [cellxgene_census_builder//test_builder.py](https://github.com/chanzuckerberg/cellxgene-census/blob/main/tools/cellxgene_census_builder/tests/test_builder…
-
Proto:
Creating a new user account
1. Register at cp.strongcompute.ai
1. Register a new RSA key
2. Create a new API key - you’ll need this later when you create your credentials file on the …
-
Could you provide an example/recipe for how to run a TensorFlow application on AWS EMR using TensorFlowOnSpark to achieve scale and distributability?
For example let's say I want to run this on AWS…
-
Model comparisons are often necessary. How can they be done in a computationally efficient way with the fewest assumptions?
Proposals:
- R^2
- Squared error
- likelihood ratios
- some kind of predic…
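
To make the first three proposals concrete, here is a minimal sketch in plain Python. It assumes two models fitted to the same observations, and for the likelihood ratio it further assumes i.i.d. Gaussian residuals with the variance set to its maximum-likelihood estimate; neither assumption comes from the issue itself. The function names and the toy data are hypothetical, chosen only for illustration.

```python
import math


def squared_error(y, y_hat):
    """Sum of squared residuals between observations and predictions."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(y) / len(y)
    total = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - squared_error(y, y_hat) / total


def gaussian_log_likelihood(y, y_hat):
    """Log-likelihood assuming i.i.d. Gaussian residuals.

    The variance is set to its maximum-likelihood estimate SSE / n,
    so the residuals must not all be zero.
    """
    n = len(y)
    sigma2 = squared_error(y, y_hat) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def likelihood_ratio_statistic(y, y_hat_simple, y_hat_complex):
    """2 * (log L_complex - log L_simple) under the Gaussian assumption."""
    return 2.0 * (
        gaussian_log_likelihood(y, y_hat_complex)
        - gaussian_log_likelihood(y, y_hat_simple)
    )


# Toy data (hypothetical): a constant model versus a straight-line model.
y = [1.0, 2.1, 2.9, 4.2, 5.0]
constant_fit = [sum(y) / len(y)] * len(y)
line_fit = [1.0, 2.0, 3.0, 4.0, 5.0]

for name, fit in (("constant", constant_fit), ("line", line_fit)):
    print(name, "SSE:", squared_error(y, fit), "R^2:", r_squared(y, fit))
print("LR statistic:", likelihood_ratio_statistic(y, constant_fit, line_fit))
```

All three quantities reuse the same residuals, so they cost one pass over the data each. When the simpler model is nested in the more complex one, the likelihood ratio statistic is commonly compared against a chi-squared distribution; R^2 and squared error carry no such distributional assumption but also do not penalize extra parameters.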
-
Firstly, thanks for publicly releasing the instruction dataset.
While looking through the [dataset](https://huggingface.co/datasets/victor123/evol_instruct_70k), I've noticed several examples wher…
-
Hi @alirezadir. Firstly, thank you very much for putting this together. Really awesome! In this issue, I would like to take the opportunity of suggesting a few tools/platforms with which I have worked…
-
Hi, I'm using one A100 GPU to train PICK and I've set distributed to false.
[2022-06-08 01:41:58,561 - train - INFO] - One GPU or CPU training mode start...
[2022-06-08 01:41:58,565 - train - INFO…
-
### Describe the issue
Issue:
Getting an error when trying to fine-tune LLaVA-v1.6-34b.
Command:
#!/bin/bash
deepspeed LLaVA/llava/train/train_mem.py \
…
-
Hi guys,
I have been trying to run the Bing experiment but it seems I can't for now.
```
"datasets": {
    "wiki_pretrain_dataset": "/data/bert/bnorick_format/128/wiki_pretrain",
    "bc_pr…