-
I am trying to reproduce the results for the Cityscapes dataset. I am now at the joint training stage, and the paper says the crop size was 1396x1396 px (half image + label margin) with batch size = 1. Surpris…
-
Thank you for your encouragement all along. When I execute the .sh file directly in my virtual environment, the code trains normally. However, when I debug it in VS Code using the JSON file below, I e…
-
### 🐛 Describe the bug
Hi, I've adapted PyTorch's FSDP+TP example to the HF-T5 model and ran it on 3 nodes with 2 GPUs each (6 GPUs total).
### commands
`NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT NCCL_IB_CUDA_SUPP…
-
### 🐛 Describe the bug
On macOS M-series machines (and Linux machines using 'file_system' sharing), if you use an in-place fill operation on a Linear layer before launching torch multiprocessing, t…
-
### Your current environment
```text
The output of `python collect_env.py`
```
Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorc…
-
### Your current environment
/usr/lib/python3.10/inspect.py:288: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead
return isinstance(objec…
-
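### Description of the bug
When processing a PDF file with 379 pages, the model keeps consuming more memory and is killed by the OOM killer once usage reaches a certain level (on a local machine, memory usage also keeps growing until a MemoryError is raised).
magic-pdf --version 0.10.0
Memory usage at runtime is shown below: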
![image](https://github.com/user-attachments/ass…
-
When I try to run the code below I get this error at the pretrain function:
**Error**
```
File "C:\Users\fabio\Desktop\wetransfer-08d028\Rope_ex_v1.5\RL_Training\behaviour_cloning.py", line 40, i…
-
### Please ask your question
I tried running predict with llama myself and ran into a problem.
python -u -m paddle.distributed.launch \
--gpus "6,7" \
--log_dir "output/$task_name""_log" \
run_pretrain.py \
--model_type "llama" \
--…
-
**Dear Gents,**
**I am trying to fine-tune the VGG16 model to classify 5 classes, inspired by this link:** https://gist.github.com/fchollet/7eb39b44eb9e16e59632d25fb3119975.
**When I u…