-
When compiling PyTorch, which vendors flash-attention 2.5.0, the memory requirements are enormous: roughly 5.3 GB per process.
I noticed the intermediate files generated by nvcc are ridiculously large…
-
I tried to build the yolo3 model with the instructions in this repo. But when I change my Makefile for the GPU configuration as follows and set all the paths in the Makefile:
GPU=1
CUDNN=1
CUDNN_HAL…
-
Hi @hotfinda ,
Could you please share the actual implementation that can reproduce the results you reported in the paper?
Basically, the current code does not run.
(1) For the…
-
### Your current environment
```text
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC ve…
-
Here is the stack trace of the `run_pretrain_bart.sh` error:
```
[rank0]: IndexError: Caught IndexError in DataLoader worker process 0.
[rank0]: Original Traceback (most recent call last):
[rank0]: F…
-
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It shoul…
-
I want to use TE's comm-gemm-overlap module for multi-node training; however, the README says this module only supports a single node. Does TE have any plans for multi-node support? And what effort…
-
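After the program finishes one epoch, it hangs during the second epoch of training and then fails with a timeout error.
Roughly where does this problem come from?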
[2024-05-09 01:12:34 accelerate.tracking]: Successfully logged to TensorBoard
[rank3]:[E ProcessGroupNCCL.cpp:523] [Rank 3] Watchdog caught collective…
-
I found that there is a constraint on the dimensionality when we use the transformer CUDA kernel: https://github.com/microsoft/DeepSpeed/blob/d720fdb6857f4b71d922ca1e8efbe5271b5fb7c2/csrc/transformer/no…
-
```
ubuntu@tegra-ubuntu:~/obj_recog/darknet$ make
make: Warning: File `obj' has modification time 5.7e+08 s in the future
nvcc -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_…