RVC-Project / Retrieval-based-Voice-Conversion-WebUI

Easily train a good VC model with voice data <= 10 mins!
MIT License

AMD-rocm-linux problem with VRAM #2111

Open AMD7900XTX opened 2 weeks ago

AMD7900XTX commented 2 weeks ago

My device: [Screenshot 2024-06-06 21-11-14] [Screenshot 2024-06-06 21-08-56] [Screenshot 2024-06-07 09-11-12]

At first I tried Ubuntu 20.04 and 22.04 with ROCm 5.7, 6.0, 6.0.2, 6.1.1, and 6.1.2. Every time, loading epoch 1 takes a long time, and then the machine always crashes, followed by a black screen. I also tried batch sizes of 1, 4, 6, 8, 10, and 12 and datasets of 3 h, 1 h, and 20 min, with no improvement. ROCm 6.1.2 was released yesterday, but nothing has changed. I launch with:

export CUDA_VISIBLE_DEVICES=0
export CUDA_CACHE_MAXSIZE=4294967296
export HIP_VISIBLE_DEVICES=0
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export PYTORCH_ROCM_ARCH="gfx1100"
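
Since the crash leaves no error code, one thing that can at least be confirmed before training starts is whether these variables reached the Python process and whether PyTorch sees the GPU. The following is a minimal sketch, not part of RVC; the helper names `report_env` and `describe_torch` are made up for illustration, and it assumes a ROCm build of PyTorch (on other builds `torch.version.hip` is `None`).

```python
import os

# Variable names taken from the launch commands in the post above.
ROCM_VARS = [
    "CUDA_VISIBLE_DEVICES",
    "CUDA_CACHE_MAXSIZE",
    "HIP_VISIBLE_DEVICES",
    "HSA_OVERRIDE_GFX_VERSION",
    "PYTORCH_ROCM_ARCH",
]


def report_env(env=None):
    """Return {name: value or None} for the ROCm-related variables."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in ROCM_VARS}


def describe_torch():
    """Return basic device info from PyTorch, or None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    info = {
        "torch": torch.__version__,
        # Set on ROCm builds of PyTorch; None on CUDA or CPU-only builds.
        "hip": getattr(torch.version, "hip", None),
        # ROCm devices are exposed through the torch.cuda API.
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for name, value in report_env().items():
        print(f"{name}={value!r}")
    print(describe_torch())
```

If a variable prints as `None`, the export did not happen in the same shell that launched training.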

So I wonder what the problem is: my environment, the software stack (ROCm, PyTorch, tensorflow-rocm, etc.), or my GPU? Because it crashes without any error code, I cannot provide more information.

AMD7900XTX commented 2 weeks ago

I found the reason (maybe): the amdgpu driver's default settings allow a core clock of 0~3200 MHz and a memory clock of 0~1249 MHz. When I train, both reach the maximum, which, as you know, amounts to overclocking, so it crashes. Why did AMD set it up like this?
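
To check the clock observation while training runs, the amdgpu driver exposes the available clock levels through sysfs files such as `pp_dpm_sclk` (core) and `pp_dpm_mclk` (memory), with the active level marked by `*`. The sketch below only reads and parses them; it changes nothing. It assumes the GPU is `card0` (on some machines it is `card1`), and `parse_dpm_levels` and `read_levels` are illustrative names, not an existing API.

```python
import re
from pathlib import Path

# Assumption: the discrete GPU is card0; adjust if it shows up as card1.
DEVICE_DIR = Path("/sys/class/drm/card0/device")

# Matches lines such as "1: 1249Mhz *" (the "*" marks the active level).
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_levels(text):
    """Parse pp_dpm_* file content into a list of (index, mhz, is_active)."""
    levels = []
    for line in text.splitlines():
        m = LEVEL_RE.match(line)
        if m:
            levels.append((int(m.group(1)), int(m.group(2)), m.group(3) is not None))
    return levels


def read_levels(name, device_dir=DEVICE_DIR):
    """Read one pp_dpm_* file; return an empty list if it does not exist."""
    path = device_dir / name
    if not path.exists():
        return []
    return parse_dpm_levels(path.read_text())


if __name__ == "__main__":
    for name in ("pp_dpm_sclk", "pp_dpm_mclk"):
        for idx, mhz, active in read_levels(name):
            marker = " (active)" if active else ""
            print(f"{name} level {idx}: {mhz} MHz{marker}")
```

Running this in a loop during epoch 1 would show whether the crash coincides with the highest level being active, though that alone would not prove the clocks are the cause.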