Open AMD7900XTX opened 2 weeks ago
I think I found the cause (maybe): by default the amdgpu driver allows a core clock range of 0 to 3200 MHz and a memory clock range of 0 to 1249 MHz. During training both clocks go to the maximum, which, as you know, effectively means overclocking, so the system crashes. Why does AMD set it up like this?
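For anyone who wants to check which clock levels the driver exposes on their own card, here is a minimal sketch. It assumes the GPU shows up as `card0` and that the amdgpu sysfs file `pp_dpm_sclk` is available; the card index and the helper name `parse_dpm_levels` are my own assumptions, not something taken from RVC or ROCm.

```python
import re
from pathlib import Path

# Assumed sysfs location; the card index (card0) may differ on your system.
SCLK_PATH = Path("/sys/class/drm/card0/device/pp_dpm_sclk")


def parse_dpm_levels(text):
    """Parse lines such as '1: 2304Mhz *' into (index, mhz, is_active) tuples."""
    levels = []
    for line in text.splitlines():
        match = re.match(r"\s*(\d+):\s*(\d+)\s*MHz\s*(\*)?", line, re.IGNORECASE)
        if match:
            index = int(match.group(1))
            mhz = int(match.group(2))
            is_active = match.group(3) is not None  # '*' marks the current level
            levels.append((index, mhz, is_active))
    return levels


if __name__ == "__main__":
    if SCLK_PATH.exists():
        for index, mhz, is_active in parse_dpm_levels(SCLK_PATH.read_text()):
            marker = " (active)" if is_active else ""
            print(f"level {index}: {mhz} MHz{marker}")
    else:
        print(f"{SCLK_PATH} not found; adjust the card index")
```

Running this in a second terminal while training is in progress should show whether the core clock actually sits at the highest level. The memory clock can be read the same way from `pp_dpm_mclk` in the same directory.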
My device:
![截图 2024-06-07 09-11-12](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/assets/170177578/d6ecea28-76f5-4fd5-a38d-f9c365e84c8c)
At the beginning I tried Ubuntu 20.04/22.04 with ROCm 5.7/6.0/6.0.2/6.1.1/6.1.2. Every time, loading epoch 1 takes a long time, and then there is always a crash followed by a black screen. I also tried batch sizes of 1/4/6/8/10/12 and datasets of 3 h, 1 h, and 20 min, with no improvement. ROCm 6.1.2 was released yesterday, but nothing has changed. I launch with:
export CUDA_VISIBLE_DEVICES=0
export CUDA_CACHE_MAXSIZE=4294967296
export HIP_VISIBLE_DEVICES=0
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export PYTORCH_ROCM_ARCH="gfx1100"
So I wonder what the problem is: my environment, the software (ROCm, PyTorch, tensorflow-rocm, etc.), or my GPU? Because it crashes without any error code, I cannot provide more information.
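Since the crash leaves no error code, one way to gather more information might be to look at the kernel log of the previous boot after the machine restarts. The sketch below assumes a systemd system with persistent journaling enabled, so that `journalctl -k -b -1` has something to return. The function names and the keyword filter are illustrative choices of mine.

```python
import subprocess


def filter_gpu_lines(log_text, keywords=("amdgpu",)):
    """Return only the log lines that mention one of the keywords (case-insensitive)."""
    return [
        line
        for line in log_text.splitlines()
        if any(keyword in line.lower() for keyword in keywords)
    ]


def previous_boot_kernel_log():
    """Read kernel messages from the previous boot; return '' if journalctl is missing."""
    try:
        result = subprocess.run(
            ["journalctl", "-k", "-b", "-1", "--no-pager"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return ""
    return result.stdout


if __name__ == "__main__":
    for line in filter_gpu_lines(previous_boot_kernel_log()):
        print(line)
```

If the driver logged anything before the black screen, such as a GPU reset or a ring timeout, it should appear in this output and can be attached to the issue.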