fireice-uk / xmr-stak

Free Monero RandomX Miner and unified CryptoNight miner
GNU General Public License v3.0

AMD/CPU XMR mining locks up entire machine right after mining starts #2358

Open exomem opened 5 years ago

exomem commented 5 years ago

I had been mining with a Vega 64 and a Skylake i5-6600K using XMR-Stak 2.5. Super stable, no problems. I tried to upgrade to 2.10.0, and as soon as I began mining my entire Win10 x64 rig froze up. I connect to Dwarfpool with currency set to "cryptonight_r". The pool logs in and within seconds there is a total freeze. I have to hard reboot to get back up.

I was excited when 2.10.1 came out, but I am getting the exact same behavior as with 2.10.0, and it seems no one else is having the issue I am having. Others are able to run for hours or days before any kind of crash, and even then it's not a total freeze of the entire OS. Here, nothing responds.

Is there any reasonable expectation, based on my description of the symptoms, that the true root cause is at least known or addressed?

I have been using the original AMD Beta Blockchain drivers (17.x.x). I have read multiple accounts of similarly frustrated people with a Vega64 on the original Beta Blockchain drivers who tried to upgrade to the 18.x or 19.x Adrenalin drivers and for some reason had to rebuild their Win10 x64 rig from scratch with a fresh Win10 install. I really don't want to go down that route.

I am using the precompiled binaries.

I noticed there were OpenCL fixes in 2.10.1 and 2.10.2, but again they don't seem to address the behavior I am seeing. I would provide logs, but there is nothing to collect because the total OS freeze comes so quickly after mining begins.

I hate to switch to different mining software, because I love that the same program handles both AMD and CPU. Basically I haven't been able to mine at all since the hard fork, so I diverted my normally dedicated XMR GPU to ETH, but using a Vega64 for ETH instead of XMR is a huge waste considering I got the Vega64 specifically for XMR.

I am using Soft PowerPlay tables/OverdriveNT with the well-known tables used to maximize the Vega64 for XMR. Before the hard fork previous to the most recent one I used to pull just over 2000H/s; after that fork my hashrate dropped to 1850H/s. Now, after this most recent HF, my Vega64 can't mine XMR at all.

Spudz76 commented 5 years ago

CN-R is harder on things: it has random code paths and self-recompilation, which puts more strain on "well known good" clocking and tuning.

You have to retune everything including overdrive, etc. Or turn it all down a good 12% for starters and see how that goes.

Previous CN kernels did not put as much load (peak loads!) on the GPU, so now you have to detune so that the startup and recompile peaks don't trip your PSU. Or get a real PSU haha.

"Used to pull 2000" is true before the CN2v2 fork. CN2v2 is literally so much harder that it only scores 1850, so that is correct also. CN-R is CN2v2 plus a slightly harder section (sometimes, randomly), which runs slightly slower yet again. Hashrate will vary as the chosen action changes, and it will draw spikes of power during self-recompile (where it regenerates the randomized hard section again, possibly making it harder or easier, for the next block round).

Everyone gets hurt about the same, more or less, so just consider that 1810H/s is probably the new 2000H/s and get over the whole addiction to bigger literal numbers. It's "the same hashrate" as it used to be, since everyone was lowered at the forks (your % of netrate is about the same).
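The "same % of netrate" point can be checked with a little arithmetic. The sketch below is illustrative only: the network hashrate is a made-up placeholder, and it assumes the fork slows every miner by exactly the same factor, which is an idealization.

```python
import math


def network_share(my_hashrate: float, net_hashrate: float) -> float:
    """Fraction of the total network hashrate contributed by one miner."""
    return my_hashrate / net_hashrate


# Placeholder network hashrate (hypothetical, not a real measurement).
NET_BEFORE = 1_000_000_000.0

# Assumption: the fork scales everyone by the same factor, 1850 / 2000.
factor = 1850.0 / 2000.0

before = network_share(2000.0, NET_BEFORE)
after = network_share(2000.0 * factor, NET_BEFORE * factor)

# The common factor cancels, so the share is unchanged (up to rounding).
print(math.isclose(before, after))
```

The absolute H/s number drops, but the ratio that determines expected rewards stays the same under that assumption.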

rickbb commented 5 years ago

I have the same PC lockups on AMD cards, not with xmr-stak but with xmrig-amd. I have older HD7870 cards. It does seem that AMD hardware has become more difficult to get mining Monero again.

Spudz76 commented 5 years ago

It may be worth adding a backend init queue so that each one boots up sequentially. Since the CPU and the GPU are all getting CN-R set up (compiling randomized) at startup it spikes your hardware as hard as anything possibly could all at once. Stress test fail finder.

If it fired up the CPU, then waited 500ms to allow the PSU to cope, then fired up GPU0, and so on for each GPU if there is more than one, there would be less of a max spike. Very similar to "staggered spinup" on hard drives (which also nuke PSUs if they all want to spin up and chew 18w*20drives = 360w for 5sec all at the same time on one or maybe two 12v rails). But this is 100-175w depending on CPU, plus 180-240w for whatever GPU, which is like firing up a 280-415w draw suddenly, and that is as bad as non-staggered drives.
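A minimal sketch of that staggered init, assuming each backend exposes some init callable. The names `staggered_start`, `backends` and `delay_s` are hypothetical and do not come from the xmr-stak code base; this only illustrates the ordering idea.

```python
import time
from typing import Callable, List, Sequence, Tuple


def staggered_start(
    backends: Sequence[Tuple[str, Callable[[], None]]],
    delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Initialize backends one at a time, pausing between them.

    Returns the backend names in the order they were started.
    """
    started: List[str] = []
    for index, (name, init) in enumerate(backends):
        if index > 0:
            # Give the PSU time to settle before the next power spike.
            sleep(delay_s)
        # Hypothetical stand-in for a backend's CN-R setup/compile step.
        init()
        started.append(name)
    return started
```

Calling `staggered_start([("cpu", init_cpu), ("gpu0", init_gpu0)])` with real init functions would bring the CPU up first, wait 500ms, then bring up GPU0. Whether 500ms is long enough for a given PSU is a guess.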

So I think there is something to be fixed about how CN-R initializes, and maybe also a race lock on the recompiler to help keep the inrush current limited. But the new algo remains tougher and probably requires detuning from the previous best (you should have had to turn down a little at CN2v2 also? Maybe not on Vega, only on RX, not sure).
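The "race lock on the recompiler" idea could look roughly like this: one shared lock so that only one backend recompiles at a time. This is a sketch under that assumption, not how xmr-stak is implemented; `recompile_serialized` and `compile_kernel` are hypothetical names.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

# One process-wide lock: at most one backend recompiles at any moment.
_recompile_lock = threading.Lock()


def recompile_serialized(compile_kernel: Callable[[], T]) -> T:
    """Run a (hypothetical) kernel recompile while holding the shared lock."""
    with _recompile_lock:
        return compile_kernel()
```

Other threads that want to recompile block until the lock is free, so the recompile peaks happen one after another instead of stacking.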

Spudz76 commented 5 years ago

@rickbb I think the staggered init may help with some of your thing too, didn't you have like 7 cards or something + CPU?

EDIT: nope 5 cards and CPU disabled, but still 5*149w at fire up it needs a stagger...

EDIT2: I guessed at the watts on those; each might even pull 210w or so. I think if it made it past the init phase and test phase without everything going at the same time, it might survive and mine just fine. You can test by setting up 5 copies of the miner, one for each GPU, and launching them in series (emulating what a stagger-start would do if we add it to the code).
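One way to script that test, as a rough sketch. It assumes one config file per GPU (the `amd_gpu0.txt` style names are hypothetical), reuses the `--noNVIDIA --amd <file>` form of the xmr-stak command line quoted elsewhere in this thread, and assumes CPU mining is already disabled in the configs. For xmrig-amd the command line would be different.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def build_commands(amd_configs: Sequence[str], exe: str = "xmr-stak.exe") -> List[List[str]]:
    """Build one miner command line per (hypothetical) per-GPU config file."""
    return [[exe, "--noNVIDIA", "--amd", cfg] for cfg in amd_configs]


def launch_in_series(
    commands: Sequence[List[str]],
    delay_s: float = 5.0,
    spawn: Callable[[List[str]], object] = subprocess.Popen,
    sleep: Callable[[float], None] = time.sleep,
) -> List[object]:
    """Start each miner instance in turn, waiting delay_s between launches."""
    procs: List[object] = []
    for index, cmd in enumerate(commands):
        if index > 0:
            # Let the previous instance get past its init spike first.
            sleep(delay_s)
        procs.append(spawn(cmd))
    return procs
```

For example, `launch_in_series(build_commands(["amd_gpu0.txt", "amd_gpu1.txt"]))` would start two instances five seconds apart. The delay is a guess and would need tuning to however long the init and test phases take on a given rig.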

psychocrypt commented 5 years ago

Please do all tests with 2.10.2. All versions before are somehow broken. If you may have problems with the power spike (which I do not think is the case), then try removing one GPU from your config.

exomem commented 5 years ago

psychocrypt - I tested again on 2.10.2. No matter what I do the entire system locks up within moments of the pool login

I am using the precompiled 2.10.2 extracted into a fresh directory. I let it recreate all the config files fresh including the amd.txt specified on the cmd line

xmr-stak.exe --noNVIDIA --amd amd.txt

I rolled back the OverDriveNT settings to stock Vega 64. But that doesn't seem to matter. Same total system lockup moments after starting

Is anyone actually mining long-term stable with a single Vega64 and CPU on 2.10.2? My Vega64 is the AMD Radeon blower-style card.