Closed mark9064 closed 2 years ago
Downgrade to v17.40 or older. I don't know what AMD have broken in their drivers again.
Sure will downgrade soon. Closing for now.
Didn't wanna reopen this, but I'm on the 17.40 drivers now and no luck either. I reinstalled the AMD APP SDK and recompiled nsgminer after installing the new drivers.
It works for me with v17.40. Maybe use an older SDK like v2.9.
Sure, will do. But I've noticed that I can't find the AMD APP SDK anywhere. AMD's website gives me expired certificate errors and then 404s on the APP SDK page. I wonder what's up. If you have the installer for the 2.9 version that you know works, that would be great.
thanks dude, trying it now
uninstalled sdk 3 and installed 2.9.1, now getting a backend error with clDevicesNum. running clinfo, the cards don't show up at all. any ideas? all miners fail to launch now
This happens when the SDK installs libraries it isn't supposed to. The installer can detect fglrx only, not amdgpu-pro. Most likely the CPU-only OpenCL libraries have overwritten the amdgpu-pro ones. Remove these libOpenCL and libamdocl libraries, then reinstall amdgpu-pro.
sure, I'll reinstall the drivers. where can I find the libs I need to remove?
Maybe under /opt/AMDAPPSDK-2.9-1/lib
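For anyone retracing this step, here is a minimal sketch for listing candidate libraries before removing anything. It only assumes the two name prefixes mentioned above (libOpenCL, libamdocl) and the directory suggested in the thread; the helper name `find_opencl_libs` is made up for illustration, and the script deletes nothing.

```python
from pathlib import Path

# Library name prefixes mentioned in the thread.
PREFIXES = ("libOpenCL", "libamdocl")


def find_opencl_libs(root):
    """Return sorted paths under root whose file name starts with a known prefix.

    Hypothetical helper: it only lists files and symlinks, it never removes them.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.name.startswith(PREFIXES) and (p.is_file() or p.is_symlink())
    )


if __name__ == "__main__":
    # The path is the one guessed in the thread; adjust it to your install.
    for path in find_opencl_libs("/opt/AMDAPPSDK-2.9-1/lib"):
        kind = "symlink" if path.is_symlink() else "file"
        print(f"{kind}: {path}")
```

Running the same function against /usr/lib would show any symlinks that still point at the SDK copies.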
I'll pull out the symlinks to /usr/lib first then. thanks for the quick support man, i appreciate this so much
ok, reinstalled the drivers, all ok. clinfo detects all cards and I'm back at the original issue :(
Since this is an LLVM error, I'd also suggest trying good old GCC v4.x instead.
ok ill look into that tomorrow. late here ;D
@mark9064
I switched from GCC 5 to GCC 4.9 and recompiled, and I'm still getting the same error.
hmmm strange
I am using the new beta linux mining driver 17.40. It's got to be a driver issue. Sgminer is giving the same error for neoscrypt.
yup, it's because sgminer uses the same kernel as nsgminer (i think)
any ideas ghost?
No, SGminer employs Wolf0's NeoScrypt kernel.
I'm using amdgpu-pro v17.40 with GCC v5.4.0 on Ubuntu 16.04 with the default 4.10 kernel.
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (2482.3)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
The AMD APP SDK shouldn't really be an issue because the miner comes with the v2.9 headers which seem to be alright.
my clinfo returns literally the exact same thing, letter for letter. i wondered whether sgminer's kernel was the same, as they crash in the exact same way, just with a different offset. i am using ubuntu 16.04 (server, so no xorg) with the 4.10 kernel too, along with amdgpu-pro 17.40 and gcc 5.4.0.
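Comparing two clinfo dumps by eye is error-prone, so here is a small sketch of how the "letter for letter" check could be automated. It assumes the output consists of simple `Key: value` lines like the platform block quoted in this thread; `parse_clinfo` and `diff_fields` are hypothetical helper names, and repeated keys (several platforms or devices) simply keep the last value seen.

```python
def parse_clinfo(text):
    """Parse 'Key: value' lines from clinfo-style output into a dict.

    Assumption: one field per line, split on the first colon.
    Repeated keys overwrite earlier ones.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def diff_fields(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Feeding it the saved output from both machines and printing `diff_fields(...)` gives an empty dict when the two really match.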
exactly the same system setup???
do you think it could be different hardware causing the issue (using rx570s 4GB here)
It works with or without Xorg. I don't think there is much difference between RX480 and RX570, but who knows what their compiler does.
there shouldn't be any difference. both are polaris cards. is there any way i can provide more info to help with the problem here?
The only difference I can think of is the number of GPUs. Could you disable all of them except one either in software or hardware?
i can unplug them all :wink:
running 1 gpu only, no difference. i wonder why the neoscrypt kernel has problems but the vliw kernel doesn't
what libllvm are you on? my system just prompted me to update but I'll hold off for now. currently on 4.0, it's offering 5.0
The VLIW kernel doesn't use local (shared) memory, and branching is reduced to a minimum. It's more straightforward, which results in higher register usage and a larger kernel size.
libllvm 4.0
so you reckon it's something to do with shared memory? what could be causing issues with that?
It has something to do with poor AMD compiler quality. It used to be much better in the past.
so, do you think that this is easily fixable? would it be possible to see what the neoscrypt kernels from nsgminer and sgminer have in common to see what's causing the error? also what is the difference between the vliw and the vliwp kernel?
VLIWp is another implementation of VLIW with Salsa and ChaCha running in parallel rather than in sequence. May or may not deliver better performance.
I have other priorities at the moment rather than working around AMD bugs once again. Just pick a kernel that works.
@ghostlander
Using neoscrypt_vliw kernel fixed my issue!
Thank you
System:
H81-PRO-BTC
Celeron g1830
4gb ddr3
1600W evga supernova g2
6x rx570
Ubuntu server 16.04.3 (kernel 4.10)

Repro steps:
Install latest drivers and the amd app sdk
Compile nsgminer
Run neoscrypt with the default kernel (neoscrypt)

Crash info:
Error message printed:
In hsa_operand section, at offset 3552: Address offset exceeds variable size
LLVM ERROR: Brig container validation has failed in BRIGAsmPrinter.cpp
Using the neoscrypt_vliw kernel works ok but only yields about 550kh/s on each card (with bios mods/oc on each card).

This same crash also occurs when trying to use sgminer. I don't know what kernel it uses by default, but judging by the almost identical crash message (only the offset number is different), it uses the neoscrypt kernel too. I have seen issue reports for this on sgminer as well, but the genesis mining fork is no longer maintained and nicehash decided to completely remove neoscrypt from their fork when the bug was reported.
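To check whether two crash reports are the same failure apart from the offset, the number can be masked out before comparing. This is only a sketch: it assumes the message contains an `at offset <number>` fragment as in the error quoted in this issue, and `crash_signature` is a hypothetical helper name.

```python
import re

# Matches the numeric offset in messages like the one quoted in this issue.
OFFSET_RE = re.compile(r"at offset (\d+)")


def crash_signature(message):
    """Return (message with the offset masked, offset as int or None)."""
    match = OFFSET_RE.search(message)
    offset = int(match.group(1)) if match else None
    return OFFSET_RE.sub("at offset <N>", message), offset


if __name__ == "__main__":
    reported = ("In hsa_operand section, at offset 3552: "
                "Address offset exceeds variable size")
    signature, offset = crash_signature(reported)
    print(signature)
    print(offset)
```

Two logs that produce the same masked signature but different offsets are consistent with the same validation failure hitting different kernels.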