KlausT / ccminer

Software for mining various cryptocoins
GNU General Public License v3.0

NeoScrypt Crashing, Not Exiting Bugs #89

Open joesixpack opened 6 years ago

joesixpack commented 6 years ago

The test miner provided at: https://github.com/KlausT/ccminer/issues/50

...is the only one that doesn't crash on my 1080Ti for neoscrypt. However, the latest 8.15 release still has that crashing bug, but may have the -r 0 fix referenced here: https://github.com/KlausT/ccminer/issues/57

I need both in one miner.

Braintelligence commented 6 years ago

I have the same problem for my 1070 Ti rigs only, for some reason...

nfllab commented 6 years ago

My ccminer was crashing on a vmovdqa instruction. I've fixed it by changing the neoscrypt_xor function inside sph/neoscrypt.cpp:

--- neoscrypt.cpp.old   2017-11-21 17:43:36.000000000 +0100
+++ neoscrypt.cpp   2017-12-15 19:19:37.550565124 +0100
@@ -481,10 +481,10 @@
     ulong *src = (ulong *) srcp;
     uint i, tail;

-    for(i = 0; i < (len / sizeof(ulong)); i++)
-      dst[i] ^= src[i];
+//    for(i = 0; i < (len / sizeof(ulong)); i++)
+//      dst[i] ^= src[i];

-    tail = len & (sizeof(ulong) - 1);
+    tail = len;// & (sizeof(ulong) - 1);
     if(tail) {
         uchar *dstb = (uchar *) dstp;
         uchar *srcb = (uchar *) srcp;
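
For context, vmovdqa is the aligned vector move instruction, and it faults when its memory operand is not suitably aligned. One plausible reading (an assumption, not confirmed in this thread) is that the compiler vectorised the ulong loop while the pointers passed in were not aligned. A minimal sketch of an alignment-safe XOR under that assumption follows; the helper name xor_bytes_unaligned is hypothetical and is not part of ccminer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not the project's neoscrypt_xor): XORs len bytes of
// src into dst without assuming that either pointer is 8-byte aligned.
void xor_bytes_unaligned(void *dstp, const void *srcp, std::size_t len)
{
    assert(dstp != nullptr && srcp != nullptr);
    unsigned char *dst = static_cast<unsigned char *>(dstp);
    const unsigned char *src = static_cast<const unsigned char *>(srcp);

    std::size_t i = 0;
    // 8-byte chunks: copying through locals with memcpy avoids the pointer
    // cast, so the compiler must emit loads/stores valid for any alignment.
    for (; i + sizeof(std::uint64_t) <= len; i += sizeof(std::uint64_t)) {
        std::uint64_t d, s;
        std::memcpy(&d, dst + i, sizeof d);
        std::memcpy(&s, src + i, sizeof s);
        d ^= s;
        std::memcpy(dst + i, &d, sizeof d);
    }
    // Remaining tail bytes, one at a time.
    for (; i < len; ++i)
        dst[i] ^= src[i];
}
```

The patch above appears to route the whole length through the byte-wise tail path; this variant keeps the 8-byte stride while still tolerating unaligned buffers.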

Braintelligence commented 6 years ago

@KlausT Please implement this.

KlausT commented 6 years ago

On my system ccminer is not crashing at all. I can't reproduce this. But ok, I will see what I can do.

Braintelligence commented 6 years ago

These issues are on 1080 Ti and 1070 Ti, FYI

KlausT commented 6 years ago

Could be related to the intensity / the memory size. The latest commit will fix it, I hope. Since I don't have a Ti card I can't test it here.

nfllab commented 6 years ago

As far as I understand, this code runs on the CPU, so it shouldn't be related to the GPU. I thought the issue was related to my compiler, but that wouldn't explain the crashes others see with the official binaries. I wonder if the commit fixes their problem, too.

My config, if it helps: i7-4790 (Haswell), 1050 Ti, Ubuntu 17.10, GCC 7.2, CUDA 8

KlausT commented 6 years ago

I'm hoping that this commit: https://github.com/KlausT/ccminer/commit/250e14cbaf4cdd432a223ec0e526a103c6eedb9f will fix the CUDA errors, because there was a possible integer overflow that could cause illegal memory accesses on the GPU. The segfaults on Linux systems probably have other causes.

Please test the latest commits:
Source: https://github.com/KlausT/ccminer/archive/cuda9.zip
Windows binary: ccminer-test-x64.zip
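
The kind of integer overflow described here can be illustrated with a small host-side sketch. The contents of the commit are not shown in this thread, so everything below is an assumption made for illustration: the 32 KiB per-thread size and the function names are hypothetical. The point is only that a byte offset computed in 32 bits wraps once thread count times size reaches 4 GiB, which would then address the wrong memory:

```cpp
#include <cstdint>

// Hypothetical per-thread scratch size, chosen only for illustration.
constexpr std::uint32_t kBytesPerThread = 32u * 1024u;

// 32-bit arithmetic: the product wraps modulo 2^32 once it reaches 4 GiB.
std::uint32_t offset32(std::uint32_t thread)
{
    return thread * kBytesPerThread;
}

// 64-bit arithmetic: widen one operand first so the product is exact.
std::uint64_t offset64(std::uint32_t thread)
{
    return static_cast<std::uint64_t>(thread) * kBytesPerThread;
}
```

With these numbers the two functions agree for small thread indices and first diverge at thread 131072 (2^17), where the 32-bit version wraps to 0. That would fit a problem that only shows up at higher intensity or on cards with more memory.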

nfllab commented 6 years ago

Compilation of the current git version fails on my Linux system:

nvcc -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_37,code=sm_37 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_30,code=sm_30 -I/usr/local/cuda/include -I. -O3 -std=c++11 -Xcompiler -fno-strict-aliasing -Wall -D_FORCE_INLINES  --ptxas-options="-v" --maxrregcount=128 -o cuda_groestlcoin.o -c cuda_groestlcoin.cu
nvcc fatal   : Unknown option 'Wall'
Makefile:1882: recipe for target 'cuda_groestlcoin.o' failed

If I change "-Xcompiler -fno-strict-aliasing -Wall" to "-Xcompiler -fno-strict-aliasing,-Wall", then it works.

Compilation of the attached cuda9.zip additionally dies with CUDA 8 at:

nvcc -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_37,code=sm_37 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_70,code=sm_70 -I/usr/local/cuda/include -I. -O3 -std=c++11 -Xcompiler -fno-strict-aliasing,-Wall -D_FORCE_INLINES  --ptxas-options="-v" --maxrregcount=128 -o cuda_groestlcoin.o -c cuda_groestlcoin.cu
nvcc fatal   : Unsupported gpu architecture 'compute_70'
Makefile:1883: recipe for target 'cuda_groestlcoin.o' failed

KlausT commented 6 years ago

OK, I have fixed configure.sh (I think). The windows branch is for CUDA 8; the cuda9 branch is for CUDA 9.x.

joesixpack commented 6 years ago

Okay, the Neoscrypt patch referenced earlier by nfllab applied to 8.15 results in a core dump when exiting the miner:

[image]

https://github.com/KlausT/ccminer/archive/cuda9.zip on 16.04 or 17.10 has the same compile error as this one: https://github.com/KlausT/ccminer/issues/92

I'm not able to test the pre-compiled Windows version atm (to see if it still crashes mining Neoscrypt on a Ti). Can someone else?

KlausT commented 6 years ago

Please don't use Ubuntu 17.10; I don't know if it's compatible. 16.04 should be OK, or 17.04 for CUDA 9.1. Using the latest Linux versions for compiling ccminer is generally a bad idea.

joesixpack commented 6 years ago

Looks like 8.17 fixed the issues. Knock on wood!

joesixpack commented 6 years ago

[image]

KlausT commented 6 years ago

I thought you were using 8.17 now?

joesixpack commented 6 years ago

Whoops, not sure what happened there.

joesixpack commented 6 years ago

Still happening on 8.17. It's not exclusive to NeoScrypt as it happened on Groestl too. It seems less to do with any mining and more about the exiting.

[image]

Braintelligence commented 6 years ago

Ooooh, and I was wondering which algorithm threw those errors. So it was KlausT?

Braintelligence commented 6 years ago

Actually, I think it doesn't always result in a simple error window, but I think this is what crashed one of my rigs overnight, making it reboot. Now I'm not sure what to do.

KlausT commented 6 years ago

It's not crashing at all on my system, I can't reproduce this.

Braintelligence commented 6 years ago

I think I found another cause for the crashing. But still I have rigs that show exactly such an error window, and several of those stacking when enough time passes. I can't positively tell if the KlausT version is the culprit here, though =(.

joesixpack commented 6 years ago

It is KlausT. None of the other ccminers have this problem. Here's a copy of the problem details from the dialog box (8.18 CUDA 9.1):

Problem signature:
Problem Event Name: APPCRASH
Application Name: ccminer.exe
Application Version: 0.0.0.0
Application Timestamp: 5a4a6e89
Fault Module Name: StackHash_48d7
Fault Module Version: 6.1.7601.23915
Fault Module Timestamp: 59b94ee4
Exception Code: c0000374
Exception Offset: 00000000000bf3e2
OS Version: 6.1.7601.2.1.0.256.1
Locale ID: 1033
Additional Information 1: 48d7
Additional Information 2: 48d7d9e7b54549393f69a4a65eee70d7
Additional Information 3: 05fe
Additional Information 4: 05feaa65322395330360a8f5ca947f22

Read our privacy statement online: http://go.microsoft.com/fwlink/?linkid=104288&clcid=0x0409

If the online privacy statement is not available, please read our privacy statement offline: C:\Windows\system32\en-US\erofflps.txt
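
The Exception Code in the report above, c0000374, is the NTSTATUS value STATUS_HEAP_CORRUPTION. That is consistent with a fault while memory is being released at exit rather than during hashing, although the report alone does not prove it. A small lookup like the following can make such reports easier to read; the function name is hypothetical and the table is deliberately short:

```cpp
#include <cstdint>
#include <string>

// Maps a few well-known NTSTATUS exception codes to their symbolic names.
// Hypothetical helper for reading crash reports; any code not listed here
// is reported as "unknown".
std::string exception_name(std::uint32_t code)
{
    switch (code) {
    case 0xC0000005u: return "STATUS_ACCESS_VIOLATION";
    case 0xC000001Du: return "STATUS_ILLEGAL_INSTRUCTION";
    case 0xC00000FDu: return "STATUS_STACK_OVERFLOW";
    case 0xC0000374u: return "STATUS_HEAP_CORRUPTION";
    default:          return "unknown";
    }
}
```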

Braintelligence commented 6 years ago

Sadly I still had a lot of problems with KlausT especially on my 1070 Ti and 1080 Ti rigs, so I had to completely deactivate it and switch to TPruvot.

KlausT commented 6 years ago

I don't understand why it is crashing on your system, but not on mine. Windows users: what's different on your system? I'm using Windows 10 with all the latest updates, 8 GB RAM, a GTX 1070, and Nvidia driver 388.71.

By the way, would you please test the latest version?

Braintelligence commented 6 years ago

Try it with 1070 Ti or 1080 Ti I guess?

KlausT commented 6 years ago

If you give me the money to buy one

Braintelligence commented 6 years ago

Do you have a PayPal charity link or a wallet address? I guess enough people should be willing to give you enough for a 1070 Ti 👍 It's "just" half an ETH ^^.

joesixpack commented 6 years ago

Tpruvot is no panacea. It has already slowed down, locked up, BSODed or crashed on skein, neoscrypt and lyra2v2, and who knows how many more to come. Basically, every time Tpruvot incorporates a third-party speedup or new algo, it makes the whole enchilada even more unstable.

Braintelligence commented 6 years ago

TPruvot is what has been running stably for several days now on my 13x 1070 Ti rigs. I do me and you do you.

joesixpack commented 6 years ago

1080 Tis, Windows 7 with latest, NVIDIA 388.59 here. Brain, what NVIDIA driver are you using with Tpruvot?

Just so you know, KlausT, this crashing problem on exiting is not specific to any algo; it happens on anything the miner [tries to] run. So as I said before, the problem is in the exiting and not the mining. Judging by the text output, you're doing something different on exiting that none of the other ccminers are doing. What is different about this working binary https://github.com/KlausT/ccminer/files/1236886/ccminer-neoscrypt-1080ti-test.zip compared with the latest (besides having the -r bug)?

Braintelligence commented 6 years ago

1070 Ti, Windows 10 Pro, NVIDIA 388.71. The rig I happened to look at just now has been running TPruvot lyra2z on 13x 1070 Ti for 2 hours straight. No errors at all.

joesixpack commented 6 years ago

Seems like CUDA 9.1 is only supported on 388.71 even though the SDK came out long before. Doesn't make a difference to the exit crashing, though.

Braintelligence commented 6 years ago

I use MPM which is multi-algo profit-switching all the time. I see no exit crash problems currently, at least not every day. On one of my rigs I do see this regularly but it also happens on DSTM Equihash, so I think one of the cards just has a tad bit too much OC.

KlausT commented 6 years ago

I have made a small change now, maybe this will help. Windows binary: ccminer-818exitfix-debug-cuda91-x64.zip

joesixpack commented 6 years ago

Interesting, it still exit crashes. Does that mean it's the -r fix?

KlausT commented 6 years ago

I don't think so. Maybe it's this line that was added two months ago: https://github.com/KlausT/ccminer/blob/f2c02c0454b11ada0098597e6d02c83c9ed2e38a/ccminer.cpp#L460 That would only affect Linux systems, I think.

swittmann commented 6 years ago

I have the same issue with a GTX 1070 Ti and a GTX 970. I tried with and without an OC, same result. Windows 10 Pro, 388.71 driver, 8 GB RAM.

joesixpack commented 6 years ago

The crash dialog that pops up is actually WerFault.exe. Whether or not you disable error reporting, it will still show up. On a 1080 non-Ti, no crashing on any algo at all.

KlausT commented 6 years ago

Does it say something like this?

"The instruction at 0x0000000075983703 referenced memory at 0x0000000000000000. The memory could not be read"

Maybe this could help: answers.microsoft.com