ethereum-mining / ethminer

Ethereum miner with OpenCL, CUDA and stratum support
GNU General Public License v3.0

Segmentation fault on Tesla K80 #1035

Closed by yi-ji 6 years ago

yi-ji commented 6 years ago

My system info:

CUDA 9, Linux x86_64, Ubuntu 16.04, ethminer version 0.14.0.dev1, gcc 5.4.

I also tried almost every ethminer version from 0.10 to 0.15, and all of them behave the same. What may be the problem? Thank you.

./ethminer -U --cuda-devices 1 2 -S asia.ethash-hub.miningpoolhub.com:20535 -O xxx.xxx:xxx
 cu  21:54:14|ethminer  Using grid size 8192 , block size 128
 cu  21:54:14|ethminer  Found suitable CUDA device [ Tesla K80 ] with 11996954624  bytes of GPU memory
 cu  21:54:14|ethminer  Found suitable CUDA device [ Tesla K80 ] with 11996954624  bytes of GPU memory
  ℹ  21:54:14|ethminer  Connecting to stratum server asia.ethash-hub.miningpoolhub.com:20535
  ℹ  21:54:14|stratum   Connected to stratum server asia.ethash-hub.miningpoolhub.com:20535
  ℹ  21:54:14|stratum   Starting farm
  ℹ  21:54:14|cuda-0    No work. Pause for 3 s.
  ℹ  21:54:14|cuda-1    No work. Pause for 3 s.
  ℹ  21:54:14|stratum   Subscribed to stratum server
  ℹ  21:54:14|stratum   Received new job #6614
  ℹ  21:54:15|stratum   Authorized worker xxx.xxx
  m  21:54:16|ethminer  Speed   0.00 Mh/s    gpu/0  0.00  gpu/1  0.00  [A0+0:R0+0:F0] Time: 00:00
  ℹ  21:54:17|cuda-0    Initialising miner 0
  ℹ  21:54:17|cuda-1    Initialising miner 1
  m  21:54:18|ethminer  Speed   0.00 Mh/s    gpu/0  0.00  gpu/1  0.00  [A0+0:R0+0:F0] Time: 00:00
 cu  21:54:19|cuda-1    Using device: Tesla K80  (Compute 3.7)
 cu  21:54:19|cuda-0    Using device: Tesla K80  (Compute 3.7)
 cu  21:54:19|cuda-1    Set Device to current
 cu  21:54:19|cuda-0    Set Device to current
 cu  21:54:19|cuda-1    Resetting device
 cu  21:54:19|cuda-0    Resetting device
  m  21:54:20|ethminer  Speed   0.00 Mh/s    gpu/0  0.00  gpu/1  0.00  [A0+0:R0+0:F0] Time: 00:00
Segmentation fault (core dumped)

AndreaLanfranchi commented 6 years ago

Using device: Tesla K80 (Compute 3.7)

Such a low compute level is quite strange.

Please try:

./ethminer -G --opencl-devices 1 2 --opencl-platform 1 -S asia.ethash-hub.miningpoolhub.com:20535 -O xxx.xxx:xxx

yi-ji commented 6 years ago

Well, nothing happened after the command. It did not hang; I got the next prompt immediately, even with verbosity set to the maximum of 9.

AndreaLanfranchi commented 6 years ago

CUDA 9, Linux x86_64, Ubuntu 16.04, ethminer version 0.14.0.dev1, gcc5.4

Please retry with latest master.

yi-ji commented 6 years ago

Sorry, still the same:

./ethminer-e56090c -U --cuda-devices 1 -P stratum+tcp://username.workername:password@asia.ethash-hub.miningpoolhub.com:20535
  m  22:16:07|em-e56090c|  ethminer 0.15.0.dev7
  m  22:16:07|em-e56090c|  Build: linux / release
 cu  22:16:07|em-e56090c|  Using grid size 8192 , block size 128
 cu  22:16:07|em-e56090c|  Found suitable CUDA device [ Tesla K80 ] with 11996954624  bytes of GPU memory
  ℹ  22:16:08|em-e56090c|  Selected pool asia.ethash-hub.miningpoolhub.com:20535
  m  22:16:08|em-e56090c|  not-connected
  ℹ  22:16:08|stratum |  Trying 52.78.238.100:20535 ...
  ℹ  22:16:08|stratum |  Connected to asia.ethash-hub.miningpoolhub.com  [52.78.238.100:20535]
  ℹ  22:16:08|stratum |  Spinning up miners...
  ℹ  22:16:08|cuda-0  |  No work. Pause for 3 s.
  ℹ  22:16:08|stratum |  Subscribed to stratum server
  ℹ  22:16:08|stratum |  New pool difficulty:  4.30 gigahashes
  ℹ  22:16:08|stratum |  New job #7367c3e9…   asia.ethash-hub.miningpoolhub.com [52.78.238.100:20535]
  ℹ  22:16:09|stratum |  Authorized worker xxx.xxx
  ℹ  22:16:11|cuda-0  |  Initialising miner 0
 cu  22:16:11|cuda-0  |  Using device: Tesla K80  (Compute 3.7)
  m  22:16:13|em-e56090c|  Speed   0.00 Mh/s    gpu/0  0.00  [A0+0:R0+0:F0] Time: 00:00
 cu  22:16:13|cuda-0  |  Set Device to current
 cu  22:16:13|cuda-0  |  Resetting device
Segmentation fault (core dumped)

AndreaLanfranchi commented 6 years ago

Are you building it yourself from the master source, or using a binary release?

yi-ji commented 6 years ago

I am using binary release 0.15.0.dev7.

AndreaLanfranchi commented 6 years ago

It may be due to the fact that CUDA kernels are NOT compiled for Compute 3.7 (only 3.0 and 3.5).

Are you on Linux? Are you able to compile from source?

yi-ji commented 6 years ago

Yes it's Linux x86_64, Ubuntu 16.04.

By "compile", do you mean compiling ethminer or the CUDA kernels? I can do the former but not the latter.

AndreaLanfranchi commented 6 years ago

Compiling from source also compiles the CUDA kernels.

If possible do the following.

Follow the instructions described here: https://github.com/ethereum-mining/ethminer#building-from-source

At point 3 replace

cmake ..

with

cmake -DETHASHCUDA=ON -DCOMPUTE=37 ..

It may take some time, as it will likely download hunter and boost. If necessary, run cmake with sudo.

Please report when done.
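
The value given to -DCOMPUTE is the compute capability from the "Using device" log line with the dot removed, so Compute 3.7 becomes 37. A minimal sketch of that mapping follows; `compute_flag` is a hypothetical helper for illustration, not part of ethminer:

```shell
#!/bin/sh
# Hypothetical helper (not part of ethminer): turn an ethminer log line such as
#   "Using device: Tesla K80  (Compute 3.7)"
# into the matching cmake option, e.g. -DCOMPUTE=37.
compute_flag() {
    # Extract "major.minor" from "(Compute X.Y)" and drop the dot.
    cc=$(printf '%s\n' "$1" | sed -n 's/.*(Compute \([0-9][0-9]*\)\.\([0-9]\)).*/\1\2/p')
    if [ -z "$cc" ]; then
        echo "no compute capability found in: $1" >&2
        return 1
    fi
    printf '%s\n' "-DCOMPUTE=$cc"
}

# Example usage from the build directory:
#   cmake -DETHASHCUDA=ON "$(compute_flag 'Using device: Tesla K80  (Compute 3.7)')" ..
```

The helper only builds the option string; the cmake call itself stays exactly as given above.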

yi-ji commented 6 years ago

I followed your instructions and compiled it, but sadly I still get the same error. (I made sure that I added -DETHASHCUDA=ON -DCOMPUTE=37.)

  m  22:00:37|em      |  ethminer 0.15.0.dev7-13+commit.d2e4ecd4
  m  22:00:37|em      |  Build: linux / release
 cu  22:00:37|em      |  Using grid size 8192 , block size 128
 cu  22:00:37|em      |  Found suitable CUDA device [ Tesla K80 ] with 11996954624  bytes of GPU memory
  ℹ  22:00:37|em      |  Selected pool asia.ethash-hub.miningpoolhub.com:20535
  m  22:00:37|em      |  not-connected
  m  22:00:42|em      |  not-connected
  ℹ  22:00:42|stratum |  Trying 52.78.238.100:20535 ...
  ℹ  22:00:42|stratum |  Connected to asia.ethash-hub.miningpoolhub.com  [52.78.238.100:20535]
  ℹ  22:00:42|stratum |  Spinning up miners...
  ℹ  22:00:42|cuda-0  |  No work. Pause for 3 s.
  ℹ  22:00:42|stratum |  Subscribed to stratum server
  ℹ  22:00:42|stratum |  New pool difficulty:  4.30 gigahashes
  ℹ  22:00:42|stratum |  New job #d71b73d7…   asia.ethash-hub.miningpoolhub.com [52.78.238.100:20535]
  ℹ  22:00:43|stratum |  Authorized worker xxx.xxx
  ℹ  22:00:45|cuda-0  |  Initialising miner 0
 cu  22:00:45|cuda-0  |  Using device: Tesla K80  (Compute 3.7)
 cu  22:00:47|cuda-0  |  Set Device to current
 cu  22:00:47|cuda-0  |  Resetting device
Segmentation fault (core dumped)

53nsk commented 6 years ago

Try to compile without optimizations (add the -O0 flag).

yi-ji commented 6 years ago

@53nsk Could you tell me how to modify CMakeLists.txt? I added the -O0 flag at point 3 (cmake -DETHASHCUDA=ON -DCOMPUTE=37 -O0 ..) but I still get the same crash.

53nsk commented 6 years ago

@yi-ji In CMakeLists.txt, insert a new line "add_definitions(-O0)" (without quotes) between "function(configureProject)" and the following "endfunction()". Do not put it inside any "if" statement. Then run make without parameters.
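
An alternative that avoids editing CMakeLists.txt is CMake's standard CMAKE_BUILD_TYPE variable, since a Debug configuration normally builds host code without optimization. The sketch below assumes ethminer's build scripts honor that variable and do not force their own optimization flags, which this thread does not confirm. `configure_unoptimized` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the configure command for an unoptimized build.
# CMAKE_BUILD_TYPE is a standard CMake variable; whether ethminer's scripts
# fully honor it (especially for the CUDA kernels) is an assumption.
configure_unoptimized() {
    compute="$1"   # e.g. 37 for a Tesla K80 (Compute 3.7)
    printf 'cmake -DETHASHCUDA=ON -DCOMPUTE=%s -DCMAKE_BUILD_TYPE=Debug ..\n' "$compute"
}

# Print the command; run it from the build directory once it looks right.
configure_unoptimized 37
```

Printing the command first makes it easy to check the options before starting a long rebuild.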

yi-ji commented 6 years ago

@53nsk Thank you for your advice. I tried it, but sadly it still crashes the same way.

yi-ji commented 6 years ago

I am not sure what has been done on the machine recently, but it appears to work well now. Current info: driver version 390.30, CUDA 9.1.

yi-ji commented 6 years ago

It is happening again, though nothing has changed.

./ethminer -v 9 --cuda-devices 1 2 -U -P stratum2+tcp://xxx.xxx:xxx@us-east.ethash-hub.miningpoolhub.com:20535
  m  20:01:35|em-0.14 |  ethminer version 0.14.0
  m  20:01:35|em-0.14 |  Build: linux / release +git. 24c65cf
 cu  20:01:35|em-0.14 |  Using grid size 8192 , block size 128
 cu  20:01:35|em-0.14 |  Found suitable CUDA device [ Tesla K80 ] with 11996954624  bytes of GPU memory
 cu  20:01:35|em-0.14 |  Found suitable CUDA device [ Tesla K80 ] with 11996954624  bytes of GPU memory
  ℹ  20:01:35|em-0.14 |  Selected pool us-east.ethash-hub.miningpoolhub.com:20535
  m  20:01:35|em-0.14 |  not-connected
  ℹ  20:01:35|stratum |  Trying 18.204.40.75:20535 ...
  ℹ  20:01:35|stratum |  Connected to us-east.ethash-hub.miningpoolhub.com  [18.204.40.75:20535]
  ℹ  20:01:35|stratum |  Spinning up miners...
  ℹ  20:01:35|cuda-0  |  No work. Pause for 3 s.
  ℹ  20:01:35|cuda-1  |  No work. Pause for 3 s.
  ℹ  20:01:36|stratum |  Subscribed to stratum server
  ℹ  20:01:36|stratum |  Extranonce set to 8169
  ℹ  20:01:36|stratum |  Difficulty set to 1.00117
  ℹ  20:01:36|stratum |  New pool difficulty:  4.30 gigahashes
  ℹ  20:01:36|stratum |  New job #e248f396…   us-east.ethash-hub.miningpoolhub.com [18.204.40.75:20535]
  ℹ  20:01:36|stratum |  Authorized worker xxx.xxx
  ℹ  20:01:38|cuda-0  |  Initialising miner 0
  ℹ  20:01:38|cuda-1  |  Initialising miner 1
  ℹ  20:01:40|stratum |  New job #706142c7…   us-east.ethash-hub.miningpoolhub.com [18.204.40.75:20535]
  m  20:01:40|em-0.14 |  Speed   0.00 Mh/s    gpu/0  0.00  gpu/1  0.00  [A0+0:R0+0:F0] Time: 00:00
 cu  20:01:41|cuda-0  |  Using device: Tesla K80  (Compute 3.7)
 cu  20:01:41|cuda-0  |  Set Device to current
 cu  20:01:41|cuda-0  |  Resetting device
 cu  20:01:41|cuda-1  |  Using device: Tesla K80  (Compute 3.7)
 cu  20:01:41|cuda-1  |  Set Device to current
 cu  20:01:41|cuda-1  |  Resetting device
Segmentation fault (core dumped)

The same happens with ethminer releases 0.14 through 0.16.

lesjokolat commented 6 years ago

Try this and see whether the error message changes:

./ethminer -v 9 --cuda-devices 1 2 -U -P stratum1+tcp://xxx.xxx:xxx@us-east.ethash-hub.miningpoolhub.com:20535

Also, do you have a built-in GPU on the motherboard? If so, try disabling it in the BIOS.

yi-ji commented 6 years ago

@lesjokolat thank you for your help!

This time it kept repeating the following:

 ℹ  23:45:20|cuda-1  |  No work. Pause for 3 s.
  m  23:45:21|em-0.14 |  Speed   0.00 Mh/s    gpu/0  0.00  gpu/1  0.00  [A0+0:R0+0:F0] Time: 00:01
  ℹ  23:45:23|cuda-0  |  No work. Pause for 3 s.
  ℹ  23:45:23|cuda-1  |  No work. Pause for 3 s.
  m  23:45:26|em-0.14 |  Speed   0.00 Mh/s    gpu/0  0.00  gpu/1  0.00  [A0+0:R0+0:F0] Time: 00:01
  ℹ  23:45:26|cuda-0  |  No work. Pause for 3 s.
  ℹ  23:45:26|cuda-1  |  No work. Pause for 3 s.

That was with version 0.14. If I switch to 0.15, it exits immediately:

  m  23:45:53|em-0.15 |  ethminer 0.15.0
  m  23:45:53|em-0.15 |  Build: linux/release
Error: Insufficient CUDA driver: 9010

terminate called without an active exception
Aborted (core dumped)

And no, I do not have a built-in GPU on my motherboard.

lesjokolat commented 6 years ago

Error: Insufficient CUDA driver: 9010

You have to upgrade your Nvidia drivers. On Linux, install the nvidia-396 package or newer.

lesjokolat commented 6 years ago

I would also move to the latest 0.16 version.
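
For reference, the 9010 in the "Insufficient CUDA driver" message appears to be a CUDA version in CUDA's integer encoding (1000 × major + 10 × minor), which would make it CUDA 9.1, matching the driver 390.30 / CUDA 9.1 setup reported earlier in the thread. A minimal sketch of the decoding follows. The helper names are hypothetical, and the 9020 used in the comparison example is an assumed value for illustration, not something the thread states about the 0.15 binary.

```shell
#!/bin/sh
# Decode CUDA's integer version encoding (1000*major + 10*minor).
cuda_version() {
    printf '%d.%d\n' "$(( $1 / 1000 ))" "$(( ($1 % 1000) / 10 ))"
}

# Hypothetical check: succeed only if the CUDA version supported by the driver
# is at least the version the binary needs (both in integer form).
driver_sufficient() {
    [ "$1" -ge "$2" ]
}

cuda_version 9010    # prints 9.1

# Example with an assumed requirement of 9020 (CUDA 9.2):
#   driver_sufficient 9010 9020 || echo "driver too old, upgrade it"
```

If the check fails, upgrading the driver as suggested above is the fix; running an older ethminer build is the other option.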