KlausT / ccminer

Software for mining various cryptocoins
GNU General Public License v3.0

linux ccminer segfaults on startup #181

Closed: nazerim closed this issue 6 years ago

nazerim commented 6 years ago

ccminer compiled fine with CUDA 9.0 and was working fine, but after a while it suddenly crashed. Since then, without a reboot, it crashes on startup. I've compiled the Linux debug version and generated a stack trace; hopefully this will help pin down the issue:

    $ catchsegv ~/ccminer/ccminer-klaust-debug -r 0 -a neoscrypt -o stratum+tcp://neoscrypt.mine.ahashpool.com:4233 -u address -p "c=BTC" --api-bind=host:4068 -d 0,7,8,9 2>&1 > stacktrace.txt
    [2018-01-28 21:24:51] Starting Stratum on stratum+tcp://neoscrypt.mine.ahashpool.com:4233
    [2018-01-28 21:24:52] Stratum difficulty set to 1024
    [2018-01-28 21:24:52] neoscrypt.mine.ahashpool.com:4233 neoscrypt block 68356
    [2018-01-28 21:24:52] NVML GPU monitoring enabled.
    [2018-01-28 21:24:52] 4 miner threads started, using 'neoscrypt' algorithm.
    [2018-01-28 21:24:53] GPU #0: using default intensity 15.250
    [2018-01-28 21:24:53] GPU #7: using default intensity 15.250
    [2018-01-28 21:24:53] GPU #8: using default intensity 15.250
    [2018-01-28 21:24:53] GPU #9: using default intensity 15.250
    [2018-01-28 21:24:55] neoscrypt.mine.ahashpool.com:4233 neoscrypt block 68357
    Segmentation fault

stacktrace.txt

nazerim commented 6 years ago

OK, I've found the root cause: ccminer is unable to read cpuinfo_cur_freq because the file is readable only by root.

In sysinfos.cpp:42:

    #define CPUFREQ_PATH \
        "/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq"
    static uint32_t linux_cpufreq(int core)
    {
        FILE *fd = fopen(CPUFREQ_PATH, "r");
        uint32_t freq = 0;

        if(!fd)
        {
                fscanf(fd, "%d", &freq);
                fclose(fd);
        }
        return freq;
    }

    $ ls -l /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
    -r-------- 1 root root 4096 Jan 28 02:43 /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq

A quick fix is to apply chmod 444 to this file; a better fix might be to read scaling_cur_freq instead. See https://www.kernel.org/doc/Documentation/cpu-freq/pcc-cpufreq.txt
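A rough sketch of the "read scaling_cur_freq instead" idea, in case it helps. It tries scaling_cur_freq first and falls back to cpuinfo_cur_freq, skipping any file that cannot be opened. The function names first_readable_freq and linux_cpufreq_fallback and the cpufreq_paths table are made up for illustration and are not part of ccminer; the path order is an assumption based on what worked on my system.

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <stddef.h>

/* Candidate files, most permissive first (assumed order, not from ccminer). */
static const char *const cpufreq_paths[] = {
    "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq",
    "/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq",
};

/* Hypothetical helper: return the value from the first file that can be
   opened and parsed, or 0 if none of them can. */
uint32_t first_readable_freq(const char *const *paths, size_t count)
{
    for (size_t i = 0; i < count; i++)
    {
        FILE *fd = fopen(paths[i], "r");
        uint32_t freq = 0;

        if (fd == NULL)
            continue; /* missing file or no permission: try the next one */

        int matched = fscanf(fd, "%" SCNu32, &freq);
        fclose(fd);

        if (matched == 1)
            return freq;
    }
    return 0;
}

/* Hypothetical wrapper using the table above. */
uint32_t linux_cpufreq_fallback(void)
{
    return first_readable_freq(cpufreq_paths,
                               sizeof(cpufreq_paths) / sizeof(cpufreq_paths[0]));
}
```

With this shape an unreadable cpuinfo_cur_freq just yields the next candidate or 0, so no chmod is needed.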

KlausT commented 6 years ago

fopen() doesn't return NULL? Why? And cpuinfo_cur_freq has different permissions than scaling_cur_freq? This doesn't make sense. But this shouldn't surprise me. Nothing under Linux makes sense.

nazerim commented 6 years ago
  1. No idea why fopen() doesn't return NULL to begin with. Perhaps changing the test on line 49 to if(fd!=NULL) might help?
  2. Permissions: There's a discussion here, apparently cpuinfo_cur_freq is privileged: https://superuser.com/questions/1032357/why-is-current-cpu-frequency-read-only-for-root

I've switched to reading scaling_cur_freq on my system, and it seems to work properly now. I need to investigate further what's happening with fd in fopen() here.

nazerim commented 6 years ago

I've changed line 49 to if(fd!=NULL) and recompiled - no crashes.
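For reference, a minimal sketch of the change described above. As far as I can tell from the snippet quoted earlier, the test if(!fd) is true only when fopen() has returned NULL, so fscanf() and fclose() were being called on a NULL stream exactly when the file could not be opened. The helper name read_cpufreq and its path parameter are hypothetical (added so the function can be tried on any file), and the unsigned scan format is my own substitution for "%d".

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define CPUFREQ_PATH \
    "/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq"

/* Hypothetical variant of linux_cpufreq() that takes the path as an argument. */
uint32_t read_cpufreq(const char *path)
{
    FILE *fd = fopen(path, "r");
    uint32_t freq = 0;

    /* fopen() returns NULL on failure (for example when permission is
       denied), so only use the stream when it is non-NULL. */
    if (fd != NULL)
    {
        if (fscanf(fd, "%" SCNu32, &freq) != 1)
            freq = 0; /* unparsable content: report 0 */
        fclose(fd);
    }
    return freq;
}
```

Calling read_cpufreq(CPUFREQ_PATH) as a non-root user should then return 0 instead of crashing when the file is root-only.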

nazerim commented 6 years ago

I've built and tested with your updated sysinfos.cpp - seems to work fine now.