fireice-uk / xmr-stak-amd

Monero AMD miner
GNU General Public License v3.0

Ubuntu Server 16.04.2 + AMD HD6870 = Error CL_DEVICE_NOT_FOUND #49

Open Isuress opened 7 years ago

Isuress commented 7 years ago

Hello,

1. I installed the AMDpro driver following the link you posted and their instructions. I cloned and compiled your program. I had to copy the config.txt from the source directory into the bin directory because it wasn't created. Here's my config:

/* 
 * Number of GPUs that you have in your system. Each GPU will get its own CPU thread.
 */
"gpu_thread_num" : 1,

/*
 * GPU configuration. You should play around with intensity and worksize as the fastest settings will vary.
 *      index    - GPU index number usually starts from 0
 *  intensity    - Number of parallel GPU threads (nothing to do with CPU threads)
 *   worksize    - Number of local GPU threads (nothing to do with CPU threads)
 * affine_to_cpu - This will affine the thread to a CPU. This can make a GPU miner play along nicer with a CPU miner.
 */
"gpu_threads_conf" : [ 
    { "index" : 0, "intensity" : 1000, "worksize" : 8, "affine_to_cpu" : false },
],

/*
 * Platform index. This will be 0 unless you have different OpenCL platform - eg. AMD and Intel.
 */
"platform_index" : 0,

/*
 * TLS Settings
 * If you need real security, make sure tls_secure_algo is enabled (otherwise MITM attack can downgrade encryption
 * to trivially breakable stuff like DES and MD5), and verify the server's fingerprint through a trusted channel. 
 *
 * use_tls         - This option will make us connect using Transport Layer Security.
 * tls_secure_algo - Use only secure algorithms. This will make us quit with an error if we can't negotiate a secure algo.
 * tls_fingerprint - Server's SHA256 fingerprint. If this string is non-empty then we will check the server's cert against it.
 */
"use_tls" : false,
"tls_secure_algo" : true,
"tls_fingerprint" : "",

/*
 * pool_address   - Pool address should be in the form "pool.supportxmr.com:3333". Only stratum pools are supported.
 * wallet_address - Your wallet, or pool login.
 * pool_password  - Can be empty in most cases or "x".
 */
"pool_address" : "mine.xmrpool.net:5555",
"wallet_address" : "HIDDEN",
"pool_password" : "HIDDEN",

When I go to run the program, I get the error:

[2017-06-16 17:34:45] : Compiling code and initializing GPUs. This will take a while...
[2017-06-16 17:34:45] : Error CL_DEVICE_NOT_FOUND when calling clGetDeviceIDs for number of devices.

I've read through other people's issues and googled the errors. I've tried some of the things people suggested but didn't get a fix. Out of curiosity, I ran clinfo (had to apt-get it first) and got this:

Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.0 AMD-APP (2348.3)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz
  Device Vendor                                   GenuineIntel
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (2348.3)
  Driver Version                                  2348.3 (sse2,avx)
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Board Name (AMD)
  Device Topology (AMD)                           (n/a)
  Max compute units                               4
  Max clock frequency                             1601MHz
  Device Partition                                (core, cl_ext_device_fission)
    Max number of sub-devices                     4
    Supported partition types                     equally, by counts, by affinity domain
    Supported affinity domains                    L3 cache, L2 cache, L1 cache, next partitionable
    Supported partition types (ext)               equally, by counts, by affinity domain
    Supported affinity domains (ext)              L3 cache, L2 cache, L1 cache, next fissionable
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             1024
  Preferred work group size multiple              1
  Preferred / native vector sizes
    char                                                16 / 16
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 2 / 2
    half                                                 4 / 4        (n/a)
    float                                                8 / 8
    double                                               4 / 4        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              16753750016 (15.6GiB)
  Error Correction support                        No
  Max memory allocation                           4188437504 (3.901GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        32768
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                64
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     8
  Max size of kernel argument                     4096 (4KiB)
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1497648123660783719ns (Fri Jun 16 17:22:03 2017)
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            65536 (64KiB)
  Built-in kernels
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

From the looks of it, my CPU has an integrated GPU. I attempted to go to the BIOS and disable it but there was no such option. Instead I opted to change the "primary display device" to PCIE and hoped that would work. It didn't. So it seems that for whatever reason, my Ubuntu has decided that the Intel GPU is the first and only platform to be used. I ran the lspci -vnnn | perl -lne 'print if /^\d+\:.+(\[\S+\:\S+\])/' | grep VGA command and got:

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Barts XT [Radeon HD 6870] [1002:6738] (prog-if 00 [VGA controller])

So the card is there and the OS sees it, but for whatever reason it's not using it accordingly. What should I do? I'm currently using your CPU miner on some other servers and it's working just fine.

2. Well, sort of fine actually... I might as well post it here 'cause hopefully you'll be reading this anyway. So I have an HP DL320G6 with an Intel Xeon E5630, running ESXi with Ubuntu Server 16.04.2 in a VM. The E5630 has 8 Logical Processors so I set the VM to have 8 cores. When setting up the config.txt, I created 8 CPU_AFFINITY related entries and then ran the program (after doing the page pool size, etc - which actually still gives me errors btw). I only got like 120 hash/s or something like that. I went to the config.txt and changed the 8 CPU_AFFINITY entries into 6 of them, and then ran it again. My hash rate then went to 220 hash/s. How come using 6 gets a better hash rate than using 8? (This is only the first part of this question)

With all that said, I have an HP ML370 with Dual-Proc Intel Xeon L5630 (2 CPUs). This means I have 16 Logical Processors instead of 8 (as noted by ESXI). I changed the VM's CPUs from 8 to 16. I did all the same setup and config as mentioned prior; except I changed the CPU_AFFINITY from 6 CPUs to 16 CPU entries. My hash was only like 160? I changed my VM's CPUs to 8, and then used the same config as my DL320G6. Now I get 220ish hash. What is going on? How come it's not making use of the 10 extra cores available to it? What should I do? Is there any other information I can provide you to help possibly?

P.S. - If you'd like me to move the second part of this question to the issues section of your CPU miner; I can.

NOTE CPU_AFFINITY = { "low_power_mode" : false, "no_prefetch" : true, "affine_to_cpu" : 0 },

Panzerfather commented 7 years ago
  1. Your GPU (AMD 6870) isn't supported by the latest AMD OpenCL drivers anymore, so it won't be listed by clinfo without a suitable driver. You have to install the corresponding open source mesa driver to get your card working, but you will have to see what kind of performance that driver delivers. On Fedora/CentOS/RHEL this would be mesa-libOpenCL, but for Ubuntu I don't know the exact package name without a running Ubuntu system at hand. Maybe this wiki can help you.

  2. The performance you get depends on the L3 cache of your CPU. Running with the initial config.txt will point you in the right direction with a good suggestion. E.g. if only 4 of 8 cores are listed, your L3 cache is too small to run all 8 cores together for cryptonight mining. Cryptonight needs exactly 2MB per running core. If you are not sure how much L3 cache your CPUs have, you can take a look at the Intel documentation.
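For point 1, a quick way to confirm whether any GPU is visible to OpenCL is to scan the clinfo output for its "Device Type" lines. The following is only a minimal sketch: the helper names `device_types` and `has_gpu` are made up for illustration, and it assumes the human-readable clinfo layout shown earlier in this thread.

```python
import re

def device_types(clinfo_text):
    """Collect the value of every 'Device Type' line in clinfo output.

    Assumes the human-readable layout shown in this thread
    ('  Device Type      CPU'); other clinfo modes may differ.
    """
    pattern = r"^[ \t]*Device Type[ \t]+(\S+)[ \t]*$"
    return re.findall(pattern, clinfo_text, flags=re.MULTILINE)

def has_gpu(clinfo_text):
    """True if at least one listed OpenCL device reports type GPU."""
    return "GPU" in device_types(clinfo_text)

# Shortened excerpt of the clinfo output posted above.
sample = """\
Number of devices                                 1
  Device Name                                     Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz
  Device Type                                     CPU
"""

print(device_types(sample))  # ['CPU']
print(has_gpu(sample))       # False
```

If only CPU devices show up, as in the output above, a GPU-only device query has nothing to return, which would be consistent with the CL_DEVICE_NOT_FOUND error.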

Isuress commented 7 years ago
  1. No kidding? Damn. I'll have to google the MESA driver. I could potentially use an older version of Ubuntu maybe? I'm using 16.04.2 at the moment but I could switch to Ubuntu Server 14.04.5? Would that have the older driver pre-installed that would allow OpenCL to work? I'll maybe give it a try tomorrow. I'll have probably tried it before you have a chance to respond but any and all advice is welcome. (Thank you by the way)

  2. From a quick google search, it seems that my Intel Xeon E5630 CPUs (DL320G6) have 4 cores (with 2 logical each) with 12MB of SmartCache, while my Intel Xeon L5630 CPUs (ML370G6) have the same as above. If what you're saying is true, then there should be more than enough L3-Cache to go around? If I have x2 L5630s in a dual-proc server, then it should have 24MB of L3-Cache, which would be enough. Unless the logical cores also need their own share of the cache? Maybe I'm not doing a setting correctly? I still get the NMAP errors with relation to the pool memory and whatnot even though I made the change the developer suggested in the page file. Unless I did it wrong?

I wonder if the other miners that are available out there would give me fewer issues for an equal or greater hash rate.

psychocrypt commented 7 years ago

@Isuress I'm using Ubuntu 14.04 for my old AMD GPU; this should work for you too. If you use the old Ubuntu, you need to google and add a package source for the gcc 5.1+ compiler, because the system compiler is not supported. For the CPU version, use the current dev branch with hwloc and xmr-stak will show you the optimal config at the first start.

Panzerfather commented 7 years ago

@Isuress

  1. Ubuntu 14.04 should work, but 16.04 should also work with the right drivers installed.

  2. As you can see, your bottleneck is your L3 cache. You have 12MB for 8 cores, but you would need 8 x 2MB = 16MB of L3 cache to run all cores at optimum performance. Since your cache size is only 12MB, you can only run 6 cores (6 x 2MB) at maximum performance, just like you already discovered. The 2MB comes from the crypto algorithm specification and is not a requirement of xmr-stak-cpu, so other miners could not give you (much if any) more performance. And no, with 2x CPUs you don't automatically have enough L3 cache to run 8 cores. You have to make sure that at least 4-6 cores run on the first CPU and the remaining 2 run on the other processor. L3 cache is different from RAM and is only available to and accessible from the corresponding CPU.
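The arithmetic in point 2 can be sketched as below. The function name `max_full_speed_threads` is hypothetical and not part of xmr-stak; the only inputs are the 2MB-per-thread figure and the cache sizes quoted in this thread, and the result is a rough estimate rather than a guaranteed optimum.

```python
def max_full_speed_threads(l3_cache_mb, logical_cores, scratchpad_mb=2):
    """Estimate how many mining threads fit in one CPU's L3 cache.

    Each cryptonight thread needs its own 2MB scratchpad, so the count
    is capped by both the cache size and the number of logical cores.
    """
    return min(logical_cores, l3_cache_mb // scratchpad_mb)

# Single Xeon E5630: 12MB L3, 8 logical cores.
print(max_full_speed_threads(12, 8))  # 6

# Dual Xeon L5630: L3 is not shared between sockets, so the estimate
# is computed per socket and then added up.
per_socket = [max_full_speed_threads(12, 8) for _ in range(2)]
print(per_socket, sum(per_socket))  # [6, 6] 12
```

The per-socket split only helps if the threads are actually pinned so that each CPU runs no more threads than its own cache can hold, which is what the affine_to_cpu setting is for.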

And for the NMAP problem: do you use a graphical environment and start the cpu-miner there? If so, you also have to edit the file /etc/pam.d/common-session and add the line session required pam_limits.so. After that you have to log out and log in again, or reboot.
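pam_limits.so is what applies the limits from /etc/security/limits.conf to a login session, so after logging in again it is worth checking that the locked-memory limit really changed. This is a small sketch for Linux using Python's standard `resource` module. The helper `memlock_sufficient` and the assumption that the miner wants to lock about 2MB per thread are illustrative guesses, not taken from the xmr-stak source.

```python
import resource

def memlock_sufficient(soft_limit_bytes, threads,
                       scratchpad_bytes=2 * 1024 * 1024):
    """Check whether a memlock limit covers `threads` scratchpads.

    Assumption: roughly 2MB of locked memory per mining thread.
    """
    if soft_limit_bytes == resource.RLIM_INFINITY:
        return True  # unlimited
    return soft_limit_bytes >= threads * scratchpad_bytes

# Limits of the current session, in bytes (RLIM_INFINITY = unlimited).
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print("memlock soft limit:", soft, "hard limit:", hard)
print("enough for 6 threads:", memlock_sufficient(soft, 6))
```

If the printed limit is still the small distribution default after editing limits.conf, the session was most likely started without pam_limits.so being applied.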