Bdot42 / mfakto

Mersenne number trial factoring using OpenCL, primarily for GIMPS: Great Internet Mersenne Prime Search
http://mersennewiki.org/index.php/Mfakto
GNU General Public License v3.0

is -d working? #7

Open c-bug opened 5 years ago

c-bug commented 5 years ago

Hi,

I have two GPUs in my system: the one integrated in the A8-7600 APU, and a dedicated RX 550.

clinfo reports:

Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 AMD-APP (2766.4)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               2
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    AMD Radeon Graphics
  Device Topology:                               PCI[ B#0, D#1, F#0 ]
  Max compute units:                             6
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           720Mhz
  Address bits:                                  64
  Max memory allocation:                         1369020825
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    16
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     No
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            2079440896
  Constant buffer size:                          1369020825
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             32768
  Max pipe arguments:                            0
  Max pipe active reservations:                  0
  Max pipe packet size:                          0
  Max global variable size:                      0
  Max global variable preferred total size:      0
  Max read/write image args:                     0
  Max on device events:                          0
  Queue on device max size:                      0
  Max on device queues:                          0
  Queue on device preferred size:                0
  SVM capabilities:
    Coarse grain buffer:                         No
    Fine grain buffer:                           No
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            1
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:
    Out-of-Order:                                No
    Profiling :                                  No
  Platform ID:                                   0x7fc59ca45e30
  Name:                                          Spectre
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 1.2
  Driver version:                                2766.4
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.2 AMD-APP (2766.4)
  Extensions:                                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    Radeon RX 550 Series
  Device Topology:                               PCI[ B#1, D#0, F#0 ]
  Max compute units:                             10
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1071Mhz
  Address bits:                                  64
  Max memory allocation:                         3414774169
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    16
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     No
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            4280410112
  Constant buffer size:                          3414774169
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             32768
  Max pipe arguments:                            0
  Max pipe active reservations:                  0
  Max pipe packet size:                          0
  Max global variable size:                      0
  Max global variable preferred total size:      0
  Max read/write image args:                     0
  Max on device events:                          0
  Queue on device max size:                      0
  Max on device queues:                          0
  Queue on device preferred size:                0
  SVM capabilities:
    Coarse grain buffer:                         No
    Fine grain buffer:                           No
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:
    Out-of-Order:                                No
    Profiling :                                  No
  Platform ID:                                   0x7fc59ca45e30
  Name:                                          Baffin
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 1.2
  Driver version:                                2766.4
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.2 AMD-APP (2766.4)
  Extensions:                                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

Running ./mfakto results in:

# ./mfakto
mfakto 0.15pre6 (64bit build)

Runtime options
  Inifile                   mfakto.ini
  Verbosity                 1
  SieveOnGPU                yes
  MoreClasses               yes
  GPUSievePrimes            81157
  GPUSieveProcessSize       24Ki bits
  GPUSieveSize              96Mi bits
  FlushInterval             0
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 compact
  V5UserID                  none
  ComputerID                none
  TimeStampInResults        yes
  VectorSize                2
  GPUType                   AUTO
  SmallExp                  no
  UseBinfile                mfakto_Kernels.elf
Compiletime options

Select device - Get device info:
WARNING: Unknown GPU name, assuming GCN. Please post the device name "Baffin (Advanced Micro Devices, Inc.)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself to avoid this warning.

OpenCL device info
  name                      Baffin (Advanced Micro Devices, Inc.)
  device (driver) version   OpenCL 1.2 AMD-APP (2766.4) (2766.4)
  maximum threads per block 1024
  maximum threads per grid  1073741824
  number of multiprocessors 10 (640 compute elements)
  clock rate                1071MHz

Automatic parameters
  threads per grid          0
  optimizing kernels for    GCN

Loading binary kernel file mfakto_Kernels.elf
Compiling kernels.
  GPUSievePrimes (adjusted) 81206
  GPUsieve minimum exponent 1037054
Started a simple selftest ...
Selftest statistics
  number of tests           30
  successful tests          30

selftest PASSED!

got assignment: exp=192266629 bit_min=73 bit_max=74 (9.95 GHz-days)
Starting trial factoring M192266629 from 2^73 to 2^74 (9.95GHz-days)
Using GPU kernel "cl_barrett15_74_gs_2"

Found a valid checkpoint file.
  last finished class was: 2424

Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jan 26 19:56 | 3075  66.8% | 15.853   1h24m |     56.49    81206    0.00%

and when I run a second mfakto with ./mfakto -d 1 I get this output:

# ./mfakto -d 1
mfakto 0.15pre6 (64bit build)

Runtime options
  Inifile                   mfakto.ini
  Verbosity                 1
  SieveOnGPU                yes
  MoreClasses               yes
  GPUSievePrimes            81157
  GPUSieveProcessSize       24Ki bits
  GPUSieveSize              96Mi bits
  FlushInterval             0
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 compact
  V5UserID                  none
  ComputerID                none
  TimeStampInResults        yes
  VectorSize                2
  GPUType                   AUTO
  SmallExp                  no
  UseBinfile                mfakto_Kernels.elf
Compiletime options

Select device - Get device info:
WARNING: Unknown GPU name, assuming GCN. Please post the device name "Spectre (Advanced Micro Devices, Inc.)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself to avoid this warning.

OpenCL device info
  name                      Spectre (Advanced Micro Devices, Inc.)
  device (driver) version   OpenCL 1.2 AMD-APP (2766.4) (2766.4)
  maximum threads per block 1024
  maximum threads per grid  1073741824
  number of multiprocessors 6 (384 compute elements)
  clock rate                720MHz

Automatic parameters
  threads per grid          0
  optimizing kernels for    GCN

Loading binary kernel file mfakto_Kernels.elf
Compiling kernels.
  GPUSievePrimes (adjusted) 81206
  GPUsieve minimum exponent 1037054
Started a simple selftest ...
Selftest statistics
  number of tests           30
  successful tests          30

selftest PASSED!

got assignment: exp=189464927 bit_min=73 bit_max=74 (10.10 GHz-days)
Starting trial factoring M189464927 from 2^73 to 2^74 (10.10GHz-days)
Using GPU kernel "cl_barrett15_74_gs_2"

Found a valid checkpoint file.
  last finished class was: 72

Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jan 26 19:58 |   84   1.7% | 32.163   8h26m |     28.25    81206    0.00%

Then both instances run at ~28 GHz-d/day and their ETAs are around 8 hours ... so they share resources...

Any ideas how to solve this? It looks like mfakto is not using the RX 550 and is only using the APU.
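The slowdown can be quantified from the two progress lines pasted above. The following is a minimal sketch, assuming the compact PrintMode layout shown in those logs (fields separated by `|`); the function name `parse_progress` is hypothetical and not part of mfakto.

```python
# Progress lines copied verbatim from the two mfakto runs in this report.
DEFAULT_RUN = "Jan 26 19:56 | 3075  66.8% | 15.853   1h24m |     56.49    81206    0.00%"
D1_RUN = "Jan 26 19:58 |   84   1.7% | 32.163   8h26m |     28.25    81206    0.00%"


def parse_progress(line):
    """Return (seconds_per_class, ghz_days_per_day) from one progress line.

    Assumes the '|'-separated layout seen in the logs:
    date/time | class pct | time eta | GHz-d/day sieve wait
    """
    fields = [field.strip() for field in line.split("|")]
    seconds_per_class = float(fields[2].split()[0])
    ghz_days_per_day = float(fields[3].split()[0])
    return seconds_per_class, ghz_days_per_day


default_time, default_rate = parse_progress(DEFAULT_RUN)
d1_time, d1_rate = parse_progress(D1_RUN)

rate_ratio = default_rate / d1_rate   # throughput of first run relative to -d 1 run
time_ratio = d1_time / default_time   # per-class time of -d 1 run relative to first run

print(f"throughput ratio: {rate_ratio:.2f}")
print(f"class-time ratio: {time_ratio:.2f}")
```

A ratio close to 2 in both figures is consistent with two instances splitting one device, although it does not prove it on its own.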

valeriob01 commented 5 years ago

I use -d 01 , -d 02 , and it works.

c-bug commented 5 years ago

OK, for me 01 and 02 are not working. That said, my devices are 0 and 1; 02 does not exist.

And there is no difference between 01 and 1, or between 00 and 0.
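To see which device sits at which position, the clinfo listing can be reduced to an ordered summary. This is a rough sketch, assuming the older AMD clinfo layout pasted earlier (one `Device Type:` line opening each device); `list_devices` is a hypothetical helper, and the excerpt is trimmed from the output above. How mfakto's `-d` argument maps onto these positions is not established in this thread.

```python
# Trimmed excerpt of the clinfo output from the original report.
CLINFO_EXCERPT = """\
Number of devices:                               2
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Board name:                                    AMD Radeon Graphics
  Max compute units:                             6
  Name:                                          Spectre
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Board name:                                    Radeon RX 550 Series
  Max compute units:                             10
  Name:                                          Baffin
"""

WANTED_KEYS = ("Board name", "Name", "Max compute units")


def list_devices(clinfo_text):
    """Return one dict per device, in the order clinfo printed them."""
    devices = []
    for raw in clinfo_text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # not a "key: value" line
        key, value = key.strip(), value.strip()
        if key == "Device Type":
            devices.append({})  # a new device section starts here
        elif devices and key in WANTED_KEYS:
            devices[-1][key] = value
    return devices


devices = list_devices(CLINFO_EXCERPT)
for position, dev in enumerate(devices):
    # 'position' is only the 0-based place in clinfo's listing.
    print(position, dev["Name"], dev["Board name"], dev["Max compute units"])
```

In this listing the APU (Spectre) comes first and the RX 550 (Baffin) second.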

valeriob01 commented 5 years ago

Can you try this command: lspci and lscpu and paste the output?

c-bug commented 5 years ago

Of course,

# lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) I/O Memory Management Unit
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Kaveri [Radeon R7 Graphics]
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Root Port
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1425
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Root Port
00:03.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Root Port
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Root Port
00:10.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller (rev 09)
00:10.1 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller (rev 09)
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 40)
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 11)
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 11)
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 11)
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 11)
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 16)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 11)
00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] FCH PCI Bridge (rev 40)
00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 11)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 15h (Models 30h-3fh) Processor Function 5
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Polaris11] (rev ff)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X]
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0b)
# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       48 bits physical, 48 bits virtual
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          21
Model:               48
Model name:          AMD A8-7600 Radeon R7, 10 Compute Cores 4C+6G
Stepping:            1
CPU MHz:             2394.945
CPU max MHz:         3100.0000
CPU min MHz:         1400.0000
BogoMIPS:            6188.42
Virtualization:      AMD-V
L1d cache:           16K
L1i cache:           96K
L2 cache:            2048K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate ssbd vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov

valeriob01 commented 5 years ago

Your RX 550 is recognized by the BIOS and the PCI subsystem, so the problem must be elsewhere. You should turn off the audio on the Radeon RX GPU.

c-bug commented 5 years ago

I removed the driver for the Kaveri GPU to test what happens when running with only one GPU.

Here are the results:

# clinfo
Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (2766.4)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     Baffin
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (2766.4)
  Driver Version                                  2766.4
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Board Name (AMD)                         Radeon RX 550 Series
  Device Topology (AMD)                           PCI-E, 01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               10
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1071MHz
  Graphics IP (AMD)                               8.0
  Device Partition                                (core)
    Max number of sub-devices                     10
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size (AMD)                 256
  Max work group size (AMD)                       1024
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes
    char                                                 4 / 4
    short                                                2 / 2
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              4289224704 (3.995GiB)
  Global free memory (AMD)                        4169112 (3.976GiB)
  Global memory channels (AMD)                    4
  Global memory banks per channel (AMD)           16
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           3422266572 (3.187GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384 (16KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 pixels
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max number of constant args                     8
  Max constant buffer size                        3422266572 (3.187GiB)
  Preferred constant buffer size (AMD)            16384 (16KiB)
  Max size of kernel argument                     1024
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1548606835227972908ns (Sun Jan 27 17:33:55 2019)
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  Yes
    Number of async queues (AMD)                  2
    Max real-time compute queues (AMD)            0
    Max real-time compute units (AMD)             0
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_amd_bus_addressable_memory cl_khr_spir cl_khr_gl_event
# ./mfakto
mfakto 0.15pre6 (64bit build)

Runtime options
  Inifile                   mfakto.ini
  Verbosity                 3
  SieveOnGPU                yes
  MoreClasses               yes
  GPUSievePrimes            81157
  GPUSieveProcessSize       24Ki bits
  GPUSieveSize              96Mi bits
  FlushInterval             0
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 compact
  V5UserID                  none
  ComputerID                none
  ProgressHeader            "Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait"
  ProgressFormat            "%d %T | %C %p%% | %t  %e |   %g  %s  %W%%"
  TimeStampInResults        yes
  VectorSize                2
  GPUType                   AUTO
  SmallExp                  no
  UseBinfile                mfakto_Kernels.elf
Compiletime options

Select device - Get device info:
Device 1/1: Baffin (Advanced Micro Devices, Inc.),
device version: OpenCL 1.2 AMD-APP (2766.4), driver version: 2766.4
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_amd_bus_addressable_memory cl_khr_spir cl_khr_gl_event
Global memory:4289224704, Global memory cache: 16384, local memory: 32768, workgroup size: 256, Work dimensions: 3[1024, 1024, 1024, 0, 0] , Max clock speed:1071, compute units:10
WARNING: Unknown GPU name, assuming GCN. Please post the device name "Baffin (Advanced Micro Devices, Inc.)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself to avoid this warning.

OpenCL device info
  name                      Baffin (Advanced Micro Devices, Inc.)
  device (driver) version   OpenCL 1.2 AMD-APP (2766.4) (2766.4)
  maximum threads per block 1024
  maximum threads per grid  1073741824
  number of multiprocessors 10 (640 compute elements)
  clock rate                1071MHz

Automatic parameters
  threads per grid          0
  optimizing kernels for    GCN
Loading binary kernel file mfakto_Kernels.elf
Compiling kernels (build options: "-I. -DVECTOR_SIZE=2 -DGCN -O3 -DMORE_CLASSES -DCL_GPU_SIEVE").
        BUILD OUTPUT
Error: AMD HSA Code Object loading failed.

        END OF BUILD OUTPUT
Error -11 (Build program failure): clBuildProgram
ERROR: load_kernels(0) failed

Ideas?

valeriob01 commented 5 years ago

Can you try this as root: dmesg | grep amdkfd?

c-bug commented 5 years ago

no result shown.

valeriob01 commented 5 years ago

I suspect a problem with your amdkfd; you could try reinstalling the drivers.

c-bug commented 5 years ago

Oh, maybe we are talking about different things. HSA (amdkfd) of course works as soon as I enable the APU drivers, but the RX 550 is not working. I do not think the RX 550 has any HSA support.

After re-enabling the APU graphics unit, I get:

[    5.774162] kfd kfd: Initialized module
[    6.012461] kfd kfd: Allocated 3969056 bytes on gart
[    6.012638] kfd kfd: added device 1002:1313
[    6.016116] kfd kfd: skipped device 1002:67ff, PCI rejects atomics

And mfakto is working again.

1002:67ff is the RX 550.
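To make the kfd lines above easier to check on other machines, here is a rough sketch that lists which PCI IDs amdkfd added or skipped. It assumes the message wording of this kernel (4.19); other kernel versions may word these lines differently, and `kfd_devices` is just a name chosen for this example.

```python
import re

# Matches "kfd kfd: added device 1002:1313" and
# "kfd kfd: skipped device 1002:67ff, PCI rejects atomics".
KFD_LINE = re.compile(
    r"kfd kfd: (added|skipped) device ([0-9a-f]{4}:[0-9a-f]{4})(?:, (.*))?$"
)


def kfd_devices(dmesg_text):
    """Return a list of (pci_id, status, reason) tuples; reason may be None."""
    devices = []
    for line in dmesg_text.splitlines():
        m = KFD_LINE.search(line)
        if m:
            status, pci_id, reason = m.groups()
            devices.append((pci_id, status, reason))
    return devices


SAMPLE = """\
[    5.774162] kfd kfd: Initialized module
[    6.012461] kfd kfd: Allocated 3969056 bytes on gart
[    6.012638] kfd kfd: added device 1002:1313
[    6.016116] kfd kfd: skipped device 1002:67ff, PCI rejects atomics
"""

for entry in kfd_devices(SAMPLE):
    print(entry)
```

To run it on a live system, pipe the output of dmesg into the function instead of using SAMPLE.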

valeriob01 commented 5 years ago

Sorry, my mistake: the command should be dmesg | grep amd, and it should show whether your GPUs have been initialized.

c-bug commented 5 years ago
[    0.000000] Linux version 4.19.0-1-amd64 (debian-kernel@lists.debian.org) (gcc version 8.2.0 (Debian 8.2.0-13)) #1 SMP Debian 4.19.12-1 (2018-12-22)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.19.0-1-amd64 root=UUID=1340a5b2-3b49-4ce4-a11f-30bb0dd00c35 ro pcie_aspm=force pcie_aspm.policy=powersave pcie_port=native nmi_watchdog=0 radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.dc=0 amdgpu.audio=0
[    0.000000] random: get_random_u32 called from bsp_init_amd+0x20b/0x2b0 with crng_init=0
[    0.169626] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.19.0-1-amd64 root=UUID=1340a5b2-3b49-4ce4-a11f-30bb0dd00c35 ro pcie_aspm=force pcie_aspm.policy=powersave pcie_port=native nmi_watchdog=0 radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.dc=0 amdgpu.audio=0
[    1.102340] amd_uncore: AMD NB counters detected
[    1.102683] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    1.829041] usb usb1: Manufacturer: Linux 4.19.0-1-amd64 ehci_hcd
[    1.830288] usb usb2: Manufacturer: Linux 4.19.0-1-amd64 xhci-hcd
[    1.834693] usb usb4: Manufacturer: Linux 4.19.0-1-amd64 xhci-hcd
[    1.845096] usb usb3: Manufacturer: Linux 4.19.0-1-amd64 ehci_hcd
[    1.847279] usb usb5: Manufacturer: Linux 4.19.0-1-amd64 xhci-hcd
[    1.849438] usb usb7: Manufacturer: Linux 4.19.0-1-amd64 xhci-hcd
[    1.912886] usb usb6: Manufacturer: Linux 4.19.0-1-amd64 ohci_hcd
[    1.976889] usb usb8: Manufacturer: Linux 4.19.0-1-amd64 ohci_hcd
[    2.040887] usb usb9: Manufacturer: Linux 4.19.0-1-amd64 ohci_hcd
[    5.590041] EDAC amd64: Node 0: DRAM ECC disabled.
[    5.590075] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
[    5.842505] [drm] amdgpu kernel modesetting enabled.
[    5.887406] amdgpu 0000:00:01.0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[    5.887442] amdgpu 0000:00:01.0: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
[    5.890552] [drm] amdgpu: 2048M of VRAM memory ready
[    5.890588] [drm] amdgpu: 3072M of GTT memory ready.
[    5.890936] [drm] amdgpu: dpm initialized
[    5.895861] amdgpu 0000:00:01.0: firmware: direct-loading firmware amdgpu/kaveri_pfp.bin
[    5.896152] amdgpu 0000:00:01.0: firmware: direct-loading firmware amdgpu/kaveri_me.bin
[    5.896307] amdgpu 0000:00:01.0: firmware: direct-loading firmware amdgpu/kaveri_ce.bin
[    5.896534] amdgpu 0000:00:01.0: firmware: direct-loading firmware amdgpu/kaveri_mec.bin
[    5.898066] amdgpu 0000:00:01.0: firmware: direct-loading firmware amdgpu/kaveri_mec2.bin
[    5.898243] amdgpu 0000:00:01.0: firmware: direct-loading firmware amdgpu/kaveri_rlc.bin
[    5.921515] amdgpu 0000:00:01.0: firmware: direct-loading firmware amdgpu/kaveri_sdma.bin
[    5.922104] amdgpu 0000:00:01.0: firmware: direct-loading firmware amdgpu/kaveri_sdma1.bin
[    5.931020] amdgpu 0000:00:01.0: firmware: direct-loading firmware amdgpu/kaveri_uvd.bin
[    5.943891] amdgpu 0000:00:01.0: firmware: direct-loading firmware amdgpu/kaveri_vce.bin
[    6.390181] fbcon: amdgpudrmfb (fb0) is primary device
[    6.890534] amdgpu 0000:00:01.0: fb0: amdgpudrmfb frame buffer device
[    6.922502] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:00:01.0 on minor 0
[    6.922731] amdgpu 0000:01:00.0: enabling device (0000 -> 0003)
[    7.261454] amdgpu 0000:01:00.0: firmware: direct-loading firmware amdgpu/polaris11_k_mc.bin
[    7.261495] amdgpu 0000:01:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    7.261525] amdgpu 0000:01:00.0: GART: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    7.261612] [drm] amdgpu: 4096M of VRAM memory ready
[    7.261631] [drm] amdgpu: 4096M of GTT memory ready.
[    7.279095] amdgpu 0000:01:00.0: firmware: direct-loading firmware amdgpu/polaris11_pfp_2.bin
[    7.280333] amdgpu 0000:01:00.0: firmware: direct-loading firmware amdgpu/polaris11_me_2.bin
[    7.280514] amdgpu 0000:01:00.0: firmware: direct-loading firmware amdgpu/polaris11_ce_2.bin
[    7.281935] amdgpu 0000:01:00.0: firmware: direct-loading firmware amdgpu/polaris11_rlc.bin
[    7.285310] amdgpu 0000:01:00.0: firmware: direct-loading firmware amdgpu/polaris11_mec_2.bin
[    7.287154] amdgpu 0000:01:00.0: firmware: direct-loading firmware amdgpu/polaris11_mec2_2.bin
[    7.291696] amdgpu 0000:01:00.0: firmware: direct-loading firmware amdgpu/polaris11_sdma.bin
[    7.293149] amdgpu 0000:01:00.0: firmware: direct-loading firmware amdgpu/polaris11_sdma1.bin
[    7.295317] amdgpu 0000:01:00.0: firmware: direct-loading firmware amdgpu/polaris11_uvd.bin
[    7.298491] amdgpu 0000:01:00.0: firmware: direct-loading firmware amdgpu/polaris11_vce.bin
[    7.301323] amdgpu 0000:01:00.0: firmware: direct-loading firmware amdgpu/polaris11_k_smc.bin
[    7.609968] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:01:00.0 on minor 1

In the meantime, I ran a cl-demo from here https://wiki.tiker.net/OpenCLHowTo#Using_the_AMD_ICD_with_GPUs

#./cl-demo 9999999 2000
Kaveri:
0.004502 s
26.654542 GB/s
GOOD

RX550:
0.001654 s
72.572810 GB/s
GOOD

The difference between the results is stable (I ran it 10 times), so to me it looks like OpenCL is working with both devices.
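As a sanity check on these numbers, the reported GB/s figures are roughly what you get if each run moves three float arrays of n elements (two read, one written). That is an assumption about how cl-demo counts bytes, not something confirmed in this thread, and since the printed timings are rounded, the values only agree approximately.

```python
FLOAT_BYTES = 4  # assumption: single-precision floats


def bandwidth_gb_s(n_elements, seconds_per_run, arrays=3):
    """Bytes moved per run divided by run time, in GB/s (1 GB = 1e9 bytes).

    arrays=3 assumes two input arrays are read and one output array is written.
    """
    bytes_moved = arrays * n_elements * FLOAT_BYTES
    return bytes_moved / seconds_per_run / 1e9


# n and the timings are taken from the cl-demo output above.
kaveri = bandwidth_gb_s(9999999, 0.004502)
rx550 = bandwidth_gb_s(9999999, 0.001654)
print(f"Kaveri {kaveri:.2f} GB/s, RX550 {rx550:.2f} GB/s, ratio {rx550 / kaveri:.2f}x")
```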

valeriob01 commented 5 years ago

That's tricky: your GPUs are initialized as numbers 0 and 1, which works for gpuowl (-device 0 and -device 1). But for mfakto the numbers start at 1, so 1 is the first GPU (Kaveri) and 2 is the RX.
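The numbers 0 and 1 mentioned here are the DRM minor numbers from the "Initialized amdgpu" lines in the dmesg output. A small sketch for extracting them follows. Note that it only prints a zero-based and a one-based index side by side; whether an OpenCL application enumerates devices in the same order as the DRM minors is an assumption and is not established in this thread.

```python
import re

# Matches "[drm] Initialized amdgpu 3.27.0 20150101 for 0000:00:01.0 on minor 0".
INIT_LINE = re.compile(
    r"\[drm\] Initialized amdgpu \S+ \d+ for (\S+) on minor (\d+)"
)


def drm_minors(dmesg_text):
    """Map DRM minor number -> PCI address for every initialized amdgpu device."""
    return {int(m.group(2)): m.group(1) for m in INIT_LINE.finditer(dmesg_text)}


SAMPLE = """\
[    6.922502] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:00:01.0 on minor 0
[    7.609968] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:01:00.0 on minor 1
"""

for minor, pci in sorted(drm_minors(SAMPLE).items()):
    # Zero-based index as described for gpuowl, one-based as described for mfakto.
    print(f"{pci}: minor {minor}, zero-based {minor}, one-based {minor + 1}")
```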

c-bug commented 5 years ago

I have no idea what happened or why, but let me tell you...

After setting -O0 in mfakto.ini:

OCLCompileOptions=-I. -O0 -DVECTOR_SIZE=2 -DGCN -DMORE_CLASSES -DCL_GPU_SIEVE

the error message is gone and mfakto starts working ...
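For anyone who wants to repeat this change without editing the option string by hand, here is a minimal sketch that swaps the optimization flag in a build-options string. The default string is copied from the "Compiling kernels" log line earlier in this issue; `set_opt_level` is a hypothetical helper, not something mfakto provides, and it only produces the text for the OCLCompileOptions line.

```python
import re


def set_opt_level(options, level):
    """Replace any existing -O<digit> flag with -O<level>, or append it."""
    flag = f"-O{level}"
    tokens = options.split()
    replaced = [flag if re.fullmatch(r"-O\d", tok) else tok for tok in tokens]
    if flag not in replaced:
        replaced.append(flag)
    return " ".join(replaced)


# Build options as printed in the mfakto log above.
DEFAULT = "-I. -DVECTOR_SIZE=2 -DGCN -O3 -DMORE_CLASSES -DCL_GPU_SIEVE"

print("OCLCompileOptions=" + set_opt_level(DEFAULT, 0))
```

The flag keeps its original position here, so the result differs from the line above only in option order.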

With -O0 I now got 3 GHz-d/day, which is very low. I stopped mfakto, and clinfo then hung for a while until some timeout expired. While it hung, dmesg reported:

[  942.968023] amdgpu 0000:01:00.0: GPU fault detected: 147 0x00087702 for process  pid 0 thread  pid 0
[  942.968070] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00100C01
[  942.968108] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C077002
[  942.968148] amdgpu 0000:01:00.0: VM fault (0x02, vmid 6, pasid 32770) at page 1051649, read from 'SDM0' (0x53444d30) (119)
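Two details in these lines can be cross-checked mechanically: the page number 1051649 is the decimal form of the fault address 0x00100C01, and the hex value in parentheses is the ASCII encoding of the client name 'SDM0'. A short sketch, assuming the message format shown above (other kernel versions may print it differently):

```python
import re

# Matches "... at page 1051649, read from 'SDM0' (0x53444d30) ...".
VM_FAULT = re.compile(
    r"at page (\d+), (read|write) from '(\w+)' \(0x([0-9a-fA-F]+)\)"
)


def decode_vm_fault(line):
    """Extract page, access type and client from an amdgpu VM fault line."""
    m = VM_FAULT.search(line)
    if m is None:
        return None
    page, access, client, client_hex = m.groups()
    return {
        "page": int(page),
        "page_hex": f"0x{int(page):08X}",
        "access": access,
        "client": client,
        # The hex word is the client name spelled out as ASCII bytes.
        "client_from_hex": bytes.fromhex(client_hex).decode("ascii"),
    }


LINE = ("[  942.968148] amdgpu 0000:01:00.0: VM fault (0x02, vmid 6, pasid 32770) "
        "at page 1051649, read from 'SDM0' (0x53444d30) (119)")
print(decode_vm_fault(LINE))
```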

After it hung, I started mfakto again and got 141 GHz-d/day on the RX550.

OK, currently I am running both GPUs: Kaveri at 56 GHz-d/day, RX550 at 147 GHz-d/day.

I also have it running now with -d 1 (Kaveri) and -d 2 (RX550). Nevertheless, -d 0 still exists and shows the Baffin.

valeriob01 commented 5 years ago

There may be some issue with the driver, but the most important part is that you got it working. Good number crunching!

valeriob01 commented 5 years ago

BTW, there is a patched version that compiles on Linux; this is the one I use: https://github.com/preda/mfakto

c-bug commented 5 years ago

Oh, I solved the compile problems myself by adding AMD_APP_DIR = /usr in src/Makefile, below the original definition of that variable.

In the meantime, I unlocked 2 Compute Units and a few shaders using SBPolarisEditor. The card now runs at 175 GHz-d/day.

valeriob01 commented 5 years ago

I said it in the forum already, but just in case: to optimize mfakto performance, run mfakto -d \ --perftest 1. It will write new parameters to the mfakto.ini file, so you must run it from the directory that contains that file.

c-bug commented 5 years ago

Nice, I'll give it a try asap.

proski commented 1 week ago

This works for me:

-d 11 for the first device on the first platform
-d 12 for the second device on the first platform
-d 21 for the first device on the second platform
-d 22 for the second device on the second platform

For instance, I have intel-opencl, mesa with rusticl and pocl installed. -d 11 selects intel-opencl, -d 21 selects rusticl (if RUSTICL_ENABLE=iris is set in the environment), -d 31 selects pocl.
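The two-digit scheme described above can be written down as a small decoder: the tens digit is the platform and the ones digit is the device, both counted from 1. This is only a sketch of the behaviour reported in this comment, not taken from the mfakto source. How single-digit values (such as the -d 1 and -d 2 used earlier in this issue) are interpreted is not covered, so the hypothetical helper `decode_device_arg` rejects them.

```python
def decode_device_arg(value):
    """Split a two-digit -d value into (platform, device), both 1-based.

    Only the two-digit form reported in this comment is handled; anything
    else raises ValueError instead of guessing.
    """
    if not 11 <= value <= 99 or value % 10 == 0:
        raise ValueError(f"-d {value} is not a two-digit platform/device value")
    platform, device = divmod(value, 10)
    return platform, device


for arg in (11, 12, 21, 22, 31):
    platform, device = decode_device_arg(arg)
    print(f"-d {arg}: platform {platform}, device {device}")
```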