Saikatsaha1996 / sgminer-arm

Adreno && ARM-Mali OpenCL GPU Miner for Android
GNU General Public License v3.0

Help wanted #1

Closed Saikatsaha1996 closed 1 year ago

Saikatsaha1996 commented 1 year ago

@Pyogenics hello sir, help wanted. Do you remember me?

Pyogenics commented 1 year ago

What do you need help with? Optimisation for Mali GPUs?

Saikatsaha1996 commented 1 year ago

No sir 😃. I don't need performance; I just want to test.

My problem: I finished porting the xmrig-amd miner (algo lyra2-webchain) to Android.

After editing a lot of code, I succeeded in getting it to find the OCL platform.

But when I run the miner, I get an error from the kernel source. My device's driver is OpenCL 2.0 & 3.0.

My error is below. What does "outermost scope" mean?

error: variables in the local address space can only be declared in the outermost scope of a kernel function

The kernel code in question, at line 278: __local unsigned long tempWordRowIn[48];

https://github.com/Saikatsaha1996/webchain/blob/master/src/amd/opencl/lyra2.cl

IMG_20221226_041335

And here are all my commits for making it work on Android. You can ignore my other commits.

https://github.com/Saikatsaha1996/webchain/commit/1cf66c332dd3805c593a87e441a3c0811bccef83

I really need help solving this error.

Thank you so much for your response 😊

Pyogenics commented 1 year ago

https://registry.khronos.org/OpenCL/sdk/2.0/docs/man/xhtml/local.html - "Variables allocated in the __local address space inside a kernel function must occur at kernel function scope." Example:

// Examples of variables allocated in the __local address space
// inside a kernel function

kernel void my_func(...)
{
      local float      a;       // A single float allocated
                                // in local address space

      local float      b[10];   // An array of 10 floats
                                // allocated in local address space.
      if (...)
      {
          // example of variable in __local address space but not
          // declared at __kernel function scope.
          local float c;        // not allowed.
      }
}

If you look at the kernel you linked, the variables are inside an if statement that starts on line 228. As the example shows, this is not allowed.

Pyogenics commented 1 year ago

The variables should be moved to before the if statement. I would think that would work, but I'm not very familiar with OpenCL shaders.

Saikatsaha1996 commented 1 year ago

If you look at the kernel you linked, the variables are inside an if statement that starts on line 228. As the example shows, this is not allowed.

No sir, it is from line 278.

Okay, okay, understood. But how do I solve it? Actually, I have never edited any kernel code; if I do, it will be the first time in my life.

IMG_20221226_033002

Pyogenics commented 1 year ago

Yes, I said "the variables are inside an if statement" and "this is not allowed".

Saikatsaha1996 commented 1 year ago

Okay, I moved the code to line 227.

Now the kernel compiles. Thank you 😃

But now I get an invalid job height error.

Can it come from the kernel code?

I found some documentation here:

https://community.arm.com/support-forums/f/graphics-gaming-and-vr-forum/5246/why-it-apperars-an-cl_exec_status_error_for_events_in_wait_list-error-when-read-buffer-from-gpu-back-to-host-memory

IMG_20221226_222049

Pyogenics commented 1 year ago

What does the kernel look like now?

Saikatsaha1996 commented 1 year ago

__attribute__((reqd_work_group_size(1, 1, 1)))
__kernel void lyra2(__global unsigned long* Matrix,__global uchar* pwd,sph_s64 pwdlen, __global sph_u64* output, ulon>
  uint gid = get_global_id(0);
  sph_s64 kLen = 32;
  int ROW_LEN_INT64 = BLOCK_LEN_INT64 * NCOLS; // 12 * 4 = 48
  int ROW_LEN_BYTES = ROW_LEN_INT64 * 8; // 48 * 8 = 384
  int BLOCK_LEN =  BLOCK_LEN_BLAKE2_SAFE_INT64; // 8
  int nBlocksInput = ((pwdlen + 6 * BLOCK_LEN_BLAKE2_SAFE_INT64) / BLOCK_LEN_BLAKE2_SAFE_BYTES) + 1; // 2
  //============================= Basic variables ============================//
  int row = 2; //index of row to be processed
  int prev = 1; //index of prev (last row ever computed/modified)
  int rowa = 0; //index of row* (a previous row, deterministically picked during Setup and randomly picked while Wand>
  int tau; //Time Loop iterator
  int step = 1; //Visitation step (used during Setup and Wandering phases)
  int window = 2; //Visitation window (used to define which rows can be revisited during Setup)
  int gap = 1; //Modifier to the step, assuming the values 1 or -1
  //long i; //auxiliary iteration counter
  //long v64; // 64bit var for memcpy

   __local unsigned long tempWordRowIn[48];
   __local unsigned long tempWordRowInOut[48];
   __local unsigned long tempWordRowOut[48];

     __global unsigned long* memPtrStart = memMatrix(0);
  //==========================================================================/
  if(gid < Threads)
  {
  //========== Initializing the Memory Matrix and pointers to it =============//
  //Tries to allocate enough space for the whole memory matrix
  //printf("Matrix Size :%i", NROWS *BLOCK_LEN_INT64  * NCOLS);
  //unsigned long Matrix[NROWS *BLOCK_LEN_INT64  * NCOLS]; // 16384 * 12 * 4 = 786,432
  for(int k = 0; k < (NROWS *ROW_LEN_BYTES); k++)
  {
     Matrix[k + (gid * LYRA2_MEMSIZE/8 )] = 0;
  }

  __global unsigned char* ptrByte = (__global unsigned char*) memPtrStart;
  //Prepends the password

  for (int j = 0; j < pwdlen; j++) {
     ptrByte[j] = pwd[j];
  }
  //nonce
  __global unsigned long* ptrLongNonce = (__global unsigned long*)(&ptrByte[pwdlen-8]);
  ptrLongNonce[0] = nonce + (ulong)gid;

  long length = pwdlen;
  for (int j = length; j < nBlocksInput * BLOCK_LEN_BLAKE2_SAFE_BYTES - pwdlen+length; j++) {
      ptrByte[j] = 0;
  }

  //Concatenates the basil: every integer passed as parameter, in the order they are provided by the interface
  __global unsigned long* ptrLong = (__global unsigned long*)(&ptrByte[length]);
  ptrLong[0] = kLen;
  ptrLong[1] = pwdlen; // saltlen
  ptrLong[2] = (unsigned long)0;
  ptrLong[3] = TCOST;
  ptrLong[4] = NROWS;
  ptrLong[5] = NCOLS;

  //Now comes the padding
  ptrByte[length+48] = 0x80;
  ptrByte[nBlocksInput * BLOCK_LEN_BLAKE2_SAFE_BYTES - 1] ^= 0x01;

  //======================= Initializing the Sponge State ====================//
  //Sponge state: 16 uint64_t, BLOCK_LEN_INT64 words of them for the bitrate (b) and the remainder for the capacity (>
  unsigned long state[16];
  init_sponge(state);

  //set up pointers
   __global unsigned long* ptrWordRowIn;
   __global unsigned long* ptrWordRowInOut;
   __global unsigned long* ptrWordRowOut;

//   __local unsigned long tempWordRowIn[48];
//   __local unsigned long tempWordRowInOut[48];
//   __local unsigned long tempWordRowOut[48];

  //================================ Setup Phase =============================//
  //Absorbing salt, password and basil: this is the only place in which the block length is hard-coded to 512 bits
  __global unsigned long* ptrSetUp = memPtrStart;
  for(int i = 0; i < nBlocksInput; i++){
     absorbBlockBlake2Safe(ptrSetUp);
         ptrSetUp += BLOCK_LEN;
  }

Pyogenics commented 1 year ago

Seems fine but the block height is strange, maybe a problem with the mining pool?

Saikatsaha1996 commented 1 year ago

Can you check my changes once?

https://github.com/Saikatsaha1996/webchain/commit/1cf66c332dd3805c593a87e441a3c0811bccef83

Saikatsaha1996 commented 1 year ago

All the pools are online. 🥲

Saikatsaha1996 commented 1 year ago

Hmm, I found some documentation from ARM.

This could be causing the issue, but how do I fix it?

https://community.arm.com/support-forums/f/graphics-gaming-and-vr-forum/5246/why-it-apperars-an-cl_exec_status_error_for_events_in_wait_list-error-when-read-buffer-from-gpu-back-to-host-memory

#if defined(CL_VERSION_3_0)
typedef cl_command_queue (CL_API_CALL *createCommandQueueWithProperties_t)(cl_context, cl_device_id, const cl_queue_properties *, cl_int *);
#endif

typedef cl_command_queue (CL_API_CALL *createCommandQueue_t)(cl_context, cl_device_id, cl_command_queue_properties, cl_int *);
typedef cl_context (CL_API_CALL *createContext_t)(const cl_context_properties *, cl_uint, const cl_device_id *, void (CL_CALLBACK *pfn_notify)(const char *, const void *, size_>
typedef cl_int (CL_API_CALL *buildProgram_t)(cl_program, cl_uint, const cl_device_id *, const char *, void (CL_CALLBACK *pfn_notify)(cl_program, void *), void *);
typedef cl_int (CL_API_CALL *enqueueNDRangeKernel_t)(cl_command_queue, cl_kernel, cl_uint, const size_t *, const size_t *, const size_t *, cl_uint, const cl_event *, cl_event *>
typedef cl_int (CL_API_CALL *enqueueReadBuffer_t)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void *, cl_uint, const cl_event *, cl_event *);
typedef cl_int (CL_API_CALL *enqueueWriteBuffer_t)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, const void *, cl_uint, const cl_event *, cl_event *);
typedef cl_int (CL_API_CALL *finish_t)(cl_command_queue);

Saikatsaha1996 commented 1 year ago

Sir, I found the problem:

[2022-12-26 22:51:22] Error CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST when calling clEnqueueReadBuffer.
[2022-12-26 22:51:22] Error CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST when calling clEnqueueReadBuffer to fetch results.
[2022-12-26 22:51:22] Error CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST when calling clEnqueueWriteBuffer to fill input buffer.

Can you give me a hint for a solution?

What should I remove?

cl_int OclLib::enqueueNDRangeKernel(cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim, const size_t *global_work_offset, const size_t *global_work_size, const >
{
    assert(pEnqueueNDRangeKernel != nullptr);

    return pEnqueueNDRangeKernel(command_queue, kernel, work_dim, global_work_offset, global_work_size, local_work_size, num_events_in_wait_list, event_wait_list, event);
}

cl_int OclLib::enqueueReadBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, size_t offset, size_t size, void *ptr, cl_uint num_events_in_wait_list, c>
{
    assert(pEnqueueReadBuffer != nullptr);

    const cl_int ret = pEnqueueReadBuffer(command_queue, buffer, blocking_read, offset, size, ptr, num_events_in_wait_list, event_wait_list, event);
    if (ret != CL_SUCCESS) {
        LOG_ERR(kErrorTemplate, OclError::toString(ret), kEnqueueReadBuffer);
    }

    return ret;
}

cl_int OclLib::enqueueWriteBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, size_t offset, size_t size, const void *ptr, cl_uint num_events_in_wait>
{
    assert(pEnqueueWriteBuffer != nullptr);

    return pEnqueueWriteBuffer(command_queue, buffer, blocking_write, offset, size, ptr, num_events_in_wait_list, event_wait_list, event);
}

cl_int OclLib::finish(cl_command_queue command_queue)
{
    assert(pFinish != nullptr);

    return pFinish(command_queue);
}
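
For context on the log above: per the OpenCL specification, an event whose execution status is a negative value belongs to a command that terminated abnormally, and CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST is what a blocking read or write returns when an event it waits on has such a status. So the wrappers above are probably not where the fault is; more likely an earlier command, such as the kernel launch, failed. A minimal sketch of that interpretation in C follows. The helper names are hypothetical (they are not part of xmrig or OpenCL), and the numeric values are copied from CL/cl.h.

```c
#include <stdio.h>

/* Values copied from CL/cl.h; the guards avoid redefinition
 * if that header has already been included. */
#ifndef CL_SUCCESS
#define CL_SUCCESS 0
#endif
#ifndef CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST
#define CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST (-14)
#endif

/* Hypothetical helper: an event execution status below zero means the
 * command behind that event terminated abnormally. */
static int exec_status_failed(int status)
{
    return status < 0;
}

/* Hypothetical helper: returns 1 when err says that a command this call
 * waited on failed, rather than the call itself being malformed. */
static int failure_is_upstream(int err)
{
    return err == CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST;
}

/* Hypothetical helper: prints a hint about where to look next. */
static void print_hint(const char *call, int err)
{
    if (err == CL_SUCCESS) {
        printf("%s: ok\n", call);
    } else if (failure_is_upstream(err)) {
        printf("%s: an earlier command failed; check the kernel's event status\n", call);
    } else {
        printf("%s: error %d from this call itself\n", call, err);
    }
}
```

In the miner, the status passed to exec_status_failed would come from clGetEventInfo with CL_EVENT_COMMAND_EXECUTION_STATUS on the event returned by clEnqueueNDRangeKernel, queried after clFinish or clWaitForEvents. If that status is negative, the kernel itself is the thing to debug, not the read or write that reported the error.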

Pyogenics commented 1 year ago

I looked through some of your changes. Does compiling through Termux define __ANDROID__? Was the bump in OpenCL version necessary?
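
One quick way to answer the __ANDROID__ question is to compile a tiny probe with the same compiler used for the miner. The sketch below is only an illustration; the function names are hypothetical and nothing here comes from the xmrig sources.

```c
#include <stdio.h>

/* Hypothetical probe: returns 1 when the compiler predefines __ANDROID__,
 * 0 otherwise. The decision is made at compile time by the preprocessor. */
static int built_for_android(void)
{
#if defined(__ANDROID__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints the result so it can be read in a terminal. */
static void print_build_target(void)
{
    printf("__ANDROID__ is %s\n",
           built_for_android() ? "defined" : "not defined");
}
```

Calling print_build_target() from main and building it inside Termux should show whether the Android-specific branches in the miner are being compiled at all. With clang, dumping the predefined macros with `clang -dM -E -x c /dev/null` and searching the output for ANDROID should give the same answer without writing any code.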

Saikatsaha1996 commented 1 year ago

Okay sir, I will try.