Saikatsaha1996 closed this 1 year ago
What do you need help with? Optimisation for Mali GPUs?
No sir 😃, I don't need performance, I just want to test it.
My problem:
I ported the xmrig-amd miner to Android (algorithm lyra2-webchain).
I edited a lot of code and finally got it to find the OpenCL platform.
Unfortunately, when I run the miner I get an error in the kernel source. My device's driver supports OpenCL 2.0 and 3.0.
My error is:
error: variables in the local address space can only be declared in the outermost scope of a kernel function
What does "outermost scope" mean?
Kernel code, from line 278:
__local unsigned long tempWordRowIn[48];
https://github.com/Saikatsaha1996/webchain/blob/master/src/amd/opencl/lyra2.cl
And here are all my changes for Android. You can ignore the rest of my commits.
https://github.com/Saikatsaha1996/webchain/commit/1cf66c332dd3805c593a87e441a3c0811bccef83
I really need help solving this error.
Thank you so much for your response 😊
https://registry.khronos.org/OpenCL/sdk/2.0/docs/man/xhtml/local.html - "Variables allocated in the __local address space inside a kernel function must occur at kernel function scope." Example:
// Examples of variables allocated in the __local address space
// inside a kernel function
kernel void my_func(...)
{
    local float a;     // A single float allocated
                       // in local address space
    local float b[10]; // An array of 10 floats
                       // allocated in local address space.
    if (...)
    {
        // example of variable in __local address space but not
        // declared at __kernel function scope.
        local float c; // not allowed.
    }
}
If you look at the kernel you linked, the variables are inside an if statement which starts on line 228. As the example shows, this is not allowed.
Should the variables be moved before the if statement? I would think that would work, but I'm not very familiar with OpenCL kernels.
No sir, it's from line 278.
Okay, understood. But how do I fix it? I've never edited kernel code before; this will be the first time in my life.
Yes, I said "the variables are inside an if statement" and "this is not allowed".
Okay, I moved the declarations up to line 227.
Now the kernel compiles, thank you 😃
But I'm getting an "invalid job height" error.
Could it be coming from the kernel code?
I found some documentation here.
What does the kernel look like now?
__attribute__((reqd_work_group_size(1, 1, 1)))
__kernel void lyra2(__global unsigned long* Matrix,__global uchar* pwd,sph_s64 pwdlen, __global sph_u64* output, ulon>
    uint gid = get_global_id(0);
    sph_s64 kLen = 32;
    int ROW_LEN_INT64 = BLOCK_LEN_INT64 * NCOLS; // 12 * 4 = 48
    int ROW_LEN_BYTES = ROW_LEN_INT64 * 8; // 48 * 8 = 384
    int BLOCK_LEN = BLOCK_LEN_BLAKE2_SAFE_INT64; // 8
    int nBlocksInput = ((pwdlen + 6 * BLOCK_LEN_BLAKE2_SAFE_INT64) / BLOCK_LEN_BLAKE2_SAFE_BYTES) + 1; // 2
    //============================= Basic variables ============================//
    int row = 2; //index of row to be processed
    int prev = 1; //index of prev (last row ever computed/modified)
    int rowa = 0; //index of row* (a previous row, deterministically picked during Setup and randomly picked while Wand>
    int tau; //Time Loop iterator
    int step = 1; //Visitation step (used during Setup and Wandering phases)
    int window = 2; //Visitation window (used to define which rows can be revisited during Setup)
    int gap = 1; //Modifier to the step, assuming the values 1 or -1
    //long i; //auxiliary iteration counter
    //long v64; // 64bit var for memcpy
    __local unsigned long tempWordRowIn[48];
    __local unsigned long tempWordRowInOut[48];
    __local unsigned long tempWordRowOut[48];
    __global unsigned long* memPtrStart = memMatrix(0);
    //==========================================================================/
    if(gid < Threads)
    {
        //========== Initializing the Memory Matrix and pointers to it =============//
        //Tries to allocate enough space for the whole memory matrix
        //printf("Matrix Size :%i", NROWS *BLOCK_LEN_INT64 * NCOLS);
        //unsigned long Matrix[NROWS *BLOCK_LEN_INT64 * NCOLS]; // 16384 * 12 * 4 = 786,432
        for(int k = 0; k < (NROWS *ROW_LEN_BYTES); k++)
        {
            Matrix[k + (gid * LYRA2_MEMSIZE/8 )] = 0;
        }
        __global unsigned char* ptrByte = (__global unsigned char*) memPtrStart;
        //Prepends the password
        for (int j = 0; j < pwdlen; j++) {
            ptrByte[j] = pwd[j];
        }
        //nonce
        __global unsigned long* ptrLongNonce = (__global unsigned long*)(&ptrByte[pwdlen-8]);
        ptrLongNonce[0] = nonce + (ulong)gid;
        long length = pwdlen;
        for (int j = length; j < nBlocksInput * BLOCK_LEN_BLAKE2_SAFE_BYTES - pwdlen+length; j++) {
            ptrByte[j] = 0;
        }
        //Concatenates the basil: every integer passed as parameter, in the order they are provided by the interface
        __global unsigned long* ptrLong = (__global unsigned long*)(&ptrByte[length]);
        ptrLong[0] = kLen;
        ptrLong[1] = pwdlen; // saltlen
        ptrLong[2] = (unsigned long)0;
        ptrLong[3] = TCOST;
        ptrLong[4] = NROWS;
        ptrLong[5] = NCOLS;
        //Now comes the padding
        ptrByte[length+48] = 0x80;
        ptrByte[nBlocksInput * BLOCK_LEN_BLAKE2_SAFE_BYTES - 1] ^= 0x01;
        //======================= Initializing the Sponge State ====================//
        //Sponge state: 16 uint64_t, BLOCK_LEN_INT64 words of them for the bitrate (b) and the remainder for the capacity (>
        unsigned long state[16];
        init_sponge(state);
        //set up pointers
        __global unsigned long* ptrWordRowIn;
        __global unsigned long* ptrWordRowInOut;
        __global unsigned long* ptrWordRowOut;
        // __local unsigned long tempWordRowIn[48];
        // __local unsigned long tempWordRowInOut[48];
        // __local unsigned long tempWordRowOut[48];
        //================================ Setup Phase =============================//
        //Absorbing salt, password and basil: this is the only place in which the block length is hard-coded to 512 bits
        __global unsigned long* ptrSetUp = memPtrStart;
        for(int i = 0; i < nBlocksInput; i++){
            absorbBlockBlake2Safe(ptrSetUp);
            ptrSetUp += BLOCK_LEN;
        }
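One way to check the setup part of this kernel on the CPU is to rebuild the same input bytes on the host and compare them with the start of one thread's Matrix slice read back with clEnqueueReadBuffer. Below is a minimal C sketch of that layout, assuming BLOCK_LEN_BLAKE2_SAFE_INT64 = 8 and NCOLS = 4 (from the kernel comments), NROWS = 16384 (from the "16384 * 12 * 4" comment), and a 64-byte BLOCK_LEN_BLAKE2_SAFE_BYTES and TCOST = 1 (both hypothetical). The helper names n_blocks_input, put_u64 and build_sponge_input are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed constants: INT64 block length and NCOLS follow the kernel comments,
   NROWS follows the "16384 * 12 * 4" comment; the 64-byte block and
   TCOST = 1 are hypothetical. */
#define BLOCK_LEN_BLAKE2_SAFE_INT64 8
#define BLOCK_LEN_BLAKE2_SAFE_BYTES (BLOCK_LEN_BLAKE2_SAFE_INT64 * 8)
#define TCOST 1
#define NROWS 16384
#define NCOLS 4

/* Same formula as the kernel's nBlocksInput. */
static int n_blocks_input(int64_t pwdlen) {
    return (int)((pwdlen + 6 * BLOCK_LEN_BLAKE2_SAFE_INT64) / BLOCK_LEN_BLAKE2_SAFE_BYTES) + 1;
}

/* Stores a 64-bit value at buf + off in host byte order, mirroring the
   kernel's ulong stores (both little-endian on typical Arm/x86 systems). */
static void put_u64(uint8_t *buf, size_t off, uint64_t v) {
    memcpy(buf + off, &v, sizeof v);
}

/* Rebuilds the bytes the kernel writes before absorbing them.
   Requires pwdlen >= 8 and cap >= the returned size. */
static size_t build_sponge_input(uint8_t *buf, size_t cap, const uint8_t *pwd,
                                 int64_t pwdlen, uint64_t nonce, uint64_t gid) {
    const uint64_t kLen = 32;
    const size_t len = (size_t)pwdlen;
    const size_t total = (size_t)n_blocks_input(pwdlen) * BLOCK_LEN_BLAKE2_SAFE_BYTES;
    assert(pwdlen >= 8 && cap >= total);

    memcpy(buf, pwd, len);                 /* prepend the password */
    put_u64(buf, len - 8, nonce + gid);    /* last 8 password bytes = nonce + gid */
    memset(buf + len, 0, total - len);     /* zero the remaining input bytes */

    /* "basil": kLen, saltlen (= pwdlen), 0, TCOST, NROWS, NCOLS */
    const uint64_t basil[6] = { kLen, (uint64_t)pwdlen, 0, TCOST, NROWS, NCOLS };
    for (int i = 0; i < 6; i++)
        put_u64(buf, len + 8 * (size_t)i, basil[i]);

    buf[len + 48] = 0x80;     /* first padding byte */
    buf[total - 1] ^= 0x01;   /* final padding bit */
    return total;
}
```

For a 32-byte input (nBlocksInput = 2, matching the kernel comment) the result is 128 bytes: the password with its last 8 bytes replaced by nonce + gid, the six basil words at offsets 32 to 79, 0x80 at offset 80 and 0x01 in the last byte.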
Seems fine but the block height is strange, maybe a problem with the mining pool?
Could you take a look at my changes?
https://github.com/Saikatsaha1996/webchain/commit/1cf66c332dd3805c593a87e441a3c0811bccef83
All the pools are online. 🥲
Hmm, I found some documentation from Arm.
This can cause issues, but how do I fix it?
#if defined(CL_VERSION_3_0)
typedef cl_command_queue (CL_API_CALL *createCommandQueueWithProperties_t)(cl_context, cl_device_id, const cl_queue_properties *, cl_int *);
#endif
typedef cl_command_queue (CL_API_CALL *createCommandQueue_t)(cl_context, cl_device_id, cl_command_queue_properties, cl_int *);
typedef cl_context (CL_API_CALL *createContext_t)(const cl_context_properties *, cl_uint, const cl_device_id *, void (CL_CALLBACK *pfn_notify)(const char *, const void *, size_>
typedef cl_int (CL_API_CALL *buildProgram_t)(cl_program, cl_uint, const cl_device_id *, const char *, void (CL_CALLBACK *pfn_notify)(cl_program, void *), void *);
typedef cl_int (CL_API_CALL *enqueueNDRangeKernel_t)(cl_command_queue, cl_kernel, cl_uint, const size_t *, const size_t *, const size_t *, cl_uint, const cl_event *, cl_event *>
typedef cl_int (CL_API_CALL *enqueueReadBuffer_t)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void *, cl_uint, const cl_event *, cl_event *);
typedef cl_int (CL_API_CALL *enqueueWriteBuffer_t)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, const void *, cl_uint, const cl_event *, cl_event *);
typedef cl_int (CL_API_CALL *finish_t)(cl_command_queue);
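These typedefs select clCreateCommandQueueWithProperties at compile time (#if defined(CL_VERSION_3_0)), but that entry point exists from OpenCL 2.0 on, and what matters at run time is the version the platform actually reports. A sketch of deciding at run time from the version string; in real code the string would come from clGetPlatformInfo with CL_PLATFORM_VERSION (or clGetDeviceInfo with CL_DEVICE_VERSION), whose format the spec gives as "OpenCL <major>.<minor> <platform-specific information>". The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Parses "OpenCL <major>.<minor> ..." into major/minor. Returns 1 on success. */
static int parse_cl_version(const char *s, int *major, int *minor) {
    assert(s != NULL && major != NULL && minor != NULL);
    return sscanf(s, "OpenCL %d.%d", major, minor) == 2;
}

/* clCreateCommandQueueWithProperties is available from OpenCL 2.0;
   older platforms only provide clCreateCommandQueue. */
static int use_queue_with_properties(const char *version) {
    int major = 0, minor = 0;
    if (!parse_cl_version(version, &major, &minor))
        return 0;   /* unrecognised format: fall back to the 1.x entry point */
    return major >= 2;
}
```

The function pointer for the chosen entry point still has to be loaded successfully; checking it for NULL before use keeps a missing symbol from turning into a crash.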
Sir, I found the problem:
[2022-12-26 22:51:22] Error CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST when calling clEnqueueReadBuffer.
[2022-12-26 22:51:22] Error CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST when calling clEnqueueReadBuffer to fetch results.
[2022-12-26 22:51:22] Error CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST when calling clEnqueueWriteBuffer to fill input buffer.
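CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST is status -14. In practice it usually means that something the read or write had to wait for (an event in its wait list or, on many drivers, an earlier command in the same in-order queue) had already failed, so the real error is often earlier, for example in the kernel launch, rather than in clEnqueueReadBuffer itself. A small C sketch for decoding raw status values; the numeric values are those from the Khronos cl.h header, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A few OpenCL status codes with their cl.h values, so this builds
   without the OpenCL SDK. */
typedef struct { int code; const char *name; } ClStatusName;

static const ClStatusName kClStatusNames[] = {
    {   0, "CL_SUCCESS" },
    {  -4, "CL_MEM_OBJECT_ALLOCATION_FAILURE" },
    {  -5, "CL_OUT_OF_RESOURCES" },
    {  -6, "CL_OUT_OF_HOST_MEMORY" },
    { -11, "CL_BUILD_PROGRAM_FAILURE" },
    { -14, "CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST" },
    { -30, "CL_INVALID_VALUE" },
    { -36, "CL_INVALID_COMMAND_QUEUE" },
    { -38, "CL_INVALID_MEM_OBJECT" },
    { -52, "CL_INVALID_KERNEL_ARGS" },
    { -54, "CL_INVALID_WORK_GROUP_SIZE" },
};

/* Returns the symbolic name for a status code, or a fallback string. */
static const char *cl_status_name(int code) {
    for (size_t i = 0; i < sizeof kClStatusNames / sizeof kClStatusNames[0]; i++)
        if (kClStatusNames[i].code == code)
            return kClStatusNames[i].name;
    return "UNKNOWN_CL_ERROR";
}

/* -14 reports that something this call depended on had already failed;
   the command that actually failed has to be found earlier in the log. */
static int cl_status_is_secondary(int code) {
    return code == -14;
}
```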
Can you give me any hints toward a solution?
What should I remove?
cl_int OclLib::enqueueNDRangeKernel(cl_command_queue command_queue, cl_kernel kernel, cl_uint work_dim, const size_t *global_work_offset, const size_t *global_work_size, const >
{
    assert(pEnqueueNDRangeKernel != nullptr);
    return pEnqueueNDRangeKernel(command_queue, kernel, work_dim, global_work_offset, global_work_size, local_work_size, num_events_in_wait_list, event_wait_list, event);
}

cl_int OclLib::enqueueReadBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read, size_t offset, size_t size, void *ptr, cl_uint num_events_in_wait_list, c>
{
    assert(pEnqueueReadBuffer != nullptr);
    const cl_int ret = pEnqueueReadBuffer(command_queue, buffer, blocking_read, offset, size, ptr, num_events_in_wait_list, event_wait_list, event);
    if (ret != CL_SUCCESS) {
        LOG_ERR(kErrorTemplate, OclError::toString(ret), kEnqueueReadBuffer);
    }
    return ret;
}

cl_int OclLib::enqueueWriteBuffer(cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write, size_t offset, size_t size, const void *ptr, cl_uint num_events_in_wait>
{
    assert(pEnqueueWriteBuffer != nullptr);
    return pEnqueueWriteBuffer(command_queue, buffer, blocking_write, offset, size, ptr, num_events_in_wait_list, event_wait_list, event);
}

cl_int OclLib::finish(cl_command_queue command_queue)
{
    assert(pFinish != nullptr);
    return pFinish(command_queue);
}
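In these wrappers only enqueueReadBuffer logs a failure; enqueueNDRangeKernel and enqueueWriteBuffer just return the status (whether their callers check it is not visible in this excerpt). A sketch of giving every wrapper the same check, shown for the finish call with stand-in types so it builds without the OpenCL SDK. Calling finish right after the kernel while debugging may also surface the failing command before the read does. cl_check, ocl_finish, the stand-in types and the failing test double are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for cl_int and cl_command_queue. */
typedef int32_t status_t;
typedef void *queue_t;

/* Function-pointer type and slot, in the style of finish_t / pFinish. */
typedef status_t (*finish_fn)(queue_t);
static finish_fn pFinishFn = NULL;

static int g_cl_failures = 0;   /* number of wrapped calls that failed */

/* Logs a non-zero status with the call name, as OclLib::enqueueReadBuffer
   does, and returns the status unchanged. */
static status_t cl_check(status_t ret, const char *call) {
    if (ret != 0) {
        g_cl_failures++;
        fprintf(stderr, "Error %d when calling %s\n", (int)ret, call);
    }
    return ret;
}

/* Wrapper in the OclLib style: assert the pointer was loaded, call, check. */
static status_t ocl_finish(queue_t queue) {
    assert(pFinishFn != NULL);
    return cl_check(pFinishFn(queue), "clFinish");
}

/* Hypothetical test double that fails like CL_OUT_OF_RESOURCES (-5). */
static status_t fake_finish_out_of_resources(queue_t queue) {
    (void)queue;
    return -5;
}
```

The same cl_check call can wrap the return values of the NDRange and write wrappers, so the first failing command is also the first one logged.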
I looked through some of your changes. Does compiling through Termux define __ANDROID__? Was the bump in OpenCL version necessary?
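Whether a toolchain predefines __ANDROID__ can be checked directly. Clang lists all of its predefined macros with `clang -dM -E -x c /dev/null`; from C, a check like the following (helper name hypothetical) reports it for whichever compiler builds the file. Termux's Clang targets an Android triple, so it is expected to define the macro, but this settles it for a given setup.

```c
#include <assert.h>

/* Returns 1 if the compiler building this file predefined __ANDROID__. */
static int built_for_android(void) {
#if defined(__ANDROID__)
    return 1;
#else
    return 0;
#endif
}
```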
Okay sir, I'll try.
@Pyogenics Hello sir, I need help. Do you remember me?