exaloop / codon

A high-performance, zero-overhead, extensible Python compiler using LLVM
https://docs.exaloop.io/codon

Executing multiple functions in multi-threading #431

Open iSunfield opened 1 year ago

iSunfield commented 1 year ago

Codon is not blocked by Python's GIL, so I expect good multi-threading performance. I tried the code below to execute multiple class functions in multiple threads. It works well when running a single function in multi-threading, but it doesn't work properly when I try to run multiple functions. If there is a way to run multiple functions in multi-threading using Codon, please let me know.

import openmp as omp

class Test1:
    def Func(self,No:int):
        for i in range(300000000):
            if i % 100000000 == 0:
                print('Func1',No,i)

class Test2:
    def Func(self,No:int):
        for i in range(300000000):
            if i % 100000000 == 0:
                print('Func2',No,i)

T1 = Test1()
T2 = Test2()
Threads = (T1,T2)

#-- It worked properly in multi-threading. 
@par
for i in range(4):
    Threads[0].Func(i)

#-- The functions Func in Test1 and Test2 do not work in multi-threading. 
@par
for i in range(4):
    Threads[0].Func(i)
    Threads[1].Func(i)

#-- error: expected iterable expression
@par
for i in range(4):
    Threads[i % 2].Func(i)

inumanag commented 1 year ago

Hi @iSunfield:

  1. Threads is a tuple, so it can only be indexed with a static integer (i is not static). You can try this for now:

    @par
    for i in range(4):
        if i % 2:
            Threads[1].Func(i)
        else:
            Threads[0].Func(i)
  2. The output I get for the second part is:

    Func1 1 100000000
    Func1 0 100000000
    Func1 3 100000000
    Func1 2 100000000
    Func1 1 200000000
    Func1 3 200000000
    Func1 0 200000000
    Func1 2 200000000
    Func2 1 0
    Func2 3 0
    Func2 0 0
    Func2 2 0
    Func2 1 100000000
    Func2 3 100000000
    Func2 0 100000000
    Func2 2 100000000
    Func2 3 200000000
    Func2 1 200000000
    Func2 0 200000000
    Func2 2 200000000

Is this what you expect? These functions are indeed invoked in parallel (unless I misunderstood the question).
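
The static-index restriction in point 1 has a close analogue in C++, which may make the workaround easier to picture. The sketch below is only an analogy, not Codon's implementation: `std::tuple` holds values of different types, so `std::get<I>` needs a compile-time index, and a runtime index has to be turned into a branch. The names `call_by_index` and `run_all` are hypothetical, and the methods return strings instead of printing so the result is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Two unrelated types that happen to share a method name,
// like Test1 and Test2 in the Codon example.
struct Test1 {
    std::string func(int no) const { return "Func1 " + std::to_string(no); }
};
struct Test2 {
    std::string func(int no) const { return "Func2 " + std::to_string(no); }
};

// Hypothetical helper: std::get<I> only accepts a compile-time index,
// so the runtime index is mapped to one branch per tuple element.
std::string call_by_index(const std::tuple<Test1, Test2>& workers,
                          std::size_t idx, int no) {
    assert(idx < 2);  // only two elements in this sketch
    if (idx == 0) {
        return std::get<0>(workers).func(no);
    }
    return std::get<1>(workers).func(no);
}

// Alternates between the two workers, as the i % 2 loop does.
std::vector<std::string> run_all(int n) {
    std::tuple<Test1, Test2> workers;
    std::vector<std::string> out;
    for (int i = 0; i < n; ++i) {
        out.push_back(call_by_index(workers, static_cast<std::size_t>(i % 2), i));
    }
    return out;
}
```

Each branch names its element with a constant, so the compiler knows the concrete type at every call site. That is the same property the if/else version of the `@par` loop provides.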

iSunfield commented 1 year ago

Thank you for your kind support, Inumanag-san. I was able to execute class functions with multithreading, which lets me do what I wanted. I understand that Codon requires static code under @par, and I think that is reasonable. For more flexible code, I tried another approach: implementing a function in C++ that runs class functions on multiple threads, and then calling it from Codon. However, I gave up because I couldn't convert the pointer to a Codon class function into a C++ virtual function type. Could you please let me know whether there is a way to call a Codon class function from C++ by passing a pointer to the Codon function as a C++ virtual function?

// ThreadTest.cpp file
// Create the shared library (.so) with:
// g++ -shared -fPIC -o ThreadTest.so ThreadTest.cpp

#include <thread>
#include <vector>
#include <stdio.h>

//--- Base class for thread function ---
class Base{
public:
    virtual void func1(int x) {}
    virtual void func2(int x) {}
};

struct ThreadParam {
    Base* object;
    void (Base::*method)(int);
    int args;
};

//--- Export function for codon
extern "C" {

    void ThreadExe(ThreadParam* data, int size){
        std::vector<ThreadParam> ThreadFuncs(data, data + size);
        std::vector<std::thread> Threads;

        //--- Dump the raw table bytes for debugging
//      unsigned char* P = (unsigned char*)(data);
//      for (int j=0;j<size;j++){
//          for (int i=0;i<8*4;i++){
//              printf("%x,",*P++);
//          } 
//          printf("\n");
//      }

        //-- Create and start one thread per table entry
        for (auto& Param : ThreadFuncs) {
            Threads.push_back(std::thread(Param.method,Param.object,Param.args));
        }

        //-- Wait for every thread to finish
        for (auto& Th : Threads) {
            Th.join();
        }

    }
}

//------------------------------------------------------
//-    Below is test module for ThreadExe function
//------------------------------------------------------
class MyClass1:public Base{
public:
    void func1(int x) override{
        int Count=0;
        for(int i=0;i<1000000000;i++){
            if(i % 100000000 == 0){
                printf("MyClass1_1:%d Count:%d\n",x,Count++);
            }
        }
    }
    void func2(int x) override{
        int Count=0;
        for(int i=0;i<1000000000;i++){
            if(i % 100000000 == 0){
                printf("MyClass1_2:%d Count:%d\n",x+1,Count++);
            }
        }
    }
};

class MyClass2:public Base{
public:
    void func1(int x) override{
        int Count=0;
        for(int i=0;i<1000000000;i++){
            if(i % 100000000 == 0){
                printf("MyClass2_1:%d Count:%d\n",x,Count++);
            }
        }
    }
    void func2(int x) override{
        int Count=0;
        for(int i=0;i<1000000000;i++){
            if(i % 100000000 == 0){
                printf("MyClass2_2:%d Count:%d\n",x+2,Count++);
            }
        }
    }
};

int main() {
    MyClass1  MyClassObj1;
    MyClass2  MyClassObj2;
    ThreadParam ThreadEach;

    std::vector<ThreadParam> ThreadP;

    // Create a new thread table
    ThreadEach.object = (Base*)(&MyClassObj1);
    ThreadEach.method = &Base::func1;
    ThreadEach.args = 1;
    ThreadP.push_back(ThreadEach);
    ThreadEach.object = (Base*)(&MyClassObj2);
    ThreadEach.method = &Base::func1;
    ThreadEach.args = 2;
    ThreadP.push_back(ThreadEach);

    for (size_t i = 0; i < ThreadP.size(); i++) {
        // sizeof yields size_t, so it is printed with %zu
        printf("Obj:%p, Method size:%zu, Args:%d\n", (void*)(ThreadP[i].object), sizeof(ThreadP[i].method), ThreadP[i].args);
    }

    //-- Multi threading 
    ThreadExe(ThreadP.data() , ThreadP.size());

    return 0;
}

// --- Codon code ----
LIBRARY = "./ThreadTest.so"
from C import LIBRARY.ThreadExe(cobj, int) -> None

class Base:
    Buffer : Array[UInt[64]]

    def __init__(self,BufferSize:int):
        self.Buffer = Array[UInt[64]](BufferSize)

    def Func(self,No:int):
        pass

class Test1(Base):
    def __init__(self,BufferSize:int):
        super().__init__(BufferSize)

    def Func(self,No:int):
        for i in range(300000000):
            if i % 100000000 == 0:
                print('Func1',No,i)

class Test2(Base):
    def __init__(self,BufferSize:int):
        super().__init__(BufferSize)

    def Func(self,No:int):
        for i in range(300000000):
            if i % 100000000 == 0:
                print('Func2',No,i)

def Exe(ThNo:int,*Args):
    if ThNo == 0:
        Args[0].Func(1)
    elif ThNo == 1:
        Args[1].Func(2)

@tuple
class ThreadFunc:
    Obj     : cobj
    FuncAdr1 : cobj
    FuncAdr2 : cobj
    Args1   : int

    def __new__(Objct:cobj,FuncA1:cobj,FuncA2:cobj,Arg:cobj):
        pass        

FuncList = Array[ThreadFunc](20)

N = 17

T1 = Test1(2**N)
T2 = Test2(2**N)

F1 = T1.Func
F2 = T2.Func

#--- Please let me know how to convert the Codon pointers to the object
#    and its class function into the C++ types ----
FuncList[0] = ThreadFunc(__ptr__(T1).as_byte(),
                         __ptr__(F1).as_byte(),
                         __ptr__(F1).as_byte()+8,
                         1)

FuncList[1] = ThreadFunc(__ptr__(T2).as_byte(),
                         __ptr__(F2).as_byte(),
                         __ptr__(F2).as_byte()+8,
                         2)

ThreadExe(FuncList.ptr.as_byte(), 2)
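
One possible direction for the question above, offered as a sketch rather than a confirmed Codon recipe. With g++ (Itanium C++ ABI), `void (Base::*)(int)` is a two-word value, and for a virtual function it encodes a vtable offset, so it is only meaningful for objects that carry a C++ vtable. A Codon object is not known to have that layout, which is probably why the table above cannot be filled in from the Codon side. A C-style callback avoids the issue: a plain function pointer plus an opaque context pointer. The `Task` struct and `RunTasks` below are hypothetical names. How to obtain a C-callable function pointer from Codon is an assumption and is not shown. The argument is 64-bit because Codon's `int` is 64-bit.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical C-ABI task record: no member-function pointers,
// just a plain function pointer, an opaque object pointer, and an argument.
struct Task {
    void (*fn)(void* ctx, std::int64_t arg);
    void* ctx;
    std::int64_t arg;
};

// Hypothetical replacement for ThreadExe: runs every task on its own
// thread and returns once all of them have finished.
extern "C" void RunTasks(const Task* tasks, int count) {
    assert(count >= 0 && (tasks != nullptr || count == 0));
    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        threads.emplace_back(tasks[i].fn, tasks[i].ctx, tasks[i].arg);
    }
    for (auto& t : threads) {
        t.join();
    }
}

// Stand-in for a callback that would come from the other language:
// adds its argument to an atomic counter passed through ctx.
extern "C" void add_to_counter(void* ctx, std::int64_t arg) {
    static_cast<std::atomic<std::int64_t>*>(ctx)->fetch_add(arg);
}

// Small self-check: two tasks share one counter.
std::int64_t demo_sum() {
    std::atomic<std::int64_t> total{0};
    Task tasks[2] = {{add_to_counter, &total, 1}, {add_to_counter, &total, 2}};
    RunTasks(tasks, 2);
    return total.load();
}
```

With this shape the C++ side never needs to know the caller's object layout. It only forwards `ctx` back to the callback, which casts it to whatever type it expects. On Linux, building code that uses `std::thread` may also require `-pthread`.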