csi-greifflab / Absolut

Unconstrained lattice antibody-antigen bindings generator
93 stars 18 forks source link

Is it possifble to change the receptorLigand::generateReceptors to a multi-thread version #20

Open semal opened 1 year ago

semal commented 1 year ago

Below is some response from gpt4 maybe helpful, but I can not sure it works.

To make the selected code multi-threaded, we can use the pthread library to create multiple threads that can execute the for loop in parallel.

First, we need to declare a function that will be executed by each thread. This function should take a void argument and return a void value. In our case, we can define a function generateReceptorsThread that takes a pointer to a set object as its argument and returns NULL. This function will execute the for loop in the original code, but only for the positions in the given set.

void* generateReceptorsThread(void* arg){
    set<int>* positions = (set<int>*) arg;
    std::set<int>::iterator it;
    for (it = positions->begin(); it != positions->end(); ++it)
    {
        // original code here
    }
    return NULL;
}

Next, we need to create multiple threads and pass the appropriate arguments to each thread. We can use the pthread_create function to create a new thread and pass the set of positions as its argument. We can store the thread IDs in an array so that we can join them later.

const int NUM_THREADS = 4; // number of threads to create
pthread_t threads[NUM_THREADS];
set<int> startingPositions = neighborPositions(ligand, forbiddenVolume);
int nR = 0;
int threadIndex = 0;
std::set<int>::iterator it;
size_t counter = 0;
int cptPos = 0;
cout << "   -> Start computing structures from each bordering position. Nr of bordering/Starting positions: " << startingPositions.size() << endl;
for (it = startingPositions.begin(); it != startingPositions.end(); ++it)
{
    if(DBGgenRecept) cout << " ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----" << endl;
    cptPos++;
    int pos = *it;
    if(!contains(forbiddenVolume, pos)){

        // either the protein starts by an interaction, so starts here. Only one direction
        for(int d1 = 0; d1 < Nb_Moves_Absolute; ++d1){
            int neighbor1 = position(pos, (moveDirection) d1);
            if((!contains(ligand.occupiedPositions, neighbor1)) && (!contains(forbiddenVolume, neighbor1))){

                vector<struct3D*> toAdd = generateProteins(sizeReceptor, pos, (moveDirection) d1, minimalNInteract - 1, false, ligand, forbiddenVolume);
                size_t NTA = toAdd.size();
                for(size_t i = 0; i < NTA; ++i){
                    possibleReceptors.push_back(toAdd[i]);
                    nR++;
                }
                if(DBGgenRecept) cout << nR << "\tPosition " << printVector(lattice::positionFromID(pos));
                if(DBGgenRecept) cout << " Dir " << intToMoveChar(d1) << ", " << NTA << " sequences starting from an interaction there" << endl;
            }
        }

        // for each couple of possible directions ...
        vector<std::pair<moveDirection, moveDirection> > possibleDirections;
        for(int d1 = 0; d1 < Nb_Moves_Absolute; ++d1){
            for(int d2 = d1+1; d2 < Nb_Moves_Absolute; ++d2){// the two moves should be different.

                // conditions on the two directions :
                // direction 1 should not touch again the protein,
                //           and the following structure should not touch the protein either
                // direction 2 can touch again.
                // if dir1 and dir2 don't touch, explore in both orders.
                int neighbor1 = position(pos, (moveDirection) d1);
                int neighbor2 = position(pos, (moveDirection) d2);

                // if both directions are free
                if((!contains(ligand.occupiedPositions, neighbor1)) &&
                        (!contains(ligand.occupiedPositions, neighbor2)) && (!contains(forbiddenVolume, neighbor1)) && (!contains(forbiddenVolume, neighbor2))){
                    // condition : first direction should not touch. If both do so, both order possible.
                    if(!touch(ligand, neighbor1)){
                        possibleDirections.push_back(std::pair<moveDirection, moveDirection>((moveDirection) d1, (moveDirection) d2));
                    }
                    if(!touch(ligand, neighbor2)){
                        possibleDirections.push_back(std::pair<moveDirection, moveDirection>((moveDirection) d2, (moveDirection) d1));
                    }
                }
            }
        }
        size_t nDir = possibleDirections.size();
        for(size_t i = 0; i < nDir; ++i){
            // create a new thread and pass the set of positions as its argument
            pthread_create(&threads[threadIndex], NULL, generateReceptorsThread, (void*) &possibleDirections[i]);
            threadIndex++;
            if(threadIndex == NUM_THREADS){
                // join all threads before creating new ones
                for(int j = 0; j < NUM_THREADS; j++){
                    pthread_join(threads[j], NULL);
                }
                threadIndex = 0;
            }
        }
    }
}
// join any remaining threads
for(int j = 0; j < threadIndex; j++){
    pthread_join(threads[j], NULL);
}

In the modified code, we create a fixed number of threads (NUM_THREADS) and join them after processing a fixed number of positions. This helps to limit the number of threads created and reduces the overhead of thread creation and destruction.

Note that we need to include the pthread.h header file and link against the pthread library to use the pthread functions.

petouzeman commented 1 year ago

Hi!

So, this is a nice idea, I actually didn't do it on purpose because they are recursive calls (inside generateProteins) that need to use the same shared memory that stores previous recursive results, so I can of expected a mess if one recursive call in a thread wanted the stored results of another call in another thread, which should happen, (this is actually recusrive + dynamical programming since it stores previous computations). Overall since we do this step once per PDB/antigen, I didn't push to make it more efficient.

But your idea is great and might work, I am just not sure in term of practical efficiency, or it might require tricks to share memorz between the threads and make locks in this common memory. my feeling is that a multithreaded version like this will have separate memory and perform the same recursive calls for each thread separately without using the power of saving the already calculated results from the other threads. The only way to know would be to run it! I am in holidays and changing job now, but in case you tried and it worked, I would be impressed!

I realized I might not have been clear :) feel free to ask more Best, Philippe