Open ssb22 opened 3 years ago
Something like this might help with the implementation of `#pragma omp parallel for schedule(dynamic)` in the genome scan: https://stackoverflow.com/a/31885029 which would allow us to wait for the first of N threads to finish, and so start a new thread as soon as one of the N has finished (a little more overhead than assigning new work to an already-running thread, but negligible in this case).
Without doubt, the C version is faster :), even once we implement parallelisation in the Java version. I would certainly like to see this option implemented, and I think you have already covered most of the details in your comments. I will start on the implementation and hope to have it ready soon. Many thanks for your help.
PrimerPooler is supposed to be fast, and part of its speed comes from parallelisation. The Java port is currently not parallelised, so will be slower than the C version (and I'd like users to know if the C version is many times faster☺)
Java does not have OpenMP, but we may be able to port our use of it into standard Java threads.
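As a rough illustration of what "standard Java threads" could look like for a plain `#pragma omp parallel` block, here is a minimal sketch. It is not PrimerPooler code: the class names `Tally` and `Worker`, the method `sumInParallel` and the even index split are all made up for the example (the real port would get its share of the work from `Triangle.t_iBounds`). Each worker is told its thread number and the total thread count, and a `synchronized` method on a single shared object stands in for `#pragma omp critical`.

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelBlockSketch {
    /** Hypothetical shared object: only one instance exists, so its
     *  synchronized methods act like an OpenMP critical section. */
    static class Tally {
        private long total = 0;
        synchronized void add(long partial) { total += partial; }
        synchronized long get() { return total; }
    }

    /** One worker per core; each knows "I am thread tNum of nThreads". */
    static class Worker extends Thread {
        private final int tNum, nThreads, nItems;
        private final Tally tally;

        Worker(int tNum, int nThreads, int nItems, Tally tally) {
            this.tNum = tNum;
            this.nThreads = nThreads;
            this.nItems = nItems;
            this.tally = tally;
        }

        @Override
        public void run() {
            // Assumed even split of [0, nItems) between the threads;
            // a stand-in for whatever Triangle.t_iBounds computes.
            int start = (int) ((long) nItems * tNum / nThreads);
            int end = (int) ((long) nItems * (tNum + 1) / nThreads);
            long partial = 0;
            for (int i = start; i < end; i++) partial += i; // dummy work
            tally.add(partial); // the "critical section"
        }
    }

    static long sumInParallel(int nItems, int nThreads) throws InterruptedException {
        Tally tally = new Tally();
        List<Worker> workers = new ArrayList<>();
        for (int t = 0; t < nThreads; t++) {
            Worker w = new Worker(t, nThreads, nItems, tally);
            workers.add(w);
            w.start();                      // all threads start running
        }
        for (Worker w : workers) w.join();  // wait until every thread has finished
        return tally.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(sumInParallel(1000, cores)); // sum of 0..999
    }
}
```

Keeping the per-thread result in a local variable and calling the synchronized method once at the end keeps contention low; calling it inside the inner loop would also be correct, just slower.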
For `dGandScoreCounts` and `dGprintBonds`, in each case the `#pragma omp parallel` simply means "find out how many cores the CPU has (I think it's `Runtime.getRuntime().availableProcessors()` in Java) and start that number of threads, each of them running a copy of this block". So to do that in Java we'd have to put the block into a `public void run()` method of a class that `extends Thread`, then create N instances of that class, call `.start()` on each (so they all start), then call `.join()` on each (so we wait till they've all finished).

Each thread will want to know what its thread number is, so we'd best pass the loop counter in to the constructor and save it somewhere so we can pass it in to the call to `Triangle.t_iBounds(np)` (which needs to know thread number and total number of threads, like "we are thread number 3 of 4", so it can calculate a sensible share of the run for itself: the code to do this is already ported, we just need to set its `nThreads` and `tNum` to something other than 1 and 0).

And then there is the matter of the `#pragma omp critical` section, which is equivalent to `synchronized` in Java. So that block would have to go into a method of `AllPrimers` (or some other class of which there is only one instance), declared `synchronized` so that only one thread at a time can run it (with parameters `bucket` and `score`), and the `counts`, `minScore` and `maxScore` arrays will also have to be looked after by that object. Similarly for `dGprintBonds` which needs to synchronize access to `sr` in the same way.

`poolsplit_thread` in `Splitter.java` can also be parallelised in this way, but beware it has multiple critical sections (and each thread needs its own copy of all parameters, rather than having them all shared in the `Splitter` object). Note also the use of `ThreadRand` which must return a different random sequence to each thread (otherwise there's no point in parallelising here); this is also why the C version does not parallelise here if `seedless` is set.

The most difficult part will be the parallelisation of the genome scan. In this case, we cannot rely on "share the chromosomes equally between the threads" being a good strategy (we don't know these chromosomes are the same lengths, they might be very different). So the C version uses `#pragma omp parallel for schedule(dynamic)` which basically means "create N threads (one per core), give one chromosome to each thread, and then whenever any thread finishes, give it another chromosome, until there are no chromosomes left, then wait for all threads to finish".
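One possible way to get that "hand the next chromosome to whichever thread is free" behaviour in Java is a fixed-size thread pool from `java.util.concurrent`. The sketch below is only an illustration under assumptions: `scanChromosome` and `scanAll` are hypothetical names, chromosomes are represented as plain strings, and the "scan" just counts the letter `A`. A pool reuses its N running threads rather than starting a new thread for each chromosome.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DynamicScheduleSketch {
    /** Hypothetical stand-in for scanning one chromosome: counts 'A's. */
    static int scanChromosome(String chromosome) {
        int hits = 0;
        for (int i = 0; i < chromosome.length(); i++) {
            if (chromosome.charAt(i) == 'A') hits++;
        }
        return hits;
    }

    static int scanAll(List<String> chromosomes, int nThreads) throws Exception {
        // N worker threads share one queue of tasks; a worker that finishes
        // a chromosome takes the next queued one, whatever its length.
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (final String c : chromosomes) {
                Callable<Integer> task = () -> scanChromosome(c);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // blocks until that chromosome is done
            }
            return total;
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> chromosomes = Arrays.asList("AACGT", "A", "GGGACAAT");
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(scanAll(chromosomes, cores));
    }
}
```

Because each task returns its own result through a `Future`, this sketch needs no `synchronized` block; a real scan that writes into shared structures would still have to protect them.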