EricArcher / strataG

strataG is a toolkit for haploid sequence and multilocus genetic data summaries, and analyses of population structure.

request to add parallel run in structureRun #58

Open fernandabiologia opened 2 years ago

fernandabiologia commented 2 years ago

Hi Eric,

Thanks for the great package. I've been using strataG for some time now, and I recently had to run a STRUCTURE analysis on a large sample, which was very slow.

I was wondering if it would be possible to add parallel processing to the structureRun function to speed it up? That way, the user could specify the number of cores to use through an argument.

Thank you.

EricArcher commented 2 years ago

STRUCTURE doesn't make use of multiple cores internally, so I can't take advantage of that capability in the structureRun() wrapper. The most I could do is convert it to optionally run each value of K on a separate core. Since files are currently being written to the working directory, I need to make sure there won't be filename conflicts among the cores (I think the seed file doesn't currently get a special name with the user-specified label). It'll also give me the opportunity to add the option of writing to (and reading from) a temporary folder as a default, which I've wanted to put in place for a while. Unfortunately, I'm pretty busy with other work at the moment, so I'll have to put this on my future To-Do list.

However, you should be able to create this functionality yourself with something like this (untested):

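# Run each value of K in its own forked process (mc.cores > 1 is not supported on Windows).
sr.list <- parallel::mclapply(
  2:5,
  # `...` stands for any further structureRun() arguments
  function(k) structureRun(g, k = k, ...),
  mc.cores = 5
)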
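
The filename-conflict concern mentioned above can be illustrated with a minimal, self-contained sketch. It does not call STRUCTURE or structureRun(); `run_one_k`, `base_dir`, and `result.txt` are hypothetical names used only for illustration. The idea shown is simply that each value of K writes into its own folder under a temporary directory, so parallel workers cannot overwrite each other's files.

```r
library(parallel)

# Hypothetical stand-in for a single-K run. In real use, the body would call
# structureRun() with whatever arguments your analysis needs.
run_one_k <- function(k, base_dir) {
  # One folder per K, so files written by different workers never collide.
  out_dir <- file.path(base_dir, paste0("k", k))
  dir.create(out_dir, showWarnings = FALSE)
  out_file <- file.path(out_dir, "result.txt")
  writeLines(paste("placeholder result for K =", k), out_file)
  list(k = k, file = out_file)
}

k_values <- 2:5

# Create the shared parent folder once, before any worker starts.
base_dir <- tempfile("structure_runs_")
dir.create(base_dir)

# mclapply() relies on forking, so use a single core on Windows.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

results <- mclapply(k_values, run_one_k, base_dir = base_dir, mc.cores = n_cores)
names(results) <- paste0("k", k_values)
```

Each element of `results` records the K value and the path of the file that worker wrote. Creating the parent folder before the workers start avoids several processes trying to create it at the same time.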