Implemented a different version of the multidog parallel loop

Hi David,

I contacted you some time ago (November last year) suggesting a different approach to multidog: writing results to files instead of returning a data.frame. I see that since then you've switched to the future package to manage parallelization. I have not implemented the algorithm using future, but I imagine it would work the same way.

My test on 100K SNPs shows the following time usage:

Function athe_small took 1.66 h
Function athe_all took 1.94 h
Function multidog took 3.15 h

Here athe_small is multidog writing only the SNP parameters (things like prop_mis that have one estimate per marker) and the genotypes; athe_all writes all possible outputs to separate tables; and multidog is the original implementation.
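For reference, the relative differences can be worked out from the three reported times. This is only arithmetic on the numbers above; the variable names are mine.

```r
# Wall-clock times in hours, as reported in the test above
times_h <- c(athe_small = 1.66, athe_all = 1.94, multidog = 3.15)

# Fraction of multidog's run time saved by each variant
saved <- 1 - times_h / times_h[["multidog"]]
round(saved, 2)
# athe_small: 0.47, athe_all: 0.38, multidog: 0.00

# Equivalent speed-up factors relative to multidog
speedup <- times_h[["multidog"]] / times_h
round(speedup, 2)
```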

As you can see, the improvement in run time is relatively small. I suspect memory usage is also better, as that's what I found when running it on my own computer, although I couldn't confirm it on the computer cluster where I performed the test above (measuring memory usage turned out to be more complicated than I anticipated).

Small overview of the function changes:

Output is written to multiple tables instead of being returned as a data.frame. This is achieved by parallel writing into one file in groups of 100 markers. This creates a few corrupted lines (~0.2% of lines, ~0.4% of markers), which are discarded. Writing results to files instead of holding them in memory while the loop runs improves memory usage substantially (this should be retested).
The user can define the desired output; less output means faster computation.
The multidog object class and multidog plots are no longer available. A "multidog builder" could be implemented to create a multidog object from the tables, which would allow the use of plot_multidog().
The "future" package is not used for parallelization.
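To illustrate the first point, here is a minimal base-R sketch of appending results to a file in groups of 100 markers and then discarding malformed lines. It is not the code in this PR: fit_marker() is a hypothetical stand-in for the per-marker fit, the column names are made up, the loop is sequential rather than parallel, and the corrupted line is simulated by hand.

```r
# Hypothetical stand-in for the per-marker fit; returns one row per marker.
fit_marker <- function(id) {
  data.frame(snp = sprintf("snp%04d", id),
             prop_mis = round(id / 1000, 3),
             bias = 1)
}

out_file   <- tempfile(fileext = ".tsv")
marker_ids <- 1:250
chunk_size <- 100  # markers per write, as described above

# Split marker indices into consecutive groups of at most chunk_size
chunks <- split(marker_ids, ceiling(seq_along(marker_ids) / chunk_size))

# Write the header once, then append one block of rows per chunk
writeLines(paste(c("snp", "prop_mis", "bias"), collapse = "\t"), out_file)
for (ids in chunks) {
  res <- do.call(rbind, lapply(ids, fit_marker))
  write.table(res, out_file, append = TRUE, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# Simulate a line truncated by interleaved writes (assumption: corrupted
# lines show up with the wrong number of fields)
cat("snp9999\t0.1\n", file = out_file, append = TRUE)

# Keep only lines with the expected number of tab-separated fields
lines     <- readLines(out_file)
n_fields  <- lengths(strsplit(lines, "\t", fixed = TRUE))
clean     <- lines[n_fields == 3]
snp_table <- read.delim(text = clean, stringsAsFactors = FALSE)

nrow(snp_table)
```

Only one chunk of results is held in memory at a time, which is where the memory saving would come from. A field-count filter only catches lines that are visibly malformed; other kinds of corruption would need a stricter check.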

Let's see what you think.

Cheers, Alejandro

PS: Sorry for the delay in submitting; some other research got in the way.
