Background

For fit() of PLMs, we process chunks of units across all arrays one at a time. Each chunk loads its data, fits the model, and writes the results to the output files (one per array) and to an extra file.

Parallelization

Although each chunk can be read and fitted independently, it is not safe for chunks to write their results to the output files independently or in parallel. There are two alternatives:

1. Write the output of each chunk to a temporary file. Then, in the main process, when all chunks are done (or once in a while), read and collect these files, write the results to the final output data files, and delete the temporary files.
2. Since chunk output is typically smaller than chunk input, run a few chunks in parallel, keep their results in memory, and then write them to file from the main process.
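A minimal sketch of the first alternative, assuming the future and listenv packages are installed. The function fit_chunk(), the chunk size, and the file names are hypothetical stand-ins for illustration, not the aroma.affymetrix implementation:

```r
library(future)
library(listenv)

plan(multisession, workers = 2)  # assumption: two local background R sessions

# Hypothetical stand-in for "load the data of one chunk and fit the model"
fit_chunk <- function(units) {
  data.frame(unit = units, estimate = sqrt(units))
}

units <- 1:10
chunks <- split(units, ceiling(seq_along(units) / 4))  # chunks of at most 4 units

# Each future fits one chunk and saves the result to its own temporary file,
# so no two workers ever write to the same file.
tmp_dir <- tempfile("chunks_")
dir.create(tmp_dir)
tmp_files <- listenv()
for (kk in seq_along(chunks)) {
  tmp_files[[kk]] %<-% {
    pathname <- file.path(tmp_dir, sprintf("chunk_%03d.rds", kk))
    saveRDS(fit_chunk(chunks[[kk]]), file = pathname)
    pathname
  }
}

# Main process: wait for all chunks, collect their files, write the final
# output once, and delete the temporary files.
tmp_files <- unlist(as.list(tmp_files))
results <- do.call(rbind, lapply(tmp_files, readRDS))
output_file <- tempfile(fileext = ".rds")  # stands in for the final output files
saveRDS(results, file = output_file)
unlink(tmp_dir, recursive = TRUE)

plan(sequential)  # shut down the background workers
```

Only the main process touches the final output file, which is what makes the parallel part safe in this sketch.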

Both approaches are fairly easy to implement (the first is the easiest) using a for loop, list environments, and futures.
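As a rough illustration of the second alternative, the sketch below fits a small batch of chunks in parallel, keeps the results in memory, and lets the main process append them to a single file. It assumes the future package is installed; fit_chunk(), the batch size, and the tab-delimited output file are hypothetical choices made only for this example:

```r
library(future)

plan(multisession, workers = 2)  # assumption: two local background R sessions

# Hypothetical stand-in for "load the data of one chunk and fit the model"
fit_chunk <- function(units) {
  data.frame(unit = units, estimate = sqrt(units))
}

units <- 1:10
chunks <- split(units, ceiling(seq_along(units) / 2))  # five chunks of 2 units
# Group chunk indices into batches of at most two chunks each
batches <- split(seq_along(chunks), ceiling(seq_along(chunks) / 2))

output_file <- tempfile(fileext = ".tsv")  # stands in for the final output file
for (batch in batches) {
  # Launch the chunks of this batch in parallel
  fs <- list()
  for (kk in batch) {
    fs[[length(fs) + 1]] <- future(fit_chunk(chunks[[kk]]))
  }

  # Collect the results of the batch in memory (blocks until all are done)
  res <- do.call(rbind, lapply(fs, value))

  # Only the main process writes: create the file on the first batch,
  # append without a header on later batches
  first <- !file.exists(output_file)
  write.table(res, file = output_file, append = !first, col.names = first,
              row.names = FALSE, sep = "\t")
}

written <- read.delim(output_file)

plan(sequential)  # shut down the background workers
```

The batch size bounds how many chunk results are held in memory at once, at the cost of waiting for the slowest chunk in each batch before writing.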

HenrikBengtsson / aroma.affymetrix

PARALLEL: PLM fit() and processing chunks in parallel #19
