Efficient options for parallel processing

takluyver commented 6 years ago

Dmitry mentioned that parallel processing is more efficient if each worker operates on a separate sequence file. Our current interfaces hide the notion of sequence files entirely, to make data access simpler. We should think about how to also support efficient parallelism.
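A minimal sketch of the one-worker-per-sequence-file idea, using only Python's standard `concurrent.futures`. It assumes the European XFEL file-name pattern `<prefix>-R<run>-<source>-S<seq>.h5` (e.g. `RAW-R0450-AGIPD00-S00002.h5`). `sequence_key`, `map_per_file` and the `worker` callable are hypothetical helpers for illustration, not part of the karabo_data interface.

```python
import re
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

# Assumed naming pattern: <prefix>-R<run>-<source>-S<seq>.h5,
# e.g. RAW-R0450-AGIPD00-S00002.h5
SEQ_FILE_RE = re.compile(r"-R\d+-(\w+)-S(\d+)\.h5$")


def sequence_key(path):
    """Return (source, sequence number) parsed from a sequence file name."""
    m = SEQ_FILE_RE.search(Path(path).name)
    if m is None:
        raise ValueError(f"not a sequence file: {path}")
    return m.group(1), int(m.group(2))


def map_per_file(paths, worker, executor=None):
    """Run worker(path) as one task per sequence file.

    Each call receives exactly one file path, so no two tasks need the
    same file. `worker` is a hypothetical user function. Results are
    returned as a dict keyed by path, in (source, sequence) order.
    """
    ordered = sorted(paths, key=sequence_key)
    own_pool = executor is None
    # Default to a process pool; a caller may pass any executor instead.
    pool = ProcessPoolExecutor() if own_pool else executor
    try:
        results = list(pool.map(worker, ordered))
    finally:
        if own_pool:
            pool.shutdown()
    return dict(zip(ordered, results))
```

With the default process pool, `worker` must be a module-level function so it can be pickled, and the call belongs under an `if __name__ == "__main__":` guard on platforms that start worker processes by spawning.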

marcelotrevisani commented 6 years ago

Do you mean using this to run on clusters (e.g. with MPI), or parallelism in general?

takluyver commented 6 years ago

Both, if possible. For tasks that are limited by reading the files, splitting the work across a cluster may be better than using multiple cores on one machine (though this depends on the network filesystem).
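For the cluster case, one common pattern is a static round-robin split: every process builds the same sorted file list and keeps only the files whose index matches its rank. This is a sketch, not something karabo_data provides; `files_for_rank` is a hypothetical helper. Under MPI, `rank` and `size` would come from mpi4py's `MPI.COMM_WORLD.Get_rank()` and `Get_size()`; here they are plain arguments so the function runs without MPI.

```python
def files_for_rank(paths, rank, size):
    """Return the files that process `rank` (out of `size`) should handle.

    Sorting first makes the split deterministic, so every rank computes
    the same assignment independently, with no communication needed.
    """
    if not 0 <= rank < size:
        raise ValueError(f"rank {rank} out of range for size {size}")
    # Take every size-th file, starting at this rank's offset.
    return sorted(paths)[rank::size]
```

Each rank then opens only its own files. Round-robin assumes files are roughly equal in size; it does not balance uneven workloads.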

I think there are two situations for parallelising the processing of detector data: computations that can work on each detector module separately, such as finding peaks, and computations that need assembled detector images. Because each module is recorded to a separate sequence of files, the former could more easily be split into a large number of jobs.
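The module-wise case can be sketched as grouping the file list by detector module, giving one independent job per module. This assumes files are named like `RAW-R<run>-AGIPD<NN>-S<seq>.h5`, with the module in the source part of the name; `group_by_module` is a hypothetical helper, not a karabo_data function. Computations that need assembled images cannot be split this way, since every job would need the same trains from all modules.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed pattern: <prefix>-R<run>-<source>-S<seq>.h5,
# e.g. RAW-R0450-AGIPD03-S00001.h5
FILE_RE = re.compile(r"-R\d+-(\w+)-S(\d+)\.h5$")


def group_by_module(paths):
    """Map each source/module name to its files, in sequence order.

    Each value is one independent job for module-wise work such as peak
    finding: it needs no data from any other module.
    """
    groups = defaultdict(list)
    for path in paths:
        m = FILE_RE.search(Path(path).name)
        if m is None:
            continue  # skip files that don't follow the assumed pattern
        groups[m.group(1)].append((int(m.group(2)), path))
    # Sort modules by name and each module's files by sequence number.
    return {
        module: [path for _, path in sorted(entries)]
        for module, entries in sorted(groups.items())
    }
```

Each list can then be handed to a separate process or cluster job, for example by mapping a per-module function over `jobs.values()` with `ProcessPoolExecutor.map`.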

European-XFEL / karabo_data, issue #49