European-XFEL / karabo_data

Python tools to read and analyse data from European XFEL
https://karabo-data.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Efficient options for parallel processing #49

Open takluyver opened 6 years ago

takluyver commented 6 years ago

Dmitry mentioned that parallel processing is more efficient if each worker can operate on a separate sequence file. Our current interfaces completely hide the notion of sequences to make it simpler to access data. We should think about how to support efficient parallelism.
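As a rough sketch of the one-worker-per-sequence-file idea, using only the standard library: `summarise_file` and `process_files` are hypothetical names, not part of karabo_data, and the per-file task is a placeholder that only reports the file size. Real work would open the file and read from it.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def summarise_file(path):
    """Placeholder per-file task: return (file name, size in bytes).

    A real task would open the sequence file and process its contents.
    """
    return os.path.basename(path), os.path.getsize(path)


def process_files(paths, task=summarise_file, max_workers=None):
    """Run `task` on each file in its own worker process.

    Each call to `task` receives exactly one path, so no two workers
    read from the same sequence file.
    """
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order; each result is a (name, value) pair.
        return dict(pool.map(task, sorted(paths)))
```

On platforms that start workers by spawning a new interpreter, `process_files` should be called from under an `if __name__ == "__main__":` guard. Whether this beats a single reader depends on how the filesystem handles concurrent reads.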

marcelotrevisani commented 6 years ago

Do you mean to use that to run in clusters (like using MPI) or just in general?

takluyver commented 6 years ago

Both, if possible. For tasks that are limited by how fast the files can be read, splitting the work across a cluster may be better than using multiple cores on one machine (though it depends on the network filesystem).

I think there are two situations for parallelising processing of detector data: computations which can work on separate detector modules, like finding peaks, and computations which need detector images assembled. Because each module is recorded to a separate sequence of files, the former could be more easily split up into a large number of jobs.
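To illustrate splitting per-module work into jobs, here is a minimal sketch that groups file names by the module (data aggregator) they came from. It assumes names of the form `RAW-R0034-AGIPD03-S00001.h5` (prefix, run, aggregator, sequence number); that pattern and the name `group_by_aggregator` are assumptions for illustration, not karabo_data API.

```python
import re
from collections import defaultdict

# Assumed file name layout: <RAW|CORR>-R<run>-<aggregator>-S<sequence>.h5
FILENAME_RE = re.compile(r"^(?:RAW|CORR)-R(\d+)-(.+)-S(\d+)\.h5$")


def group_by_aggregator(filenames):
    """Map each aggregator name (e.g. 'AGIPD03') to its sequence files.

    Names that do not match the assumed pattern are skipped.
    """
    groups = defaultdict(list)
    for name in filenames:
        match = FILENAME_RE.match(name)
        if match is None:
            continue
        groups[match.group(2)].append(name)
    # Sequence numbers are zero-padded, so a plain sort orders them correctly.
    return {agg: sorted(files) for agg, files in sorted(groups.items())}
```

Each group, or each file within a group, could then be handed to a separate job for computations that work on one module at a time. Computations that need assembled images would instead need the matching sequence files from every module together.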