LiberTEM / LiberTEM

Open pixelated STEM framework
https://libertem.github.io/LiberTEM/
GNU General Public License v3.0
109 stars 68 forks source link

Remove deprecated Job functionality #549

Closed uellue closed 3 years ago

uellue commented 4 years ago

All currently implemented Job functionality can, in principle, be expressed through the UDF API. By only having one interface we can reduce the complexity of LiberTEM. Furthermore, the ApplyMasksJob functionality can be useful as a base class and example for other UDFs, and dtype support should be added to UDFs in any case.

Steps:

uellue commented 4 years ago

In a quick test, the UDF version is significantly faster than the Job version. As a next step, perhaps we should re-run the big cluster benchmark to see if we still hit the previous numbers or even exceed them?

image

image

uellue commented 4 years ago

Testing in the GUI, it feels as if the JOB version is somewhat faster (10-20 %) for single mask analysis for MIB and K2. This should be verified under more controlled test conditions.

uellue commented 4 years ago

As far as I could find examples, our code base uses picking to get whole frames always. The slicing capability of the PickFrameJob is only used for demonstration purposes.

In the future, we may want to include more advanced loading capabilities for the advanced GUI #134. This should probably be co-developed with the GUI and may include scaling and rotation.

Since it is now relatively straightforward for both users and developers to implement a custom sampling UDF should the need arise, I'd propose to only implement a very basic loading UDF that loads all data within a ROI. That should be sufficient to support the current use cases.

We can then extend the capabilities as soon as we have use cases that require them.

uellue commented 4 years ago

Regarding features that are available with Jobs, but not with UDFs:

Both Jobs and UDFs are limited to MapReduce style computations by running independent tasks on the worker nodes and then merging their results together into a NumPy array. They both specify the tasks on the worker nodes and a merge function on the central node, which the executor and API then execute.

The Job interface allows to define tasks rather arbitrarily. In particular, they are not necessarily linked to an existing dataset. The constructor does require a dataset parameter, but this could be overridden in a subclass.

UDFs, in contrast, are currently always run on a dataset and receive its data. Issues #125, #288 and #289 call for UDFs that run without a dataset, in particular simulations.

Furthermore, UDFs seem not to use memory mapping since they don't use the mmap parameter of partition.get_tiles:

https://github.com/LiberTEM/LiberTEM/blob/c9fc491ec2eaaaa3c2d339c56ab7eb5ecbaee66a/src/libertem/udf/base.py#L587-L592

https://github.com/LiberTEM/LiberTEM/blob/c9fc491ec2eaaaa3c2d339c56ab7eb5ecbaee66a/src/libertem/io/dataset/base.py#L300

uellue commented 4 years ago

Some test results:

As a side note, this would be an application for automated benchmarking #198

uellue commented 4 years ago

The MMAP change is now separated in #551 since it is unrelated to #550.

sk1p commented 4 years ago

Since it is now relatively straightforward for both users and developers to implement a custom sampling UDF should the need arise, I'd propose to only implement a very basic loading UDF that loads all data within a ROI. That should be sufficient to support the current use cases.

:+1:

Benchmarking, profiling and optimization to make sure the new system is at least on par performance-wise. --> This requires more systematic tests under controlled conditions. In particular the impact of mmap should be tested on madmax nodes since they are more sensitive to memory transfer bottlenecks than moellenstedt

Testing in the GUI, it feels as if the JOB version is somewhat faster (10-20 %) for single mask analysis for MIB and K2. This should be verified under more controlled test conditions.

Should we postpone further performance work until we have #198 fixed? Or were there any other blatant perf regressions?

I'll move the milestone to 0.7.0 as that's when we'll remove the old Job API.

uellue commented 4 years ago

Should we postpone further performance work until we have #198 fixed? Or were there any other blatant perf regressions?

Let's move it to later when we have metrics. The most blatant one was HDF5 picking, and our release tests will hopefully catch any other issues that really impact user experience.