HEXRD / hexrdgui

Qt6 PySide6 based GUI for the HEXRD library.

Add support for serializing HEDM analysis #1100

Open cjh1 opened 2 years ago

cjh1 commented 2 years ago

This would include saving the analysis results in either "skinny" or "fat" format. The UI should detect whether it is a HEDM analysis and give the user the option to choose either format. Note: the "fat" version can get large when there are many grains.

@joelvbernier

psavery commented 2 years ago

We'll have to think through a few things about this. Currently, all of the HEDM "outputs" are being saved on the IndexingRunner and FitGrainsRunner classes found here. A few things to do:

  1. Determine which attributes set on those objects (the output of fit_grains, for example) need to be serialized, and determine if there are any outputs that are not being saved that need to be serialized
  2. Figure out how to serialize those attributes
  3. Figure out how deserialization will work. For example, do we re-construct new IndexingRunner and FitGrainsRunner objects with all of their attributes set (i.e., make the objects identical to how they were at the time of serialization)? There may be a better approach that involves some refactoring: instead of setting HEDM outputs as attributes on the IndexingRunner and FitGrainsRunner, have an HEDMState object that the IndexingRunner and FitGrainsRunner populate with values as the HEDM workflow progresses. Then deserialization could simply populate this HEDMState object, instead of having to reconstruct the IndexingRunner and FitGrainsRunner objects.
  4. For the fat format, we may not be able to keep all of the data to be serialized in memory. For example, we never keep the full pull_spots() output in memory, as it is too large. For serialization, we may end up re-running some computations like pull_spots(), pointing it at the HDF5 file where it should dump the fat output. Or, maybe we can indicate at the beginning of the HEDM workflow that we are serializing to a specific file, so that when the user runs through the workflow, all of the large data gets dumped immediately to that file, avoiding both re-running the computations and keeping the data in memory.
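To make idea 4 concrete, here is a minimal sketch of declaring the serialization target up front so that large outputs stream to disk as the workflow runs, instead of accumulating in memory. Everything here (HEDMState, set_output_file, dump_large, and the JSON-lines format) is hypothetical, not real hexrd API; a real implementation would presumably target HDF5.

```python
import json

class HEDMState:
    """Hypothetical container for HEDM workflow state (idea 4 sketch)."""

    def __init__(self):
        self.output_file = None  # set at the start of the workflow
        self.small = {}          # small results are fine to keep in memory

    def set_output_file(self, path):
        # Declared once, up front, so every later step knows where
        # large data should be streamed.
        self.output_file = path

    def dump_large(self, name, rows):
        # Append large output (e.g. pull_spots()-style results) to the
        # file one row at a time, so the full result never has to be
        # held in memory at once.
        with open(self.output_file, 'a') as f:
            for row in rows:
                f.write(json.dumps({'name': name, 'row': row}) + '\n')
```

With this shape, a step like pull_spots() could yield rows lazily into `dump_large()`, and "serialization" of the fat data is already done by the time the workflow finishes.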

Thoughts?

psavery commented 2 years ago

I am thinking to add two state classes as seen below. These contain some of the state that is currently being set on the runners. The runners will then set/get from the state objects, and we can add write/load functions to the state classes for serialization. We can also add more attributes to these classes for saving/loading as needed.

class IndexingState:
    def __init__(self):
        # This could be EtaOmeMaps or GenerateEtaOmeMaps
        self.ome_maps = None
        self.qfib = None
        self.completeness = None
        self.min_samples = None
        self.qbar = None
        # The output grains table
        self.grains_table = None

class FitGrainsState:
    def __init__(self):
        # Input grains table
        self.grains_table = None
        self.fit_grains_results = None
        self.result_grains_table = None
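The write/load functions mentioned above could look something like the sketch below. Using pickle on `__dict__` is purely an assumption for illustration (the real implementation would likely write HDF5 so the fat data can live alongside it), but it shows the round-trip shape: the runner populates the state object, `write()` dumps every attribute, and `load()` restores them onto a fresh instance.

```python
import pickle

class FitGrainsState:
    def __init__(self):
        # Input grains table
        self.grains_table = None
        self.fit_grains_results = None
        self.result_grains_table = None

    def write(self, path):
        # Dump all state attributes in one shot.
        # pickle is a placeholder; HDF5 is the likely real target.
        with open(path, 'wb') as f:
            pickle.dump(self.__dict__, f)

    def load(self, path):
        # Restore saved attributes onto this instance, so the runner
        # can read from the state object as if the workflow had just run.
        with open(path, 'rb') as f:
            self.__dict__.update(pickle.load(f))
```

Deserialization then never needs to reconstruct the runners themselves: a fresh state object is populated via `load()` and handed to the runner.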