Add a way to modify other scalars on dataset object

psavery commented 4 years ago

We can currently operate on all scalars, as in the following example, which inverts each scalar array:

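def transform(dataset):
    import numpy as np

    # Invert every scalar array on the dataset, not just the active one.
    for name in dataset.scalars_names:
        scalars = dataset.scalars(name)
        # Use lo/hi so the built-in min() and max() are not shadowed.
        lo = np.amin(scalars)
        hi = np.amax(scalars)
        # Slice assignment writes into the existing array (in place).
        scalars[:] = hi - scalars + lo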

This modifies the scalars data in place.

However, this only works if the new scalars are the same shape as the old scalars, which may not always be the case (for instance, Bin Volume x2).
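To make the shape limitation concrete, here is a small standalone sketch using plain NumPy arrays rather than a real tomviz dataset. The 2x binning is done with a reshape-and-mean trick chosen only for illustration; it is an assumption, not how the Bin Volume x2 operator is implemented.

```python
import numpy as np

# Stand-in for one scalar array on a dataset (not a real tomviz object).
scalars = np.arange(64, dtype=float).reshape(4, 4, 4)

# Illustrative 2x binning: average each 2x2x2 block into one voxel.
binned = scalars.reshape(2, 2, 2, 2, 2, 2).mean(axis=(1, 3, 5))

# The in-place pattern from the example above cannot accept the new shape.
failed = False
try:
    scalars[:] = binned
except ValueError:
    # NumPy refuses to broadcast shape (2, 2, 2) into shape (4, 4, 4).
    failed = True
```

The in-place assignment only succeeds when the right-hand side can be broadcast to the existing array's shape, which is why a separate way to replace the array is needed.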

We may need to add some API to the dataset (both internal_dataset.py and external_dataset.py) so that scalars other than the active one can be modified more freely.

psavery commented 4 years ago

For this issue, I am thinking of adding these two functions to the Dataset objects:

Dataset.create_empty_dataset()
Dataset.set_scalars(name, data)

The first one is similar to Dataset.create_child_dataset(), but Dataset.create_child_dataset() creates a deep copy of the parent. We don't want this for multiple scalars, because we may be changing the shape of the data on the dataset, and we want to enforce a rule that all scalars must have the same shape. The only attribute that is going to be copied from the parent in Dataset.create_empty_dataset() is the spacing. The first scalars set on the empty dataset will define the shape that will be enforced.

The second one just allows users to either overwrite existing scalars or add new scalars to the dataset. After creating an empty dataset, Dataset.set_scalars() will accept data of any shape. The first scalars array that is set will become the active scalars. If the dataset is not empty and new scalars are added, the rule that all scalars must have the same shape will be enforced.
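The rules described above could be sketched roughly as follows. This is a hypothetical stand-in class, not the actual tomviz Dataset: `SketchDataset`, `_arrays`, and `active_name` are names invented for illustration, and allowing the only existing array to be overwritten with a new shape is an assumption the proposal does not spell out.

```python
import numpy as np


class SketchDataset:
    """Hypothetical stand-in illustrating the proposed rules."""

    def __init__(self, spacing=(1.0, 1.0, 1.0)):
        self.spacing = spacing
        self._arrays = {}        # scalar name -> ndarray
        self.active_name = None  # name of the active scalars, if any

    def create_empty_dataset(self):
        # Only the spacing is carried over from the parent.
        return SketchDataset(spacing=self.spacing)

    def set_scalars(self, name, data):
        data = np.asarray(data)
        # Compare against every array except the one being overwritten.
        others = [a for n, a in self._arrays.items() if n != name]
        if others and others[0].shape != data.shape:
            raise ValueError(
                f"shape {data.shape} does not match existing scalars "
                f"shape {others[0].shape}"
            )
        self._arrays[name] = data
        # The first scalars set on an empty dataset become active.
        if self.active_name is None:
            self.active_name = name

    def scalars(self, name=None):
        key = self.active_name if name is None else name
        return self._arrays[key]


parent = SketchDataset(spacing=(2.0, 2.0, 2.0))
parent.set_scalars("original", np.zeros((4, 4, 4)))

# The empty child accepts any shape for its first scalars.
child = parent.create_empty_dataset()
child.set_scalars("binned", np.zeros((2, 2, 2)))

# A second array with a different shape is rejected.
rejected = False
try:
    child.set_scalars("other", np.zeros((4, 4, 4)))
except ValueError:
    rejected = True
```

In this sketch the shape check lives entirely in `set_scalars()`, so an empty dataset needs no special case: with no other arrays present, nothing constrains the first one.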

Let me know if a different design might be better.

psavery commented 4 years ago

Actually, I'm starting to question the behavior of Dataset.create_child_dataset() a little. Maybe it should already be doing what Dataset.create_empty_dataset() is intended to do.

In the internal pipeline, the structure of the parent is copied, but the data is not deep-copied. In the external pipeline, however, a deep copy is made. We might want to change the external dataset so that it does not make a deep copy, and then Dataset.create_empty_dataset() will not be needed.

psavery commented 4 years ago

Yeah, it looks like the external pipeline should probably not be performing a deep copy. We can change that, and then Dataset.create_empty_dataset() won't be needed.

OpenChemistry / tomviz

Add a way to modify other scalars on dataset object #2050