Create mixins for common methods

DaniBodor commented 7 months ago

Make mix-in classes for:

[x] slicing: #132
[ ] concatenating/adding
[ ] __len__
[x] __eq__ and check_equivalence: #133
- keep in mind that np.nan != np.nan (but should not fail equality) and that NDarrays need to be checked for shape and all entries.

The more I think about this, the more I feel like it would make sense to have a super-class with all these methods, rather than a whole bunch of mixins. To me, mixins make sense if we want to pick and choose which class uses which of these functionalities, but it seems to me that all classes will use all the methods. Anyway, for now I will create them as mixins and then it should be fairly straightforward to refactor it into a superclass if we decide to go that route.

psomhorst commented 7 months ago

@DaniBodor In different variants, data is stored in differently named attributes. E.g. EITDataVariant has an attribute pixel_impedance that stores data, while a ContinuousDataVariant has an attribute values that stores data. For concatenation it is important to be able to address the data attribute. There are a few ways I can think of to address this.

Unify the name. For most Variants, values is the logical name. For EITDataVariant, this is different. Both pixel_impedance and global_impedance are often used. It would not be clear what data values would hold. When initialising EITDataVariant, it is more clear to explicitly state what data you're passing on (pixel impedance).
Assume the fourth field of the dataclass is the data storing attribute. Variant has three fields, so any subclass can implement the fourth as the data storage. This currently works for EITDataVariant and ContinuousDataVariant, but is not a strong assumption to be making.
Add an attribute that points to the name of the data storage attribute. E.g. EITDataVariant._data_storage = "pixel_impedance". This requires no assumptions, but is a bit ugly to use: getattr(variant, variant._data_storage).
Add a property _data_storage that returns the proper attribute. See example below. It's better to use than the previous one: variant._data_storage.

class EITDataVariant(Variant):
    pixel_impedance: NDArray

    @property
    def _data_storage(self) -> NDarray:
        return self.pixel_impedance

What do you think? I think the last option is the neatest.

psomhorst commented 7 months ago

I think __len__() can be added to the mixin list. The length (number of time steps) of each dataset should be the length of the first axis, e.g. the 0th element of the shape. E.g.:

class EITDataVariant:
    def __len__(self) -> int:
        return self.pixel_impedance.shape[0]

This requires a uniform way to access the proper attribute; see previous comment.

DaniBodor commented 7 months ago

@DaniBodor In different variants, data is stored in differently named attributes. E.g. EITDataVariant has an attribute pixel_impedance that stores data, while a ContinuousDataVariant has an attribute values that stores data. For concatenation it is important to be able to address the data attribute. There are a few ways I can think of to address this.

* Unify the name. For most Variants, `values` is the logical name. For EITDataVariant, this is different. Both `pixel_impedance` and `global_impedance` are often used. It would not be clear what data `values` would hold. When initialising EITDataVariant, it is more clear to explicitly state what data you're passing on (pixel impedance).

* Assume the fourth field of the dataclass is the data storing  attribute. Variant has three fields, so any subclass can implement the fourth as the data storage. This currently works for EITDataVariant and ContinuousDataVariant, but is not a strong assumption to be making.

* Add an attribute that points to the name of the data storage attribute. E.g. `EITDataVariant._data_storage = "pixel_impedance"`. This requires no assumptions, but is a bit ugly to use: `getattr(variant, variant._data_storage)`.

* Add a property `_data_storage` that returns the proper attribute. See example below. It's better to use than the previous one: `variant._data_storage`.

class EITDataVariant(Variant):
    pixel_impedance: NDArray

    @property
    def _data_storage(self) -> NDarray:
        return self.pixel_impedance

What do you think? I think the last option is the neatest.

Thanks for sharing these considerations. I'm not sure what I prefer at this point, as it's still a bit abstract for me. I will think about it and play with it as I start implementing it and get back to you with my thoughts then.

DaniBodor commented 7 months ago

@DaniBodor In different variants, data is stored in differently named attributes. E.g. EITDataVariant has an attribute pixel_impedance that stores data, while a ContinuousDataVariant has an attribute values that stores data. For concatenation it is important to be able to address the data attribute. There are a few ways I can think of to address this.

Unify the name. For most Variants, values is the logical name. For EITDataVariant, this is different. Both pixel_impedance and global_impedance are often used. It would not be clear what data values would hold. When initialising EITDataVariant, it is more clear to explicitly state what data you're passing on (pixel impedance).

Assume the fourth field of the dataclass is the data storing attribute. Variant has three fields, so any subclass can implement the fourth as the data storage. This currently works for EITDataVariant and ContinuousDataVariant, but is not a strong assumption to be making.

Add an attribute that points to the name of the data storage attribute. E.g. EITDataVariant._data_storage = "pixel_impedance". This requires no assumptions, but is a bit ugly to use: getattr(variant, variant._data_storage).

Add a property _data_storage that returns the proper attribute. See example below. It's better to use than the previous one: variant._data_storage.
class EITDataVariant(Variant):
    pixel_impedance: NDArray

    @property
    def _data_storage(self) -> NDarray:
        return self.pixel_impedance
What do you think? I think the last option is the neatest.

OK, I now have a better understanding of the overall structure. Do I understand correctly that any Variant (bet it EITData, ContinuousData, or SparseData) will only have 1 attribute that should be concatenated? If so, I agree that your last suggestion, giving it a property) is the neatest. The property can then also be used by __len__. The way I envision it now is that a new object can be create with all/most arguments identical to self apart from this _data_storage property (and the label/name)? The arguments that don't match need to be stated in the child method, similar to how I did the equivalence checks in #133. The one thing is that Sequence wouldn't inherit the generic method, but would need to loop over all stored data structures and concatenate them separately (which is not an issue).

If not, then I'm not sure there's a neat unified way to do concatenation, and we may as well stick to separate methods for each.

EIT-ALIVE / eitprocessing

Create mixins for common methods #129