MechMicroMan / DefDAP

A python library for correlating EBSD and HRDIC data
Apache License 2.0

Improve masking #122

Open rhysgt opened 7 months ago

rhysgt commented 7 months ago

Masking is now performed on accessing data in the datastore.

Removed the preview function in set_mask since it is now extraneous as the original data is not being overwritten.

We should consider moving cropping and masking from hrdic into base.
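A minimal sketch of "masking on access", assuming plain NumPy arrays stand in for the datastore (read_field, raw_max_shear and the values are hypothetical, not DefDAP API). The mask is applied only when the field is read, so the stored array is never overwritten. Clearing the mask brings the original data straight back, which is why a separate preview step is no longer needed.

```python
import numpy as np

# Hypothetical stand-ins for a stored field and a user-supplied mask
raw_max_shear = np.array([[0.1, 0.9], [0.5, 1.2]])
mask = raw_max_shear > 0.8  # True marks points to hide


def read_field(raw, mask):
    """Apply the mask at read time instead of writing into `raw`."""
    return np.ma.masked_array(raw, mask=mask)


masked = read_field(raw_max_shear, mask)
print(masked.mean())   # reductions skip the masked points
print(raw_max_shear)   # the stored values are unchanged
```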

mikesmic commented 7 months ago

I've made some changes; can you check that it works as expected? You can set the mask with: dic_map.data.generate('mask', mask=bool_array). How are these masks generated? Is it quite standard, or do you tweak things to fit the data? We need to change the mask function so the stored data is not mutated. This could perhaps be done by casting to a masked array and then getting the filled array.
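A sketch of the "cast to a masked array, then get the filled array" idea, assuming a NaN fill value (apply_mask_without_mutation, stored and the threshold are hypothetical). np.ma.masked_array only wraps the input, and MaskedArray.filled() writes into a new array when any point is masked, so the stored array keeps its values.

```python
import numpy as np


def apply_mask_without_mutation(data, mask, fill_value=np.nan):
    """Return `data` with masked points replaced by `fill_value`.

    `data` itself is not modified. Note: if nothing is masked, .filled()
    returns the underlying data rather than a copy.
    """
    # A NaN fill needs a floating-point dtype
    data = np.asarray(data, dtype=float)
    return np.ma.masked_array(data, mask=mask).filled(fill_value)


stored = np.array([[0.2, 1.5], [-1.3, 0.4]])
bool_array = np.abs(stored) > 1  # hypothetical mask, e.g. |e11| > 1
result = apply_mask_without_mutation(stored, bool_array)
print(result)  # masked points are NaN
print(stored)  # original values are still there
```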

rhysgt commented 7 months ago

I think I have fixed the problems you mentioned: it now uses a masked array instead of mutating the data, and generates a null mask in a better way (?)

Most of the time, the masks I use are quite straightforward, for example (from the docs):

To remove data points in dic_map where max_shear is above 0.8, use:

mask = dic_map.data.max_shear > 0.8

To remove data points in dic_map where e11 is above 1 or less than -1, use:

mask = (dic_map.data.e[0, 0] > 1) | (dic_map.data.e[0, 0] < -1)

To remove data points in dic_map where corr_val is less than 0.4, use:

mask = dic_map.corr_val < 0.4
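The three criteria above can be combined into a single mask with element-wise OR. This sketch uses random synthetic arrays in place of a real dic_map, and assumes the strain tensor keeps its components on the two leading axes so that e[0, 0] selects e11 (an assumption about layout, not checked against DefDAP).

```python
import numpy as np

rng = np.random.default_rng(0)
ny, nx = 4, 5

# Synthetic stand-ins for dic_map.data.max_shear, dic_map.data.e and
# dic_map.corr_val (assumed layout: strain components on leading axes)
max_shear = rng.uniform(0, 1, size=(ny, nx))
e = rng.uniform(-1.5, 1.5, size=(2, 2, ny, nx))
corr_val = rng.uniform(0, 1, size=(ny, nx))

# Each criterion gives a boolean map; True marks a point to remove
high_shear = max_shear > 0.8
extreme_e11 = (e[0, 0] > 1) | (e[0, 0] < -1)
poor_corr = corr_val < 0.4

# A point is removed if any criterion holds
mask = high_shear | extreme_e11 | poor_corr
print(f"{mask.sum()} of {mask.size} points masked")
```

The parentheses around each comparison are required because `|` binds more tightly than `>` and `<` in Python; `np.abs(e[0, 0]) > 1` is an equivalent way to write the e11 condition.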

rhysgt commented 7 months ago

Also, there is an inconsistency in function naming: calc_mask vs set_crop?

mikesmic commented 7 months ago

I did call it set_mask, but that would be confusing because it doesn't set anything; it just creates a mask image that the generate function uses. set_crop actually sets the crop boundary values. I'm not sure about passing out masked arrays: will they work with everything else in the library? I looked at masking yesterday, but I couldn't find a way to create an array with NaNs in place of masked values without making a copy of the data. I need to look through the logic for the masking again; my goal was to only run the masking function if a mask is set.
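A small comparison of the options discussed, using a toy array (all names hypothetical). A NaN-filled result built with np.where is always a new buffer; writing NaNs in place avoids the copy but mutates the stored data; a masked array created with NumPy's default copy=False wraps the existing buffer instead of copying the values.

```python
import numpy as np

data = np.arange(6, dtype=float).reshape(2, 3)
mask = data > 3

# Option 1: NaN-filled copy. The stored array is untouched, but the
# result is a full copy of the data.
nan_copy = np.where(mask, np.nan, data)

# Option 2: masked array. It wraps the existing buffer (no copy of the
# values), but callers receive a MaskedArray rather than a plain ndarray.
wrapped = np.ma.masked_array(data, mask=mask)

# Option 3: writing NaNs in place avoids the copy but mutates the store,
# which is what we want to avoid (done on a throwaway copy here).
scratch = data.copy()
scratch[mask] = np.nan

print(np.shares_memory(nan_copy, data))      # new buffer
print(np.shares_memory(wrapped.data, data))  # same buffer as `data`
```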

rhysgt commented 7 months ago

As far as I'm aware, everything still seems to work as expected with a masked array.

It does incur some overhead, but it is smaller than with the previous method by a factor of about 1000.

Do we need to change the logic to not use masked arrays for data that isn't masked? Currently a masked array is always generated.
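One way to check that overhead on your own machine is a quick timeit comparison. This sketch assumes the "previous method" behaved like building a NaN-filled copy on every read, which is a guess; the map size is arbitrary and no particular timings are claimed here.

```python
import timeit

import numpy as np

# Hypothetical map size; real DIC maps will differ
data = np.random.default_rng(1).normal(size=(2000, 2000))
mask = data > 2.0


def nan_copy():
    # Allocates and fills a new array every time the data is read
    return np.where(mask, np.nan, data)


def masked_view():
    # Wraps the existing buffers; the values themselves are not copied
    return np.ma.masked_array(data, mask=mask)


for fn in (nan_copy, masked_view):
    seconds = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e3:.3f} ms per call")
```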

rhysgt commented 6 months ago

A masked array is only returned when a mask is provided. If unset, the normal map data is passed through as before.
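A sketch of that pass-through logic, assuming the mask is a 2D map and fields may carry leading component axes (get_map_data is a hypothetical helper, not the function in the PR). With no mask the plain ndarray is returned untouched; otherwise the mask is broadcast to the field's shape and the data is wrapped.

```python
from typing import Optional, Union

import numpy as np


def get_map_data(values: np.ndarray, mask: Optional[np.ndarray] = None
                 ) -> Union[np.ndarray, np.ma.MaskedArray]:
    """Return `values` unchanged when no mask is set, otherwise wrap them.

    `values` may have leading component axes (e.g. a tensor field);
    `mask` is a 2D map where True means hide that point.
    """
    if mask is None:
        # No mask set: pass the plain ndarray straight through
        return values
    # Broadcast the 2D mask over any leading axes; copy so it is writable
    full_mask = np.broadcast_to(mask, values.shape).copy()
    return np.ma.masked_array(values, mask=full_mask)


plain = np.ones((3, 3))
strain = np.ones((2, 2, 3, 3))
masked_strain = get_map_data(strain, np.eye(3, dtype=bool))
```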