ebranlard / pyDatView

A cross-platform GUI to plot tabulated data from files (e.g. CSV, Excel, OpenFAST, HAWC2, Flex...), or Python pandas dataframes
MIT License

Calculated channels uses masked data #184

Open mayankchetan opened 1 month ago

mayankchetan commented 1 month ago

When creating a new calculated channel after masking the data (e.g. {B1TipTDxr} - np.mean({B1TipTDxr}) ), the value of np.mean({B1TipTDxr}) is calculated from the whole time history rather than from the masked data only.

For example, this affects mean removal for simulations with long transients: the transient is included in the mean even though it has been masked out.

ebranlard commented 1 month ago

Thanks, I can see how that can be a problem. The formula is indeed evaluated on the whole dataframe (df) before the mask is applied: https://github.com/ebranlard/pyDatView/blob/4140c241cb3632cc4482e814d233794a9b019d60/pydatview/formulae.py#L16

The mask is applied when the "columns" are retrieved: https://github.com/ebranlard/pyDatView/blob/4140c241cb3632cc4482e814d233794a9b019d60/pydatview/Tables.py#L874
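To make the order of operations concrete, here is a minimal sketch using plain pandas and NumPy. It does not call any pyDatView code; the synthetic signal, the `Time` column, and the mask are assumptions made up for illustration, and only the formula comes from the report above.

```python
import numpy as np
import pandas as pd

# Synthetic signal (assumption): a large start-up transient for t < 20,
# followed by a steady oscillation.
t = np.arange(0.0, 100.0, 1.0)
x = np.where(t < 20.0, 10.0, np.sin(t))
df = pd.DataFrame({"Time": t, "B1TipTDxr": x})

# Hypothetical mask that removes the transient.
mask = df["Time"] >= 20.0

# Order described above: the formula is evaluated on the whole dataframe,
# and the mask is applied afterwards when the column is retrieved.
full_then_mask = (df["B1TipTDxr"] - np.mean(df["B1TipTDxr"]))[mask]

# Order expected in the report: mask first, then evaluate the formula.
masked = df.loc[mask, "B1TipTDxr"]
mask_then_formula = masked - np.mean(masked)

print(full_then_mask.mean())     # offset left over from the transient
print(mask_then_formula.mean())  # close to zero
```

In the first case the transient biases the mean, so the plotted (masked) channel keeps an offset. In the second case the masked channel is centred on zero.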

There is likely a better way to apply a mask to a pandas dataframe without duplicating the memory, but it might require some changes.
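One possible direction, sketched under assumptions: apply the mask only to the columns a formula references, so that only those columns are copied and not the whole dataframe. The helper `eval_formula_masked` below is hypothetical and is not part of pyDatView; it only reuses the `{column}` placeholder syntax from the report, and it uses `eval` for brevity.

```python
import re

import numpy as np
import pandas as pd


def eval_formula_masked(df, formula, mask):
    """Hypothetical helper (not pyDatView's API): evaluate `formula`
    using only the masked rows of the columns it references."""
    m = np.asarray(mask, dtype=bool)
    names = re.findall(r"\{(.*?)\}", formula)
    local_vars = {}
    expr = formula
    # dict.fromkeys removes duplicate names while keeping their order.
    for i, name in enumerate(dict.fromkeys(names)):
        key = f"_c{i}"
        # Boolean indexing copies the masked rows of this one column only.
        local_vars[key] = df[name].to_numpy()[m]
        expr = expr.replace("{" + name + "}", key)
    # eval is used for brevity; it should not be fed untrusted input.
    return eval(expr, {"np": np}, local_vars)


# Made-up data: two "transient" samples followed by four steady samples.
df = pd.DataFrame({
    "Time": np.arange(6.0),
    "B1TipTDxr": [50.0, 50.0, 1.0, 2.0, 3.0, 4.0],
})
mask = df["Time"] >= 2.0

out = eval_formula_masked(df, "{B1TipTDxr} - np.mean({B1TipTDxr})", mask)
print(out)
```

Note that the result has the length of the masked data, not of the full table. Storing it next to the unmasked columns would need extra handling (for instance padding the masked-out rows with NaN), which may be part of the changes mentioned above.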

Currently, I don't see an easy path forward, but here are some workarounds: