geomagpy / magpy

MagPy (or GeomagPy) is a Python package for analysing and displaying geomagnetic data.
BSD 3-Clause "New" or "Revised" License

GUI can't take into account baselines with jumps (multiple fitting parts) #120

Closed stephanbracke closed 2 years ago

stephanbracke commented 2 years ago

If a baseline fit is split into several parts (more than one fit), the graphical user interface does not handle these parts correctly. One of the big problems is that the fit is not always assigned the same way internally: it differs depending on whether the data is loaded from a file or held directly in memory. Take the scenario where we load a file (MagPy format only). Here the different parts are stored in the header of the absstream under the key 'DataFunctionObject'. The storage structure is:

[[{'fdx': <scipy.interpolate._interpolate.interp1d object at 0x7fb57437a220>,
 'fdy': <scipy.interpolate._interpolate.interp1d object at 0x7fb57437ab30>,
 'fdz': <scipy.interpolate._interpolate.interp1d object at 0x7fb57437acc0>}, 
18940.346550925926, 18999.379305555554], 
[{'fdx': <scipy.interpolate._interpolate.interp1d object at 0x7fb57437aef0>,
 'fdy': <scipy.interpolate._interpolate.interp1d object at 0x7fb57438e0e0>, 
'fdz': <scipy.interpolate._interpolate.interp1d object at 0x7fb57438e2c0>}, 
19003.53148148148, 19121.302916666667]]

This DataFunctionObject stores the dx, dy and dz fits together with a start and end time expressed in days. However, this is the only place where the different parts are stored, and it is never used when baseline fitting is invoked. In reality this is what is coded: https://github.com/geomagpy/magpy/blob/38eb6e9cb0d0d07ce25c01d5030458f98ef895d9/magpy/gui/magpy_gui.py#L4928-L4936 Basically, the code takes the fit settings from previously set options. When a baseline CDF file has just been loaded, it falls back to the defaults (always spline, even if you did a mean or polynomial fit). Start and stop are aligned with the min and max of the absstream, and a full spline fit is redone without taking the previously made fit into account. While trying to take the different parts into account, I changed the code to:

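    # Apply the baseline once per stored fitting part instead of once over the full range
    for func in absstream.header['DataFunctionObject']:
        # func is [functions, start, end]; start and end are date numbers in days
        start = num2date(func[1]).replace(tzinfo=None)
        end = num2date(func[2]).replace(tzinfo=None)
        self.plotstream.baseline(absstream, fitfunc='spline', knotstep=float(knotstep),
                                 fitdegree=int(degree), startabs=start, endabs=end,
                                 extradays=0)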

This takes the different parts into account (and handles baseline jumps), but information is missing: for each fit I do not have the fit function string, the fit degree or the knot step, because the functions are saved as scipy objects and this high-level information is no longer available. I need it because the method in stream.py requires these parameters. https://github.com/geomagpy/magpy/blob/38eb6e9cb0d0d07ce25c01d5030458f98ef895d9/magpy/stream.py#L2227-L2252
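For reference, the segment layout described above can be walked as in the following minimal sketch. It uses only the standard library. The lambdas stand in for the stored interp1d objects, and `days_to_datetime` is a hypothetical helper which assumes that the start and end values are day counts from 1970-01-01 (matplotlib's default date epoch since version 3.3). In the GUI snippet above, `num2date` performs this conversion.

```python
from datetime import datetime, timedelta

def days_to_datetime(days):
    """Convert a day count since 1970-01-01 (assumed epoch) to a naive datetime."""
    return datetime(1970, 1, 1) + timedelta(days=days)

# Stand-ins for the stored interp1d objects; real entries hold scipy callables.
segments = [
    [{'fdx': lambda t: 0.0, 'fdy': lambda t: 0.0, 'fdz': lambda t: 0.0},
     18940.346550925926, 18999.379305555554],
    [{'fdx': lambda t: 1.0, 'fdy': lambda t: 1.0, 'fdz': lambda t: 1.0},
     19003.53148148148, 19121.302916666667],
]

def segment_windows(segments):
    """Return one (start, end) datetime pair per fitted part."""
    return [(days_to_datetime(seg[1]), days_to_datetime(seg[2])) for seg in segments]

windows = segment_windows(segments)
for start, end in windows:
    print(start.date(), "to", end.date())
```

The gap between the end of the first window and the start of the second is where the baseline jump sits, which is why a single fit over the full range gives the wrong result.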

It would probably be better to extend or change the object list saved in the DataFunctionObject header:

[[{'fdx': {'funct': 'spline', 'fitdegree': 5, 'knotstep': 0.3},
 'fdy': {'funct': 'spline', 'fitdegree': 5, 'knotstep': 0.3},
 'fdz': {'funct': 'spline', 'fitdegree': 5, 'knotstep': 0.3}},
18940.346550925926, 18999.379305555554], ...]

or, with less impact but less future-proof:

[[{'fdx': <scipy.interpolate._interpolate.interp1d object at 0x7fb57437a220>,
 'fdy': <scipy.interpolate._interpolate.interp1d object at 0x7fb57437ab30>,
 'fdz': <scipy.interpolate._interpolate.interp1d object at 0x7fb57437acc0>}, 
18940.346550925926, 18999.379305555554, 'spline', 5, 0.3], ...]
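Either layout would let the GUI recover the fit settings for each part. A minimal sketch of reading the second (flat) variant follows, falling back to defaults for the existing three-element entries. `fit_parameters` and the default values are hypothetical and not part of MagPy, and the strings stand in for the stored function objects.

```python
# Hypothetical defaults used when an entry carries no fit metadata.
DEFAULTS = {'fitfunc': 'spline', 'fitdegree': 5, 'knotstep': 0.3}

def fit_parameters(entry):
    """Return (start, end, params) for one DataFunctionObject-style entry.

    Accepts the existing [functions, start, end] form and the proposed
    [functions, start, end, fitfunc, fitdegree, knotstep] form.
    """
    start, end = entry[1], entry[2]
    params = dict(DEFAULTS)
    if len(entry) >= 6:
        params.update(fitfunc=entry[3],
                      fitdegree=int(entry[4]),
                      knotstep=float(entry[5]))
    return start, end, params

# Placeholder strings instead of interp1d objects, for illustration only.
old_entry = [{'fdx': 'f1', 'fdy': 'f2', 'fdz': 'f3'},
             18940.346550925926, 18999.379305555554]
new_entry = [{'fdx': 'f4', 'fdy': 'f5', 'fdz': 'f6'},
             19003.53148148148, 19121.302916666667, 'mean', 1, 0.3]

for entry in (old_entry, new_entry):
    print(fit_parameters(entry))
```

Accepting both lengths keeps files written by older versions readable, at the cost of silently using defaults for them.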

The DataFunctionObject header parameter should be set whenever the fit button is pressed, so that working with a stream in memory gives the same behaviour and uses the same code path as loading from a file.
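One way to keep both paths consistent would be to append an entry at the moment a fit is applied. The sketch below uses a plain dict in place of the stream header. `record_fit` is a hypothetical helper, not an existing MagPy function, and the fit settings shown are illustrative values only.

```python
def record_fit(header, functions, start, end, fitfunc, fitdegree, knotstep):
    """Append one fitted part, with its settings, to header['DataFunctionObject']."""
    entry = [functions, start, end, fitfunc, fitdegree, knotstep]
    header.setdefault('DataFunctionObject', []).append(entry)
    return entry

# A plain dict stands in for the stream header; strings stand in for the fit functions.
header = {}
record_fit(header, {'fdx': 'f1', 'fdy': 'f2', 'fdz': 'f3'},
           18940.346550925926, 18999.379305555554, 'spline', 5, 0.3)
record_fit(header, {'fdx': 'f4', 'fdy': 'f5', 'fdz': 'f6'},
           19003.53148148148, 19121.302916666667, 'mean', 1, 0.3)

print(len(header['DataFunctionObject']))
```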

Furthermore, I have always had the impression that the baseline and baselinecorr buttons should be merged into a single click.

leonro commented 2 years ago

The recent updates for version 1.1 include solutions for all requested changes: