euroargodev / argopy

A python library for Argo data beginners and experts
https://argopy.readthedocs.io
European Union Public License 1.2

Different size of matrix using create_float_source for raw and adjusted data #218

Open · kamwal opened this issue 2 years ago

kamwal commented 2 years ago

I have found a difference in the size of the matrices generated by the ds.argo.create_float_source code when producing the Wong .mat source files for OWC analysis. The sizes differ between the .mat files built from raw data and those built from adjusted data.

    ds.argo.create_float_source(force='raw')
    ds.argo.create_float_source(force='adjusted')

The equivalent Matlab code for generating the float source gives the same output size for raw and adjusted data: https://github.com/euroargodev/dm_floats/blob/master/src/ow_source/create_float_source.m

The WMO floats where the issue has been detected are: 3901928, 3900797, 3900799. The mismatch between the sizes of the raw and adjusted matrices leads to problems when extracting differences and comparing the data during checks of the quality of the adjusted data.
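For reference, here is a minimal sketch of how the mismatch shows up, assuming the two files below were written by the calls above for float 3901928 (the file names are hypothetical) and that they follow the usual OWC float source layout (PRES, TEMP, SAL, PTMP, PROFILE_NO, ...):

    import scipy.io

    # Hypothetical output paths for the two calls above (adapt to your setup)
    raw = scipy.io.loadmat("float_source/3901928_raw.mat")
    adj = scipy.io.loadmat("float_source/3901928_adjusted.mat")

    # Matrices are (n_levels, n_profiles); the number of profiles differs
    # between the raw and adjusted outputs for the floats listed above
    for key in ["PRES", "TEMP", "SAL", "PTMP", "PROFILE_NO"]:
        print(key, raw[key].shape, adj[key].shape)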

I am using argopy v0.1.11.

gmaze commented 2 years ago

Hi @kamwal, could you please share here the files generated with the float source Matlab code?

gmaze commented 2 years ago

@kamwal I looked at the output for WMO=3901928. The difference is in whether data from the last 8 profiles are selected or not:

[Screenshot 2022-05-16 at 08:53:05]

And if I look at the netCDF file content with:

from argopy import DataFetcher as ArgoDataFetcher

WMO = 3901928
# Fetch the float in expert mode from the GDAC source
argo_loader = ArgoDataFetcher(src='gdac', cache=True, mode='expert', dataset='phy').float(WMO)
ds = argo_loader.load().data
# Re-organize points into profiles and select cycle 164
dsp = ds.argo.point2profile()
dsp.where(dsp['CYCLE_NUMBER'] == 164, drop=True)

I see that the data mode is Delayed and that the adjusted salinity is full of NaNs with QC=4; that's why the raw=adjusted option does not select these profiles. So I guess the question is rather why the Matlab code selects these?
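This can be checked directly on the dataset fetched above; a small sketch, assuming the expert-mode variables DATA_MODE, PSAL_ADJUSTED and PSAL_ADJUSTED_QC are present in dsp:

    import numpy as np

    prof = dsp.where(dsp['CYCLE_NUMBER'] == 164, drop=True)
    print(prof['DATA_MODE'].values)                      # 'D': delayed mode
    print(np.unique(prof['PSAL_ADJUSTED_QC'].values))    # QC flag 4
    print(np.isnan(prof['PSAL_ADJUSTED'].values).all())  # adjusted salinity is all NaNs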

kamwal commented 2 years ago

3901928.zip

Thanks for looking at this.

I think this is done to avoid any issues with a mismatch in the matrix sizes across parameters. For some floats, QC=4 is applied not to all parameters (PRES, SAL, TEMP) as it is here, but only to a single parameter such as PSAL. Having the same matrix size for raw and adjusted data makes further comparison of the two datasets easier.
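A small sketch of that point, again using the dsp dataset from the snippet above and assuming the standard <PARAM>_ADJUSTED_QC variables are present with integer-cast QC flags; it lists, per parameter, the cycles where at least one level is flagged QC=4:

    import numpy as np

    for param in ['PRES', 'TEMP', 'PSAL']:
        qc = dsp[param + '_ADJUSTED_QC']
        # Cycles where at least one level of this parameter is flagged QC=4
        flagged = dsp['CYCLE_NUMBER'].where((qc == 4).any(dim='N_LEVELS'), drop=True)
        print(param, np.unique(flagged.values))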

gmaze commented 2 years ago

After discussion with @cabanesc, this appears to be motivated by the post-analysis use of the .mat source files: the D netCDF files are created for the profiles present in the source files! Hence there are no D files for profiles not reported in the source file (even if they are full of NaNs).

I don't know how OWC handles this, but we could fix argopy to make sure it reports as many profiles as before any filtering, with the filtered fields left full of NaNs.
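Until such a fix exists, a possible workaround on the user side (just a sketch, not an argopy feature; file names are hypothetical and the fields follow the usual OWC source layout) is to pad the adjusted .mat matrices with NaN columns so that they report the same profiles as the raw ones:

    import numpy as np
    import scipy.io

    raw = scipy.io.loadmat("float_source/3901928_raw.mat")
    adj = scipy.io.loadmat("float_source/3901928_adjusted.mat")

    raw_prof = raw['PROFILE_NO'].ravel()
    adj_prof = adj['PROFILE_NO'].ravel()
    n_levels = max(raw['PRES'].shape[0], adj['PRES'].shape[0])

    # Keep the raw metadata (one entry per profile), pad the adjusted fields with NaNs
    padded = {k: raw[k] for k in ['PROFILE_NO', 'LAT', 'LONG', 'DATES']}
    for key in ['PRES', 'TEMP', 'SAL', 'PTMP']:
        out = np.full((n_levels, raw_prof.size), np.nan)
        for j, p in enumerate(adj_prof):
            i = int(np.where(raw_prof == p)[0][0])   # column of this profile in the raw file
            out[:adj[key].shape[0], i] = adj[key][:, j]
        padded[key] = out

    scipy.io.savemat("float_source/3901928_adjusted_padded.mat", padded)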

kamwal commented 2 years ago

Yes, thanks, that would be very helpful.

gmaze commented 2 years ago

@kamwal note that I have no idea when I'll be able to fix this ...

github-actions[bot] commented 2 years ago

This issue was marked as stale automatically because it has not seen any activity in 90 days

gmaze commented 1 year ago

I don't know how OWC handles this, but we could fix argopy to make sure it reports as many profiles as before any filtering, with the filtered fields left full of NaNs.

Although, after some thought, I'm not sure anymore that this is the way to go, since this approach mixes up the matrix content across different file uses (OWC analysis vs D file production). Basically I'm getting cold feet about reproducing a flawed workflow based on the Matlab software.

github-actions[bot] commented 1 year ago

This issue was marked as stale automatically because it has not seen any activity in 90 days

github-actions[bot] commented 5 months ago

This issue was closed automatically because it has not seen any activity in 365 days

github-actions[bot] commented 3 weeks ago

This issue was marked as stale automatically because it has not seen any activity in 90 days