ACHMartin / seastar_project


Combined look direction dataset problem with 'GBPGridInfo' dimension #165

Closed DavidMcCann-NOC closed 1 year ago

DavidMcCann-NOC commented 1 year ago

The following code:

ds_level1 = xr.concat([dsf, dsa, dsm], 'Antenna', join='outer', coords='all')
ds_level1 = ds_level1.assign_coords(Antenna=('Antenna', ['Fore', 'Aft', 'Mid']))

will produce the error:

ValueError: cannot reindex or align along dimension 'GBPGridInfo' because the index has duplicate values

This concatenation works with just the Fore and Aft datasets, but the Mid antenna's 'GBPGridInfo' dimension somehow differs. We need to find out whether this matters and how to work around it, as the Mid antenna currently isn't included in the oscar.init_level1_dataset function.
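A minimal sketch (toy data, not the real OSCAR files) of how a duplicate value in a shared 'GBPGridInfo' index coordinate triggers the alignment error during concat:

```python
import xarray as xr

# Toy stand-ins for two antenna datasets; the Mid one carries a
# duplicate value along the 'GBPGridInfo' index coordinate.
dsf = xr.Dataset(
    {"sig": ("GBPGridInfo", [1.0, 2.0, 3.0])},
    coords={"GBPGridInfo": [0, 1, 2]},
)
dsm = xr.Dataset(
    {"sig": ("GBPGridInfo", [4.0, 5.0, 6.0])},
    coords={"GBPGridInfo": [0, 1, 1]},  # duplicate index value
)

# join='outer' forces reindexing along 'GBPGridInfo', which is not
# possible when the source index has duplicate labels.
try:
    xr.concat([dsf, dsm], "Antenna", join="outer", coords="all")
    raised = False
except ValueError:
    raised = True
print(raised)
```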

DavidMcCann-NOC commented 1 year ago

Currently looking into the possibility of ignoring the coordinate and dimension 'GBPGridInfo' when loading the datasets, as we currently don't use it at all in the calculations and none of the DataArrays actually has it as a coordinate. Something like:

dsm = xr.open_dataset(mid_file, mask_and_scale=True, drop_variables='GBPGridInfo')

This works; however, there is then the issue of certain variables being present in the fore and aft datasets but not in the mid antenna dataset. I'm looking into scanning for these conflicts and inserting dummy NaN arrays for the offending variables.
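Continuing the toy example above, this is a sketch of why dropping the coordinate variable sidesteps the error: drop_variables='GBPGridInfo' in open_dataset has the same effect as drop_vars on an in-memory dataset, and once the coordinate is gone there is no duplicate index left to align on:

```python
import xarray as xr

# Toy antenna datasets with a duplicate value in the Mid index.
dsf = xr.Dataset({"sig": ("GBPGridInfo", [1.0, 2.0, 3.0])},
                 coords={"GBPGridInfo": [0, 1, 2]})
dsm = xr.Dataset({"sig": ("GBPGridInfo", [4.0, 5.0, 6.0])},
                 coords={"GBPGridInfo": [0, 1, 1]})

# Drop the index coordinate; 'GBPGridInfo' remains only as a plain
# dimension, so concat no longer needs to reindex along it.
dsf = dsf.drop_vars("GBPGridInfo")
dsm = dsm.drop_vars("GBPGridInfo")

ds_level1 = xr.concat([dsf, dsm], "Antenna", join="outer", coords="all")
print(dict(ds_level1.sizes))
```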

DavidMcCann-NOC commented 1 year ago

This produces a dataset of the variables present in the fore dataset but not in the mid antenna dataset:

dsf[[x for x in dsf.data_vars if x not in dsm.data_vars]]

DavidMcCann-NOC commented 1 year ago

This will work, creating a dataset of the missing variables filled with NaN:

ds_diff = dsf[[x for x in dsf.data_vars if x not in dsm.data_vars]]
ds_diff = ds_diff.where(ds_diff == np.nan, other=np.nan)

(Note that ds_diff == np.nan is always False, since NaN never compares equal to anything, so where() replaces every value with NaN, which is exactly the effect wanted here. The result of where() also needs to be assigned back, as it returns a new object.)
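A toy end-to-end sketch of this approach, using xr.full_like as a more direct equivalent of the where() trick (the variable names below are illustrative, not the real OSCAR ones):

```python
import numpy as np
import xarray as xr

# Toy fore/mid datasets; 'extra' exists only for the fore antenna.
dsf = xr.Dataset({"sig": ("x", [1.0, 2.0]), "extra": ("x", [3.0, 4.0])},
                 coords={"x": [0, 1]})
dsm = xr.Dataset({"sig": ("x", [5.0, 6.0])}, coords={"x": [0, 1]})

# Build NaN-filled dummies for the variables mid is missing, then merge
# them in so both datasets share the same set of variables.
ds_diff = xr.full_like(dsf[[v for v in dsf.data_vars if v not in dsm.data_vars]],
                       np.nan)
dsm = dsm.merge(ds_diff)

# With matching variables, the antenna concatenation goes through.
ds_level1 = xr.concat([dsf, dsm], "Antenna", join="outer", coords="all")
ds_level1 = ds_level1.assign_coords(Antenna=("Antenna", ["Fore", "Mid"]))
print(bool(np.isnan(ds_level1["extra"].sel(Antenna="Mid")).all()))
```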

ACHMartin commented 1 year ago

Currently looking into the possibility of ignoring the coordinate and dimension 'GBPGridInfo' when loading the datasets, as we currently don't use it at all in the calculations and none of the DataArrays actually has it as a coordinate. Something like:

dsm = xr.open_dataset(mid_file, mask_and_scale=True, drop_variables='GBPGridInfo')

This works; however, there is then the issue of certain variables being present in the fore and aft datasets but not in the mid antenna dataset. I'm looking into scanning for these conflicts and inserting dummy NaN arrays for the offending variables.

Perhaps we can keep only the coordinates we are interested in (Ground/Cross Range); cf. the decode_coords parameter of xr.open_dataset (bool or {"coordinates", "all"}, optional).

ACHMartin commented 1 year ago

This will work, creating a dataset of the missing variables filled with NaN:

ds_diff = dsf[[x for x in dsf.data_vars if x not in dsm.data_vars]]
ds_diff = ds_diff.where(ds_diff == np.nan, other=np.nan)

What happens if we use data_vars='all' in xr.concat? I am sure xarray should have something automatic; we just need to find the correct parameter!

DavidMcCann-NOC commented 1 year ago

With data_vars='all' we would still get an indexing error like:

ValueError: 'SigmaImageSingleLookRealPartSlave' is not present in all datasets.

(that variable being the first one found that isn't present in the mid antenna set). I agree, though, that there is likely something built in; we just need to ask the right question!


Perhaps we can keep only the coordinates we are interested in (Ground/Cross Range); cf. the decode_coords parameter of xr.open_dataset (bool or {"coordinates", "all"}, optional).

This is probably a good idea for neatness anyway, and something that should be put into #167.

DavidMcCann-NOC commented 1 year ago

When looking at #175 I realised that we need GBPGridInfo in order to calculate the SARX-Y, Incidence Angle and Squint fields via the scripts metasensing sent us, so it is not a simple case of dropping the dimension just yet.

DavidMcCann-NOC commented 1 year ago

Closed with pull request #192; this dimension no longer causes any issues as it is confined to the L1 datasets.