IMAP-Science-Operations-Center / imap_processing

IMAP
MIT License
6 stars 13 forks source link

Update cdflib version #606

Open subagonsouth opened 3 months ago

subagonsouth commented 3 months ago

Algorithm Description:

Update to the latest version of cdflib. The major relevant changes to be aware of are:

Other Notes:

- The code that sets the to_datetime keyword argument in imap_processing.cdf.utils.load_cdf should be removed. to_datetime was already removed as part of another PR.

Related Issues/PRs:

#595 - this was the ticket that added defaulting to to_datetime = True

subagonsouth commented 3 weeks ago

I started on this but didn't make appreciable progress. The one thing that I did identify is that there is a bug in cdflib in that it tries to put arrays with dtype=uint64 into CDF_UINT8 which does not exist. All variables that are specified as uint64 should be changed to int64.

greglucas commented 3 weeks ago

Upgrading to v1.3.1 produces 5 failures within the test suite. All of which, I'm not able to address, so I'm going to punt to others on this as well 😂

FAILED imap_processing/tests/hi/test_l1a.py::test_sci_de_decom - KeyError: 'CDF_UINT8'
FAILED imap_processing/tests/hi/test_l1a.py::test_app_hist_decom - cdflib.xarray.xarray_to_cdf.ISTPError: ISTP Compliance Warning: LABL_PTR_1 attribute is required for variable...
FAILED imap_processing/tests/mag/test_mag_decom.py::test_mag_raw_cdf_generation - cdflib.xarray.xarray_to_cdf.ISTPError: ISTP Compliance Warning: LABL_PTR_1 attribute is required for variable...
FAILED imap_processing/tests/mag/test_mag_l1b.py::test_cdf_output - cdflib.xarray.xarray_to_cdf.ISTPError: Variable epoch was determined to be an ISTP 'Epoch' variable, but it i...
FAILED imap_processing/tests/swe/test_swe_l1a.py::test_cdf_creation - Exception: Cannot include both LABLAXIS and LABL_PTR_2 in the attributes to variable science_data.

@bryan-harter : There is an issue with CDF_UINT8, if we use a np.uint64 as a numpy array variable. The CDF format doesn't have a UINT8. https://cdaweb.gsfc.nasa.gov/pub/software/cdf/doc/cdf34/cdf340ifd.pdf image @subagonsouth are you willing to use a different data type (either uint32 or int64)? Or do we need to make two variables to store this field on Hi?

Most of the other failures are related to ISTP attributes, specifically this one:

Exception: Cannot include both LABLAXIS and LABL_PTR_2 in the attributes to variable science_data.

However, even when I remove all LABLAXIS values from the yaml definitions, the error still occurs. Somewhere along the line LABLAXIS is getting injected into the CDF attributes. @maxinelasp I think this might have to do with the AttributeManager, but I am not positive. There are still a bunch of warnings for the attribute manager schema that I think were going to be solved with SAMMI? So do we need to wait for that and what do we estimate the timescale to be?

tech3371 commented 3 weeks ago

I can comment on what I found on LABLAXIS error. We tried to remove LABLAXIS for data with 2 or more dimensions since it will need to use LABL_PTR_i. But even after removing LABLAXIS from yaml file, it was being added back again because it's defined as required in this line. I believe we need to change logic in this and that line.

This blocks show example of what CDF attrs definition looks like for data with 2 or more dimensions.

science_data:
  CATDESC: Decompressed Counts
  DEPEND_0: epoch
  DEPEND_1: spin_angle
  DEPEND_2: polar_angle
  LABL_PTR_1: spin_angle_label
  LABL_PTR_2: polar_angle_label
  DISPLAY_TYPE: spectrogram
  FIELDNAM: Decompressed Counts
  FORMAT: I5
  UNITS: counts
  VALIDMAX: 66539
  VALIDMIN: 0
  VAR_TYPE: data
  FILLVAL: -9223372036854775808
  VAR_TYPE: data
  SCALETYP: linear
tech3371 commented 3 weeks ago

I believe we get those errors because updated cdflib already follows this scheme from SPDF website

LABLAXIS --- required if not using LABL_PTR_1 should be a short character string (approximately 10 characters, but preferably 6 characters - more only if absolutely required for clarity) which can be used to label a y-axis for a plot or to provide a heading for a data listing. LABL_PTR_1, LABL_PTR_2, etc. --- required if not using LABLAXIS is used to label a dimensional variable when one value of LABLAXIS is not sufficient to describe the variable or to label all the axes. LABL_PTR_i is used instead of LABLAXIS, where i can take on any value from 1 to n where n is the total number of dimensions of the original variable. The value of LABL_PTR_1 is a variable which will contain the short character strings which describe the first dimension of the original variable. The actual labels should be short as described above for LABLAXIS. The value of the attribute must be a variable in the same CDF data set. See example.

maxinelasp commented 2 weeks ago

@tech3371 I believe if you update the schema here to set LABLAXIS required=false it will fix the error. This is also something that is reworked in SAMMI - I'm not sure if the same issue exists there.

maxinelasp commented 2 weeks ago

https://github.com/swxsoc/sammi/issues/7

maxinelasp commented 2 weeks ago

Sorry @greglucas , I missed your question to me at the end! This is eventually going to be fixed in SAMMI, probably on the scale of weeks. But, like I mentioned in my prior comment, it's an easy bandaid fix for now. We just need to make sure manually to include either LABLAXIS or LABL_PTR_n until it's actually fixed in SAMMI, but cdflib checks that so we have a safety net still.

subagonsouth commented 2 weeks ago

@subagonsouth are you willing to use a different data type (either uint32 or int64)? Or do we need to make two variables to store this field on Hi?

Yep... see my initial comment. They should all be changed to int64 by modifying the default_uint64_attrs -> default_int64_attrs in imap_hi_variable_attrs.yaml.

subagonsouth commented 2 weeks ago

However, even when I remove all LABLAXIS values from the yaml definitions, the error still occurs. Somewhere along the line LABLAXIS is getting injected into the CDF attributes.

What confuses me is that Hi has two dimensional variables that don't have this issue. I didn't have time to dig down to understand the differences. Perhaps it is because I am calling with check_schema=False?

attr_mgr.get_variable_attributes(f"hi_pset_{var_name}", check_schema=False)