MAVENSDC / cdflib

A python module for reading NASA's Common Data Format (cdf) files
MIT License

String-valued zVariables: Writing & reading yields wrong numpy array shape. #172

Open ErikPGJ opened 1 year ago

ErikPGJ commented 1 year ago

Footnote: I sent an e-mail about a similar issue on 2022-10-27, but I cannot reproduce that exact problem now (I possibly confused two different installs with different cdflib versions).

I am trying to store a string-valued zVariable in a CDF, but when I read it back, the array has a different shape than the one I wrote (cdflib 0.4.8, xarray 2022.10.0, numpy 1.23.0, Python 3.9.2).

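import cdflib
import numpy as np

PATH = 'zv_char_test.cdf'  # Example file name (placeholder); any writable path works.

with cdflib.cdfwrite.CDF(PATH, delete=True, cdf_spec={'Compressed': 0}) as cdf: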
    NA_CHAR = np.array(['abc', 'de', 'g'])
    cdflib_data = NA_CHAR.tolist()     # Converting to LIST.
    cdf.write_var(
        {
            'Variable': 'ZV_CHAR',
            'Data_Type': cdf.CDF_CHAR,
            'Num_Elements': 3,
            'Rec_Vary': [True],
            'Dim_Sizes': (1,),
            'Var_Type': 'zVariable',
            'Compress': 0,
        },
        var_data=cdflib_data,
    )

cdf = cdflib.cdfread.CDF(PATH)
act_na = cdf.varget('ZV_CHAR')
act_xna = cdflib.cdf_to_xarray(PATH).get('ZV_CHAR').data

print(f'act_na.shape  = """"{act_na.shape}""""')
print(f'act_xna.shape = """"{act_xna.shape}""""')
print(f'act_na  = """"{act_na}""""')
print(f'act_xna = """"{act_xna}""""')

This generates the following output:

act_na.shape  = """"(3, 1, 1)""""
act_xna.shape = """"(3,)""""
act_na  = """"[[['abc']]

 [['de']]

 [['g']]]""""
act_xna = """"['abc' 'de' 'g']""""

I see the same behaviour for CDF_CHAR and CDF_UCHAR.
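If the extra length-1 axes are unwanted, one possible reader-side workaround is to drop them with NumPy. This is only a sketch: it simulates the `(3, 1, 1)` array that `varget` returned above instead of calling cdflib, it assumes the extra axes are always length 1, and the helper name `drop_singleton_dims` is hypothetical. It hides the symptom on read and does not explain why the writer produced those axes.

```python
import numpy as np

def drop_singleton_dims(arr: np.ndarray) -> np.ndarray:
    """Remove length-1 axes after the record axis (hypothetical helper).

    Axis 0 is never squeezed, so a variable with a single record stays
    1-D instead of collapsing to a 0-d array.
    """
    extra_axes = tuple(i for i in range(1, arr.ndim) if arr.shape[i] == 1)
    return np.squeeze(arr, axis=extra_axes)

# Simulate the (3, 1, 1) array that varget returned in the report.
act_na = np.array(['abc', 'de', 'g']).reshape(3, 1, 1)
fixed = drop_singleton_dims(act_na)
print(fixed.shape)  # (3,)
print(fixed)        # ['abc' 'de' 'g']
```

Passing explicit axes to `np.squeeze`, rather than squeezing everything, is what keeps a one-record variable from losing its record axis.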

ErikPGJ commented 1 year ago

It may be that simply setting 'Dim_Sizes': () solves the problem. That generates:

act_na.shape  = """"(3,)""""
act_xna.shape = """"(3,)""""
act_na  = """"['abc' 'de' 'g']""""
act_xna = """"['abc' 'de' 'g']""""
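If `'Dim_Sizes': ()` is indeed the key, the spec dictionary passed to `write_var` could be derived from the NumPy array instead of being typed by hand. The sketch below assumes that axis 0 of the array is the record axis, that `Num_Elements` should be the length of the longest string, and that `Dim_Sizes` is the remaining shape. The helper `char_var_spec` is hypothetical, the `Rec_Vary` value is simply copied from the first example, and `CDF_CHAR = 51` is the CDF type code used only as a stand-in for the writer's `cdf.CDF_CHAR` attribute.

```python
import numpy as np

CDF_CHAR = 51  # CDF type code; inside a writer, use cdf.CDF_CHAR as in the report

def char_var_spec(name, arr, data_type):
    """Build a write_var() spec dict for a string array (hypothetical helper).

    Assumes axis 0 is the record axis and every remaining axis is a
    dimension of each record.
    """
    arr = np.asarray(arr)
    if arr.dtype.kind != 'U':
        raise TypeError('expected a NumPy unicode string array')
    # The longest string sets the number of characters per element (at least 1).
    longest = max((len(s) for s in arr.ravel().tolist()), default=0)
    return {
        'Variable': name,
        'Data_Type': data_type,
        'Num_Elements': max(1, longest),
        'Rec_Vary': [True],                 # copied from the first example; semantics not confirmed
        'Dim_Sizes': tuple(arr.shape[1:]),  # () for shape (3,), (2,) for shape (3, 2)
        'Var_Type': 'zVariable',
        'Compress': 0,
    }

spec = char_var_spec('ZV_CHAR', np.array(['abc', 'de', 'g']), CDF_CHAR)
print(spec['Num_Elements'], spec['Dim_Sizes'])  # 3 ()
```

For the first example this yields `Num_Elements` 3 and `Dim_Sizes` `()`, the settings that produced shape `(3,)` above.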
ErikPGJ commented 1 year ago

I also cannot get writing and then reading of 2D string-valued zVariables to work as I would expect.

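import cdflib
import numpy as np

PATH = 'zv_char_test.cdf'  # Example file name (placeholder); any writable path works.

with cdflib.cdfwrite.CDF(PATH, delete=True, cdf_spec={'Compressed': 0}) as cdf: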
    NA_CHAR = np.array([['11', '12'], ['21', '22'], ['31', '32']])
    cdflib_data = NA_CHAR.tolist()     # Converting to LIST.
    print(f'NA_CHAR.shape = {NA_CHAR.shape}')
    print(f'cdflib_data = """"{cdflib_data}""""')
    cdf.write_var(
        {
            'Variable': 'ZV_CHAR',
            'Data_Type': cdf.CDF_CHAR,
            'Num_Elements': 3,
            'Dim_Sizes': (3,),
            'Rec_Vary': (True, True),  # Required, but the docs say it is only for rVariables?!
            'Var_Type': 'zVariable',
            'Compress': 0,
        },
        var_data=cdflib_data,
    )

cdf = cdflib.cdfread.CDF(PATH)
act_na = cdf.varget('ZV_CHAR')
act_xna = cdflib.cdf_to_xarray(PATH).get('ZV_CHAR').data

print(f'act_na.shape  = """"{act_na.shape}""""')
print(f'act_xna.shape = """"{act_xna.shape}""""')
print(f'act_na  = """"{act_na}""""')
print(f'act_xna = """"{act_xna}""""')

This generates:

NA_CHAR.shape = (3, 2)
cdflib_data = """"[['11', '12'], ['21', '22'], ['31', '32']]""""
act_na.shape  = """"(2, 3, 1)""""
act_xna.shape = """"(2, 3)""""
act_na  = """"[[['11']
  ['12']
  ['21']]

 [['22']
  ['31']
  ['32']]]""""
act_xna = """"[['11' '12' '21']
 ['22' '31' '32']]""""

cdfdump:

Variable Data:
  Record # 1: ["11","12","21"]
  Record # 2: ["22","31","32"]

Note that although I write a 3x2 array and get a 2x3 array back, the result is not the transpose: the components are in the wrong locations.
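One way to see where the elements went is to reproduce the read-back layout in NumPy. Flattening the (3, 2) input in row-major (C) order and cutting it into records of three elements, which is what `'Dim_Sizes': (3,)` appears to request, gives exactly the cdfdump records, whereas the transpose does not. This is a hedged reading of the output above, not a confirmed explanation of what cdflib does internally.

```python
import numpy as np

NA_CHAR = np.array([['11', '12'], ['21', '22'], ['31', '32']])

# What cdf_to_xarray returned in the report (shape (2, 3)).
act_xna = np.array([['11', '12', '21'], ['22', '31', '32']])

# Row-major flatten, then split into records of 3 elements each,
# matching 'Dim_Sizes': (3,): 6 values / 3 per record = 2 records.
records = NA_CHAR.reshape(-1, 3)

print(np.array_equal(records, act_xna))    # True
print(np.array_equal(NA_CHAR.T, act_xna))  # False: not a transpose
```

Under that reading, `'Dim_Sizes': (2,)`, i.e. two strings per record and one record per row, would be the setting that describes a (3, 2) array; whether cdflib 0.4.8 then round-trips it correctly is not shown in this thread.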

Note: I get an error if I omit Rec_Vary, but https://pypi.org/project/cdflib/0.4.4/ (the last version with documentation at that location) says that Rec_Vary is only for rVariables, whereas I am writing a zVariable.

ErikPGJ commented 1 year ago

Any progress on this?

bryan-harter commented 1 year ago

Sorry for the delay getting back to you, and thanks for documenting the error so well! I think we'll take a look at this shortly.