Unidata / netcdf-java

The Unidata netcdf-java library
https://docs.unidata.ucar.edu/netcdf-java/current/userguide/index.html
BSD 3-Clause "New" or "Revised" License

[5.5.3]: Issues with the reading of Zarr datasets #1100

Open drentawc opened 2 years ago

drentawc commented 2 years ago

Versions impacted by the bug

v5.x

What went wrong?

I am currently doing some Zarr data access testing for a piece of Java geographic mapping software. I noted that the NetCDF-Java library supports only the base Zarr v2 spec, so I was trying to convert NetCDF4 files to Zarr datasets using the hdf5 and Zarr Python packages. That caused errors, so I asked the Zarr developers what they recommended, and they specifically said to use xarray's to_zarr() method for NetCDF files. I have used this method in the past to convert many NetCDF files for Python-specific tests, but NetCDF-Java consistently throws errors when I try to open these datasets with NetcdfFiles.open().

Is there another way to generate Zarr files from HDF/NetCDF that will work with NetCDF-Java? Or will I need to wait for future versions that support Zarr v3, NCZarr, or Xarray?

Relevant stack trace

Java access code:

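    import java.io.IOException;
    import ucar.nc2.NetcdfFile;
    import ucar.nc2.NetcdfFiles;

    public class Zarr {
        public static void main(String[] args) {
            String zarrPath = args[0]; // path to the Zarr store written by xarray
            // try-with-resources closes the file; NetcdfFiles picks the Zarr IOSP for the store
            try (NetcdfFile ncfile = NetcdfFiles.open(zarrPath)) {
                System.out.println(ncfile.getVariables());
            } catch (IOException ioe) {
                System.out.println(ioe);
            }
        }
    }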

Relevant log messages

    Exception in thread "main" java.io.IOException: java.lang.IllegalArgumentException: Cannot determine attribute's type
        at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:279)
        at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:243)
        at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:216)
        at nczarrtest.Zarr.netCdfRead(Zarr.java:85)
        at nczarrtest.Zarr.main(Zarr.java:52)
    Caused by: java.lang.IllegalArgumentException: Cannot determine attribute's type
        at ucar.nc2.Attribute$Builder.setValues(Attribute.java:821)
        at ucar.nc2.iosp.zarr.ZarrHeader.lambda$makeAttributes$0(ZarrHeader.java:241)
        at java.util.HashMap$KeySet.forEach(HashMap.java:934)
        at ucar.nc2.iosp.zarr.ZarrHeader.makeAttributes(ZarrHeader.java:237)
        at ucar.nc2.iosp.zarr.ZarrHeader.read(ZarrHeader.java:128)
        at ucar.nc2.iosp.zarr.ZarrIosp.build(ZarrIosp.java:59)
        at ucar.nc2.NetcdfFiles.build(NetcdfFiles.java:811)
        at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:750)
        at ucar.nc2.NetcdfFiles.open(NetcdfFiles.java:276)
        ... 4 more

If you have an example file that you can share, please attach it to this issue.

If so, may we include it in our test datasets to help ensure the bug does not return once fixed? Note: the test datasets are publicly accessible without restriction.

N/A


haileyajohnson commented 2 years ago

What kind of Zarr dataset are you trying to open? We currently support only pure Zarr v2.

drentawc commented 2 years ago

I created a Zarr dataset with xarray as follows:

    import xarray as xa

    # netcdfFilePath points at the source NetCDF file
    data = xa.open_dataset(netcdfFilePath)
    data.to_zarr('outputs/netcdf.zarr')

I am just trying to convert a NetCDF file to Zarr and eventually access it from an S3 bucket using NetCDF-Java. I would expect a dataset created this way to adhere to the pure Zarr v2 spec, since the Zarr developers recommend using xarray to convert NetCDF files to Zarr. If it doesn't, I am not sure how to create Zarr stores that NetCDF-Java can read.
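One way to check how closely a store written by to_zarr() follows the core v2 layout is to read its JSON metadata files directly. The sketch below assumes a local directory store (such as outputs/netcdf.zarr above); it checks that every `.zgroup` and `.zarray` declares `zarr_format` 2 and flags consolidated metadata (`.zmetadata`), which recent xarray versions write by default and which is a zarr-python convention rather than part of the core v2 spec. The function names are hypothetical, and an empty report does not by itself mean NetCDF-Java will open the store.

```python
import json
from pathlib import Path
from typing import Dict, List, Mapping


def read_metadata(root: str) -> Dict[str, bytes]:
    """Hypothetical helper: load .zgroup/.zarray/.zattrs/.zmetadata files keyed by store path."""
    base = Path(root)
    return {p.relative_to(base).as_posix(): p.read_bytes()
            for p in base.rglob(".z*") if p.is_file()}


def check_zarr_v2_metadata(store: Mapping[str, bytes]) -> List[str]:
    """Hypothetical helper: report metadata that departs from the core Zarr v2 layout."""
    findings = []
    for key in sorted(store):
        name = key.rsplit("/", 1)[-1]
        if name in (".zarray", ".zgroup"):
            fmt = json.loads(store[key]).get("zarr_format")
            if fmt != 2:
                findings.append(f"{key}: zarr_format is {fmt!r}, expected 2")
        elif name == ".zmetadata":
            # Consolidated metadata comes from zarr-python, not from the core v2 spec.
            findings.append(f"{key}: consolidated metadata present")
    if ".zgroup" not in store:
        findings.append("no root .zgroup")
    return findings
```

Usage: `print(check_zarr_v2_metadata(read_metadata("outputs/netcdf.zarr")))`. Working on a key-to-bytes mapping mirrors how Zarr v2 treats a store as a key/value map, and lets the same check run on metadata fetched from S3.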

haileyajohnson commented 2 years ago

Could you provide a sample file for us to debug?

drentawc commented 2 years ago

Yes, here are a couple of the NetCDF files I used, along with their Zarr counterparts created with xarray: chlor_a_zarr.tar.gz, smdata_zarr.tar.gz

rschmunk commented 2 years ago

Looking at the smdata store, the exception occurs because the coord_ref variable has an _ARRAY_DIMENSIONS attribute that is an empty array.
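xarray stores each variable's dimension names in an `_ARRAY_DIMENSIONS` attribute in that variable's `.zattrs`, and a scalar (0-dimensional) variable gets an empty list; coord_ref is presumably such a scalar. An empty JSON array gives a typed reader no element from which to infer the attribute's type, which is consistent with the `Cannot determine attribute's type` message. A minimal sketch for listing every such attribute in a store, assuming a local directory store; the function names are hypothetical:

```python
import json
from pathlib import Path
from typing import Dict, List, Mapping


def read_metadata(root: str) -> Dict[str, bytes]:
    """Hypothetical helper: load the .z* metadata files of a directory store keyed by store path."""
    base = Path(root)
    return {p.relative_to(base).as_posix(): p.read_bytes()
            for p in base.rglob(".z*") if p.is_file()}


def find_empty_array_dimensions(store: Mapping[str, bytes]) -> List[str]:
    """Hypothetical helper: names of nodes whose .zattrs has _ARRAY_DIMENSIONS == []."""
    hits = []
    for key, raw in store.items():
        if key == ".zattrs" or key.endswith("/.zattrs"):
            if json.loads(raw).get("_ARRAY_DIMENSIONS") == []:
                # Drop the trailing "/.zattrs" to get the array name ("" is the root group).
                hits.append(key[: -len(".zattrs")].rstrip("/"))
    return sorted(hits)
```

Usage, assuming the extracted smdata store lives at a hypothetical `smdata.zarr` path: `print(find_empty_array_dimensions(read_metadata("smdata.zarr")))`.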

Simply removing that attribute doesn't fix the problem, as I then encounter an invalid regex exception when trying to open the data store.
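The experiment described above can be reproduced with a short script. This is a sketch, assuming a local directory store that you have copied first (it rewrites `.zattrs` files in place); the function names are hypothetical. Per the comment above, removing the attribute alone does not make the store readable, so treat this as a diagnostic step rather than a fix. If the store also has a consolidated `.zmetadata` file, that file holds its own copy of each `.zattrs`, which this sketch leaves untouched.

```python
import json
from pathlib import Path
from typing import List, Optional


def strip_empty_array_dimensions(raw: bytes) -> Optional[bytes]:
    """Hypothetical helper: rewritten .zattrs JSON without an empty _ARRAY_DIMENSIONS, or None if unchanged."""
    attrs = json.loads(raw)
    if attrs.get("_ARRAY_DIMENSIONS") != []:
        return None  # attribute absent or non-empty: nothing to do
    del attrs["_ARRAY_DIMENSIONS"]
    return json.dumps(attrs, indent=4).encode("utf-8")


def patch_store_in_place(root: str) -> List[str]:
    """Hypothetical helper: apply the change to every .zattrs under root; run it on a copy."""
    changed = []
    for path in sorted(Path(root).rglob(".zattrs")):
        new_raw = strip_empty_array_dimensions(path.read_bytes())
        if new_raw is not None:
            path.write_bytes(new_raw)
            changed.append(path.relative_to(root).as_posix())
    return changed
```

Usage, with a hypothetical copied store: `print(patch_store_in_place("smdata_copy.zarr"))` lists the `.zattrs` files it rewrote.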

drentawc commented 2 years ago

Ahh, I should have realized there might be an issue with the _ARRAY_DIMENSIONS attribute. But since removing it doesn't resolve the whole issue, is there a surefire way to convert NetCDF or HDF files to a Zarr store that NetCDF-Java can read, given that the Zarr team recommends xarray?

drentawc commented 2 years ago

Or could the problem lie in the NetCDF files themselves, with their formatting, attributes, or data not being converted properly to Zarr by the xarray method?