Unidata / netcdf-java

The Unidata netcdf-java library
https://docs.unidata.ucar.edu/netcdf-java/current/userguide/index.html
BSD 3-Clause "New" or "Revised" License

ZarrIosp claims any zipped dataset #1319

Closed tdrwenski closed 2 months ago

tdrwenski commented 3 months ago

From issue: https://github.com/Unidata/netcdf-java/issues/1307:

When using netcdfAll with the optional cdm-zarr code included, any zipped dataset you try to open gets claimed by ZarrIosp as a validFile. The zip is apparently opened, but if it's not actually zarr, then it is reported as empty.

tdrwenski commented 3 months ago

@rschmunk feel free to add anything I missed here. I believe it's clear and should be straightforward to fix.

rschmunk commented 3 months ago

@tdrwenski, I've been trying to look into this but keep getting sidetracked. I think it has something to do with the file or the archive somehow getting tagged as a directory but I haven't figured that bit out yet.

tdrwenski commented 3 months ago

@rschmunk, I believe it's fixed now, but let us know if you still have issues!

rschmunk commented 3 months ago

@tdrwenski, At first look, that seems to have done the trick.

rschmunk commented 3 months ago

@tdrwenski, An FYI/warning/whatever in case this issue gets reported again:

If for some reason the zip process decides to include filesystem metadata along with the compressed dataset, then there will be more than one entry in the zip, and netcdfAll will decide that the zip archive must be a compressed zarr archive.

I discovered this because I just tried to open a zipped netCDF file and was startled that my app, which is using a freshly built netcdfAll, reported it was a zipped zarr archive. I deleted the zip archive, re-zipped the NC file again at the command line, and tried again; this time my app successfully uncompressed and opened it as a netCDF file.

Further testing revealed that if you are on a Mac, control-click a dataset icon on the desktop, and select Compress in the contextual menu, the resulting zip file will have 2 entries in it. In the case I just tested, the data file was named eccc2016.nc, and the desktop compression command included a __MACOSX/._eccc2016.nc metadata entry in the archive.
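For anyone who wants to check what a given archive actually contains, here is a minimal sketch that lists the entries of a zip and skips the __MACOSX/ metadata. It uses only java.util.zip; the class and method names (ZipEntryInspector, isMacMetadata, dataEntries) are hypothetical and not part of netcdf-java.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper for inspecting a zip; not part of netcdf-java. */
final class ZipEntryInspector {

  /** True for entries that live under the __MACOSX/ metadata folder. */
  static boolean isMacMetadata(String entryName) {
    return entryName.startsWith("__MACOSX/");
  }

  /** Names of entries that are neither directories nor __MACOSX/ metadata. */
  static List<String> dataEntries(Path zipPath) throws IOException {
    List<String> names = new ArrayList<>();
    try (ZipFile zip = new ZipFile(zipPath.toFile())) {
      Enumeration<? extends ZipEntry> entries = zip.entries();
      while (entries.hasMoreElements()) {
        ZipEntry entry = entries.nextElement();
        if (!entry.isDirectory() && !isMacMetadata(entry.getName())) {
          names.add(entry.getName());
        }
      }
    }
    return names;
  }

  public static void main(String[] args) throws IOException {
    // Usage: java ZipEntryInspector path/to/archive.zip
    for (String name : dataEntries(Paths.get(args[0]))) {
      System.out.println(name);
    }
  }
}
```

If this prints a single data file for an archive that was reported as zarr, the extra entry is most likely the metadata described above.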

tdrwenski commented 3 months ago

Thanks for that extra info @rschmunk, I was not aware of this. We may need a more robust fix for this issue then. I will reopen this issue so we don't forget, but not sure if we will get to it right away.
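One direction a more robust check could take, purely as a sketch: rather than counting entries, look for the metadata files that a Zarr v2 store carries (.zgroup or .zarray) and ignore anything under __MACOSX/. This is not how ZarrIosp currently decides, and ZarrZipHeuristic and looksLikeZarr are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical heuristic; not the actual ZarrIosp validity check. */
final class ZarrZipHeuristic {

  // Zarr v2 stores mark groups and arrays with these metadata file names.
  private static final Set<String> ZARR_MARKERS =
      new HashSet<>(Arrays.asList(".zgroup", ".zarray"));

  /** True if any entry outside __MACOSX/ is named like a Zarr v2 metadata file. */
  static boolean looksLikeZarr(Path zipPath) throws IOException {
    try (ZipFile zip = new ZipFile(zipPath.toFile())) {
      Enumeration<? extends ZipEntry> entries = zip.entries();
      while (entries.hasMoreElements()) {
        String name = entries.nextElement().getName();
        if (name.startsWith("__MACOSX/")) {
          continue; // Finder metadata, never part of the dataset
        }
        // Compare only the last path segment, so nested groups also match.
        String base = name.substring(name.lastIndexOf('/') + 1);
        if (ZARR_MARKERS.contains(base)) {
          return true;
        }
      }
    }
    return false;
  }
}
```

With a check like this, a zipped netCDF file plus a __MACOSX/ entry would not be claimed as zarr, while a zip holding at least one .zgroup or .zarray would be.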

rschmunk commented 3 months ago

@tdrwenski, What I encountered was enough of an edge case that I don't think there's a rush. I would expect most people zipping their datasets are going to do so from the command line or by script.

tdrwenski commented 2 months ago

On second thought, I think this is enough of an edge case that we don't need to handle it now. Users can delete those resource fork files (the __MACOSX/ entries) from their zips to work around it. We can always revisit it if more people are zipping their files this way.
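For the workaround, a minimal sketch of one way to do it in Java: copy the archive to a new zip and leave out everything under __MACOSX/. It relies only on java.util.zip; MacMetadataStripper and strip are hypothetical names, and this is not something netcdf-java provides.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

/** Hypothetical utility; not part of netcdf-java. */
final class MacMetadataStripper {

  /** Copies source to target without __MACOSX/ entries; returns how many were skipped. */
  static int strip(Path source, Path target) throws IOException {
    int removed = 0;
    try (ZipFile in = new ZipFile(source.toFile());
        ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
      Enumeration<? extends ZipEntry> entries = in.entries();
      while (entries.hasMoreElements()) {
        ZipEntry entry = entries.nextElement();
        if (entry.getName().startsWith("__MACOSX/")) {
          removed++;
          continue;
        }
        // A fresh ZipEntry lets ZipOutputStream recompute sizes and CRC.
        out.putNextEntry(new ZipEntry(entry.getName()));
        if (!entry.isDirectory()) {
          try (InputStream data = in.getInputStream(entry)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = data.read(buffer)) != -1) {
              out.write(buffer, 0, n);
            }
          }
        }
        out.closeEntry();
      }
    }
    return removed;
  }

  public static void main(String[] args) throws IOException {
    // Usage: java MacMetadataStripper input.zip cleaned.zip
    int removed = strip(Paths.get(args[0]), Paths.get(args[1]));
    System.out.println("Removed " + removed + " metadata entries");
  }
}
```

The sketch writes to a separate target file, so the original archive is left untouched until the cleaned copy has been checked.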