Unidata / netcdf-java

The Unidata netcdf-java library
https://docs.unidata.ucar.edu/netcdf-java/current/userguide/index.html
BSD 3-Clause "New" or "Revised" License

Problems opening Zipped datasets #1307

Closed rschmunk closed 4 months ago

rschmunk commented 4 months ago

Versions impacted by the bug

Working with an up-to-date 5.5.4 snapshot...

What went wrong?

This issue describes two problems opening zipped datasets, one for which I provide a possible fix and the second which looks a lot trickier.

First, if one builds the current netcdfAll 5.5.4 snapshot with no optional code included, then opening a zipped dataset can fail.

My code calls NetcdfFiles.open(File.getPath()). When trying to open a zipped dataset, this has been failing with a message that the file does not look like CDM. This message is misleading: it is created and thrown when an exception from lower down is caught midway and overwritten. Closer examination reveals that the real exception is a String index failure coming from line 580 in NetcdfFiles.makeUncompressed:

 String itemName = itempath.substring(1); // remove initial /

That line throws the exception when itempath is the empty String. This will occur if the filepath provided in the original open call above simply ends in "zip", which it very likely does. Thus, line 580 seemingly should be rewritten as

 String itemName = (itempath.length() > 1) ? itempath.substring(1) : "";

If this change is made, then the open method above will successfully uncompress the dataset from the zip file, and my code is happy.
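
To illustrate the failure and the guard in isolation, here is a minimal sketch in plain Java that does not touch netcdf-java at all. The class name ItemNameSketch and the method names unguarded and guarded are hypothetical and exist only for this illustration; the two expressions are copied from the lines quoted above.

```java
class ItemNameSketch {
    // Same expression as the existing line 580: fails when itempath is "".
    static String unguarded(String itempath) {
        return itempath.substring(1); // remove initial /
    }

    // Same expression as the proposed replacement: an empty (or one-character)
    // itempath yields an empty item name instead of an exception.
    static String guarded(String itempath) {
        return (itempath.length() > 1) ? itempath.substring(1) : "";
    }

    public static void main(String[] args) {
        System.out.println("guarded(\"\") = [" + guarded("") + "]");
        System.out.println("guarded(\"/data.nc\") = [" + guarded("/data.nc") + "]");
        try {
            unguarded("");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unguarded(\"\") threw " + e.getClass().getSimpleName());
        }
    }
}
```

An isEmpty() check would presumably work equally well, since substring(1) on a one-character String already returns the empty String.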

Note, though, that if the zip archive includes more than one data file, then the first zip entry encountered will be the one extracted, and any others are ignored, unless of course itemName is somehow not the empty String.

I assume that the use of itemName is so that one can call the open method, and pass it a zip archive name followed by the name of a dataset within. But my code is simply accepting the name of a zip archive from the user and passing only that.
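
The selection behaviour described above can be sketched with java.util.zip alone. This is not the code in NetcdfFiles.makeUncompressed, just an assumption about the general shape of the logic: the class ZipEntryPicker and its pick method are hypothetical, and skipping directory entries is my own choice rather than something I have confirmed in the library.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class ZipEntryPicker {
    /**
     * Returns the entry named itemName, or, when itemName is empty, the first
     * non-directory entry in the archive. Returns null if nothing matches.
     */
    static ZipEntry pick(ZipFile zip, String itemName) {
        if (!itemName.isEmpty()) {
            return zip.getEntry(itemName);
        }
        Enumeration<? extends ZipEntry> entries = zip.entries();
        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            if (!entry.isDirectory()) {
                return entry; // any later entries are ignored
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: ZipEntryPicker <archive.zip> [entryName]");
            return;
        }
        try (ZipFile zip = new ZipFile(args[0])) {
            ZipEntry entry = pick(zip, args.length > 1 ? args[1] : "");
            System.out.println(entry == null ? "no matching entry" : entry.getName());
        }
    }
}
```

Run with only an archive path, it prints the first data entry; run with a second argument, it looks up that entry by name, which is roughly the two cases I am guessing itemName is meant to distinguish.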

An additional note: this bug does not occur when using toolsUI to open a zipped dataset. I haven't figured out why things do work there, but with the multiplicity of open methods in NetcdfFiles, I assume a different one is being used by toolsUI.

The second bug I ran into, which likely deserves its own issue number, is that if the above fix is implemented and I build netcdfAll with the optional cdm-zarr code included, then any zipped dataset I try to open gets claimed by ZarrIosp as a valid file. The zip is apparently opened, but if it's not actually zarr, then it is reported as empty.
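
One way to picture a stricter check is to look inside the archive for Zarr v2 metadata entries (.zgroup or .zarray) before treating a zip as a Zarr store. The sketch below is only that, a sketch: it uses java.util.zip, the class ZarrZipCheck and its methods are hypothetical, and I have not confirmed how ZarrIosp actually decides that a file is valid.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class ZarrZipCheck {
    // True when the last path component is a Zarr v2 group or array metadata key.
    static boolean isZarrMetadataName(String entryName) {
        String base = entryName.substring(entryName.lastIndexOf('/') + 1);
        return base.equals(".zgroup") || base.equals(".zarray");
    }

    // True when at least one entry in the archive looks like Zarr v2 metadata.
    static boolean looksLikeZarr(ZipFile zip) {
        Enumeration<? extends ZipEntry> entries = zip.entries();
        while (entries.hasMoreElements()) {
            if (isZarrMetadataName(entries.nextElement().getName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: ZarrZipCheck <archive.zip>");
            return;
        }
        try (ZipFile zip = new ZipFile(args[0])) {
            System.out.println(looksLikeZarr(zip) ? "looks like zarr" : "not zarr");
        }
    }
}
```

With a check along these lines, a zip holding only an ordinary data file would be passed over rather than opened and reported as empty.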

Relevant stack trace

No response

Relevant log messages

No response

If you have an example file that you can share, please attach it to this issue.

If so, may we include it in our test datasets to help ensure the bug does not return once fixed? Note: the test datasets are publicly accessible without restriction.

N/A


rschmunk commented 4 months ago

So #1309 solves the first problem I described with trying to acquire a zipped dataset without appending a zip entry name to the file path.

So perhaps this issue could be closed, but then a separate issue should be filed about the second problem with ZarrIosp barging in and trying to claim any zip file seen if one builds netcdfAll with the zarr code included?

haileyajohnson commented 4 months ago

Yeah let's track that in a separate issue :)

rschmunk commented 4 months ago

@haileyajohnson, Okay. Let me experiment a bit with logging and see if I can provide a bit more detail in an issue.