ERDDAP / erddap

ERDDAP is a scientific data server that gives users a simple, consistent way to download subsets of gridded and tabular scientific datasets in common file formats and make graphs and maps. ERDDAP is a Free and Open Source (Apache and Apache-like) Java Servlet from NOAA NMFS SWFSC Environmental Research Division (ERD).

Complex ncml test case EDDGridFromNcFilesTests.testNcml failing #148

Open srstsavage opened 2 months ago

Describe the bug

Getting ahead of things a little since #142 is not yet merged, but I wanted a dedicated issue to track the EDDGridFromNcFilesTests.testNcml failures.

Currently, with netcdf-java 5.5.3 dependencies, loading a complex union NcML file in EDDGridFromNcFilesTests.testNcml results in the following error:

java.lang.NullPointerException: Cannot invoke "String.contains(java.lang.CharSequence)" because "location" is null
 at thredds.inventory.zarr.MFileZip$Provider.canProvide(MFileZip.java:200)
 at thredds.inventory.MFiles.create(MFiles.java:37)
 at ucar.nc2.internal.ncml.AggDataset.<init>(AggDataset.java:74)
 at ucar.nc2.internal.ncml.Aggregation.makeDataset(Aggregation.java:453)
 at ucar.nc2.internal.ncml.Aggregation.addExplicitDataset(Aggregation.java:136)
 at ucar.nc2.internal.ncml.NcmlReader.readAgg(NcmlReader.java:1476)
 at ucar.nc2.internal.ncml.NcmlReader.readNetcdf(NcmlReader.java:521)
 at ucar.nc2.internal.ncml.NcmlReader.readNcml(NcmlReader.java:478)
 at ucar.nc2.internal.ncml.NcmlReader.readNcml(NcmlReader.java:397)
 at ucar.nc2.internal.ncml.NcmlNetcdfFileProvider.open(NcmlNetcdfFileProvider.java:24)
 at ucar.nc2.dataset.NetcdfDatasets.openProtocolOrFile(NetcdfDatasets.java:431)
 at ucar.nc2.dataset.NetcdfDatasets.openDataset(NetcdfDatasets.java:152)
 at ucar.nc2.dataset.NetcdfDatasets.openDataset(NetcdfDatasets.java:135)
 at ucar.nc2.dataset.NetcdfDatasets.openDataset(NetcdfDatasets.java:118)
 at ucar.nc2.dataset.NetcdfDatasets.openDataset(NetcdfDatasets.java:104)
 at gov.noaa.pfel.erddap.dataset.EDDGridFromNcFilesTests.testNcml(EDDGridFromNcFilesTests.java:155)

This was originally reported to the netcdf-java mailing list by Bob Simons in July 2022.

The error stems from the loading of MFileProvider implementations via Java service loading. The canProvide(String location) method is called on each implementation, and in 5.5.3 one provider, MFileZip, does not check location for null before using it (https://github.com/Unidata/netcdf-java/blob/v5.5.3/cdm/zarr/src/main/java/thredds/inventory/zarr/MFileZip.java#L200).
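
To illustrate the failure mode described above, here is a minimal stand-alone sketch. It is not the netcdf-java code: the interface LocationProvider, the two provider constants, and the ".zip" check are hypothetical stand-ins, and a plain list replaces real service loading. It only shows how a single provider that dereferences a null location breaks the whole lookup loop, and how a null guard avoids that.

```java
import java.util.List;

public class NullLocationSketch {
  /** Hypothetical stand-in for a provider interface; not the netcdf-java API. */
  public interface LocationProvider {
    boolean canProvide(String location);
  }

  /** Assumed unguarded check: throws NullPointerException when location is null. */
  public static final LocationProvider UNGUARDED =
      location -> location.contains(".zip");

  /** Same check with a null guard: a null location is simply "not mine". */
  public static final LocationProvider GUARDED =
      location -> location != null && location.contains(".zip");

  /** Asks each provider in turn, as a service-loading lookup loop would. */
  public static boolean anyCanProvide(List<LocationProvider> providers, String location) {
    for (LocationProvider provider : providers) {
      if (provider.canProvide(location)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    try {
      anyCanProvide(List.of(UNGUARDED), null);
    } catch (NullPointerException e) {
      System.out.println("unguarded provider failed on null location");
    }
    System.out.println("guarded, null location: " + anyCanProvide(List.of(GUARDED), null));
    System.out.println("guarded, zip location: " + anyCanProvide(List.of(GUARDED), "data.zip"));
  }
}
```

The point of the sketch is that the caller cannot protect itself cheaply: every provider is asked in turn, so each one has to tolerate a null location on its own.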

This bug was fixed in October 2022 with this commit.

https://github.com/Unidata/netcdf-java/commit/19f9476ed8e605e04ab6013a90ba59dbbb2d17d3#diff-05b863736a1a2b21b57d0a498f731991e82c8b824dc2235d54f3f9d5f257eb80R200

However, this change hasn't yet been included in any release. I asked about the possibility of a 5.5.4 release here: https://github.com/Unidata/netcdf-java/discussions/1332

Even with this fix (tested with netcdf-java 5.5.4-SNAPSHOT), though, the test fails with a different error:

java.lang.IllegalStateException: Shared Dimension fakeDim0 = 4320; does not exist in a parent group
 at ucar.nc2.Variable.<init>(Variable.java:1847)
 at ucar.nc2.dataset.VariableDS.<init>(VariableDS.java:879)
 at ucar.nc2.dataset.VariableDS$Builder.build(VariableDS.java:1134)
 at ucar.nc2.dataset.VariableDS$Builder.build(VariableDS.java:985)
 at ucar.nc2.Group.<init>(Group.java:924)
 at ucar.nc2.Group.<init>(Group.java:44)
 at ucar.nc2.Group$Builder.build(Group.java:1410)
 at ucar.nc2.Group$Builder.build(Group.java:1402)
 at ucar.nc2.NetcdfFile.<init>(NetcdfFile.java:2576)
 at ucar.nc2.dataset.NetcdfDataset.<init>(NetcdfDataset.java:1611)
 at ucar.nc2.dataset.NetcdfDataset.<init>(NetcdfDataset.java:88)
 at ucar.nc2.dataset.NetcdfDataset$Builder.build(NetcdfDataset.java:1812)
 at ucar.nc2.dataset.NetcdfDataset$Builder.build(NetcdfDataset.java:1687)
 at ucar.nc2.internal.ncml.NcmlReader$NcmlElementReader.open(NcmlReader.java:1605)
 at ucar.nc2.internal.ncml.NcmlReader$NcmlElementReader.open(NcmlReader.java:1586)
 at ucar.nc2.dataset.NetcdfDatasets.acquireFile(NetcdfDatasets.java:383)
 at ucar.nc2.internal.ncml.AggDataset.acquireFile(AggDataset.java:114)
 at ucar.nc2.internal.ncml.AggregationUnion.buildNetcdfDataset(AggregationUnion.java:30)
 at ucar.nc2.internal.ncml.Aggregation.build(Aggregation.java:349)
 at ucar.nc2.internal.ncml.NcmlReader.readNetcdf(NcmlReader.java:528)
 at ucar.nc2.internal.ncml.NcmlReader.readNcml(NcmlReader.java:483)
 at ucar.nc2.internal.ncml.NcmlReader.readNcml(NcmlReader.java:385)
 at ucar.nc2.internal.ncml.NcmlNetcdfFileProvider.open(NcmlNetcdfFileProvider.java:24)
 at ucar.nc2.dataset.NetcdfDatasets.openProtocolOrFile(NetcdfDatasets.java:431)
 at ucar.nc2.dataset.NetcdfDatasets.openDataset(NetcdfDatasets.java:152)
 at ucar.nc2.dataset.NetcdfDatasets.openDataset(NetcdfDatasets.java:135)
 at ucar.nc2.dataset.NetcdfDatasets.openDataset(NetcdfDatasets.java:118)
 at ucar.nc2.dataset.NetcdfDatasets.openDataset(NetcdfDatasets.java:104)
 at gov.noaa.pfel.erddap.dataset.EDDGridFromNcFilesTests.testNcml(EDDGridFromNcFilesTests.java:155)

This is related to the attempted renaming of the fakeDim dimensions in two of the three aggregated files:

$ grep dimension src/test/resources/largeFiles/viirs/MappedMonthly4km/m4.ncml 
      <dimension name="latitude" orgName="fakeDim0" />
      <dimension name="longitude" orgName="fakeDim1" />
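
For anyone who wants the same check as the grep above in a programmatic form, here is a rough sketch that lists the dimension renames declared in an NcML document using only the JDK XML parser. The class name NcmlDimensionRenames and the sample wrapper XML in main are assumptions for illustration; only the two dimension elements come from m4.ncml. It also assumes the dimension elements are written without a namespace prefix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class NcmlDimensionRenames {
  /**
   * Returns a map of orgName to name for every dimension element
   * that carries an orgName attribute, in document order.
   */
  public static Map<String, String> renames(String ncmlXml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(ncmlXml.getBytes(StandardCharsets.UTF_8)));
      NodeList dims = doc.getElementsByTagName("dimension");
      Map<String, String> result = new LinkedHashMap<>();
      for (int i = 0; i < dims.getLength(); i++) {
        Element dim = (Element) dims.item(i);
        // Dimensions without orgName are declarations, not renames; skip them.
        if (dim.hasAttribute("orgName")) {
          result.put(dim.getAttribute("orgName"), dim.getAttribute("name"));
        }
      }
      return result;
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse NcML", e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical wrapper; the two dimension elements are the ones from m4.ncml.
    String sample =
        "<netcdf>"
            + "<dimension name=\"latitude\" orgName=\"fakeDim0\" />"
            + "<dimension name=\"longitude\" orgName=\"fakeDim1\" />"
            + "</netcdf>";
    renames(sample).forEach((from, to) -> System.out.println(from + " -> " + to));
  }
}
```

This only reports what the NcML asks for; it says nothing about whether netcdf-java accepts the rename when the target names already exist elsewhere in the aggregation.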

I haven't checked thoroughly whether src/test/resources/largeFiles/viirs/MappedMonthly4km/m4.ncml is fully legal NcML, but the fake dimensions are indeed present in the aggregated data files, and the target dimensions already exist in the LatLon.nc file:

$ ncks --json -M src/test/resources/largeFiles/viirs/MappedMonthly4km/LatLon.nc | jq .dimensions                                 
{
  "latitude": 4320,
  "longitude": 8640
}
$ ncks --json -M src/test/resources/largeFiles/viirs/MappedMonthly4km/V20120012012031.L3m_MO_NPP_CHL_chlor_a_4km | jq .dimensions
{
  "fakeDim0": 4320,
  "fakeDim1": 8640
}
$ ncks --json -M src/test/resources/largeFiles/viirs/MappedMonthly4km/V20120322012060.L3m_MO_NPP_CHL_chlor_a_4km | jq .dimensions
{
  "fakeDim0": 4320,
  "fakeDim1": 8640
}

To Reproduce

Steps to reproduce the behavior: run test case EDDGridFromNcFilesTests.testNcml (for example, mvn test -Dtest=EDDGridFromNcFilesTests#testNcml).

Expected behavior

Test passes

Desktop (please complete the following information):