Unidata / netcdf-java

The Unidata netcdf-java library
https://docs.unidata.ucar.edu/netcdf-java/current/userguide/index.html
BSD 3-Clause "New" or "Revised" License
146 stars 71 forks source link

Issue with WMS loading >2GB variables #43

Closed jbzambon closed 5 years ago

jbzambon commented 5 years ago

I am attempting to load a large dataset (4200x4100) with 21 timesteps into WMS using THREDDS. When I do so, it fails and the page returns...

<ServiceExceptionReport xmlns="http://www.opengis.net/ogc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.3.0" xsi:schemaLocation="http://www.opengis.net/ogc http://schemas.opengis.net/wms/1.3.0/exceptions_1_3_0.xsd">
<ServiceException>
Unexpected error of type java.lang.IllegalStateException
</ServiceException>
<StackTrace>
<![CDATA[
uk.ac.rdg.resc.edal.cdm.LookUpTable.<init>(LookUpTable.java:109)uk.ac.rdg.resc.edal.cdm.LookUpTableGrid.generate(LookUpTableGrid.java:93)uk.ac.rdg.resc.edal.cdm.CdmUtils.createHorizontalGrid(CdmUtils.java:279)uk.ac.rdg.resc.edal.cdm.CdmUtils.readCoverageMetadata(CdmUtils.java:174)uk.ac.rdg.resc.edal.cdm.CdmUtils.readCoverageMetadata(CdmUtils.java:127)thredds.server.wms.ThreddsDataset.<init>(ThreddsDataset.java:95)thredds.server.wms.ThreddsDataset.getThreddsDatasetForRequest(ThreddsDataset.java:270)thredds.server.wms.ThreddsWmsController.dispatchWmsRequest(ThreddsWmsController.java:165)uk.ac.rdg.resc.ncwms.controller.AbstractWmsController.handleRequestInternal(AbstractWmsController.java:207)org.springframework.web.servlet.mvc.AbstractController.handleRequest(AbstractController.java:174)org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:50)org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967)org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:861)javax.servlet.http.HttpServlet.service(HttpServlet.java:634)org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)javax.servlet.http.HttpServlet.service(HttpServlet.java:741)org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)thredds.servlet.filter.RequestQueryFilter.doFilter(RequestQueryFilter.java:118)org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)thredds.servlet.filter.RequestCORSFilter.doFilterInternal(RequestCORSFilter.java:49)org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:347)org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:263)org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)thredds.servlet.filter.RequestPathFilter.doFilter(RequestPathFilter.java:94)org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)thredds.server.RequestBracketingLogMessageFilter.doFilter(RequestBracketingLogMessageFilter.java:81)org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)org.apache.logging.log4j.web.Log4jServletFilter.doFilter(Log4jServletFilter.java:71)org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493)org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:137)org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:660)org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:798)org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:808)org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498)org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)java.lang.Thread.run(Thread.java:748)
]]>
</StackTrace>
</ServiceExceptionReport>

and threddsServlet.log returns...

2019-07-30T01:56:09.069 +0000 [ 30415][ 5] ERROR - thredds.server.wms.ThreddsWmsController - dispatchWmsRequest(): Exception: java.lang.IllegalStateException: nLon (=0) and nLat (=2147483647) must be positive and > 0

I notice that 2147483647 is 2^31 (signed 32-bit number). Does WMS handle arrays as 32-bit in some locations that's causing us to run into this limit? Note that OPeNDAP handles this dataset without issue and if I subset the dataset with a stride of 4 (1050x1025), WMS is also able to run without issue.

I have put the datasets on AWS for inspection. OPeNDAP endpoint... http://54.158.195.139:8080/thredds/dodsC/nwa/v1/NWA_v1_best.ncd.html

WMS endpoint... http://54.158.195.139:8080/thredds/wms/nwa/v1/NWA_v1_best.ncd?service=WMS&version=1.3.0&request=GetCapabilities

Thanks!

-Joe

lesserwhirls commented 5 years ago

It looks like there is an issue somewhere deep down in the ncWMS code when it constructs a lookup table for quick mapping lat/lon values to array indexes. This particular version of ncWMS isn't maintained any more. However, in the next version of the TDS (v5.0.0), we are using the latest version from edal-java, which might not have this limitation. Would you mind trying the latest 5.0.0 beta (here) and see if you still get this error? If so, we can open an issue on the edal-java repo and see if it's something that is fixable.

jbzambon commented 5 years ago

Thanks for your quick reply! Yes, upgrading TDS (v5.0.0) was the idea that @rsignell-usgs had offered in a side chat. I was wary about using beta software but will be happy to give it a try and report back. Thanks again!

lesserwhirls commented 5 years ago

I can totally understand being wary. For WMS, there are a few things that are not working at this point. First, customization of the WMS via wmsConfig.xml, and second, obtaining images from the Godiva interface using the "Export to PNG" link. I hope to get the first knocked out in the next week or two; the second is a bug in the Google Web Toolkit code used to generate Godiva, and I'll need the Reading folks to check that one out.

rsignell-usgs commented 5 years ago

@jbzambon, note you can also test on ncwms2 by running a standalone docker container if you prefer: https://hub.docker.com/r/axiom/ncwms

jbzambon commented 5 years ago

I was able to get a bare THREDDS v.5.0.0 beta installed on a fresh AWS instance. Unfortunately, I'm getting an error loading WMS, although not the same error. The page that loads makes no reference to 2^31...

<ServiceExceptionReport xmlns="http://www.opengis.net/ogc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.3.0" xsi:schemaLocation="http://www.opengis.net/ogc http://schemas.opengis.net/wms/1.3.0/exceptions_1_3_0.xsd">
<ServiceException>
Problem creating dataset nwa/v1/NWA_v1_best.ncd at fmrc:NWA_v1 Cause: nLon (=0) and nLat (=0) must be positive and > 0
</ServiceException>
</ServiceExceptionReport>

Interestingly, threddsServlet.log doesn't show anything.

Apache's log (localhost_access_log.2019-07-30.txt ) shows 152.1.31.163 - - [30/Jul/2019:15:09:08 +0000] "GET /thredds/wms/nwa/v1/NWA_v1_best.ncd?service=WMS&version=1.3.0&request=GetCapabilities HTTP/1.1" 400 530

OPeNDAP seems to work, again. Endpoint: http://54.81.33.213:8080/thredds/dodsC/nwa/v1/NWA_v1_best.ncd.html

WMS Endpoint: http://54.81.33.213:8080/thredds/wms/nwa/v1/NWA_v1_best.ncd?service=WMS&version=1.3.0&request=GetCapabilities

(This is the same exact data file that was loaded on the v.4 THREDDS instance.)

Any more ideas? Thanks again!

lesserwhirls commented 5 years ago

So I think the failing WMS response is correct and very close to the one in 4.6, although not super helpful on its own. If I look at the OPeNDAP output for lat and lon (with a stride of 20 for the sake of response size):

http://54.81.33.213:8080/thredds/dodsC/nwa/v1/NWA_v1_best.ncd.ascii?lon%5B0:20:4199%5D%5B0:20:4099%5D,lat%5B0:20:4199%5D%5B0:20:4099%5D

both have zero for all of their values. In this case, WMS is wigging out because there are effectively no latitudes and longitudes, so it cannot create a lookup table between index space and lat/lon space. Because this collection is setup as an FMRC and only exposing the best dataset, it's hard to figure out where things are going wrong from this end (I'd really need to see the FMRC config xml and some of the dataset granules).

jbzambon commented 5 years ago

Here's some things that might help (?) Please let me know if there is anything else I can send. Thanks again!

fmrc.log...

[2019-07-30T15:07:06.400+0000] INFO  thredds.featurecollection.InvDatasetFeatureCollection: FeatureCollection added = FeatureCollectionConfig name ='NWA v1' collectionN
ame='NWA_v1' type='FMRC'
  spec='/thredds_data/nwa_v1/nwa_.*\.nc$'
  timePartition =directory
  updateConfig =UpdateConfig{userDefined=true, recheckAfter='null', rescan='0 5 * * * ? *', triggerOk=true, updateType=test}
  tdmConfig =UpdateConfig{userDefined=false, recheckAfter='null', rescan='null', triggerOk=true, updateType=test}
  ProtoConfig{choice=Penultimate, change='null', param='null', outerNcml='null', cacheAll=true}
  hasInnerNcml =true
  fmrcConfig =FmrcConfig: regularize=false datasetTypes=[Best]

[2019-07-30T15:07:08.562+0000] INFO  ucar.nc2.ft.fmrc.FmrcDataset: FMRCDataset build Proto cache >10M var= mask(lon=4200, lat=4100)
[2019-07-30T15:07:08.563+0000] INFO  ucar.nc2.ft.fmrc.FmrcDataset: FMRCDataset build Proto cache >10M var= lon(lon=4200, lat=4100)
[2019-07-30T15:07:08.563+0000] INFO  ucar.nc2.ft.fmrc.FmrcDataset: FMRCDataset build Proto cache >10M var= lat(lon=4200, lat=4100)

catalog.xml...

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="THREDDS Server Default Catalog : You must change this to fit your server!"
         xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0
           http://www.unidata.ucar.edu/schemas/thredds/InvCatalog.1.0.6.xsd">

  <service name="all" base="" serviceType="compound">
    <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
    <service name="dap4" serviceType="DAP4" base="/thredds/dap4/" />
    <service name="http" serviceType="HTTPServer" base="/thredds/fileServer/" />
    <!--service name="wcs" serviceType="WCS" base="/thredds/wcs/" /-->
    <service name="wms" serviceType="WMS" base="/thredds/wms/" />
    <!--service name="ncss" serviceType="NetcdfSubset" base="/thredds/ncss/" /-->
  </service>

  <service name="dap" base="" serviceType="compound">
    <service name="odap" serviceType="OpenDAP" base="/thredds/dodsC/" />
    <service name="dap4" serviceType="DAP4" base="/thredds/dap4/" />
  </service>

<dataset name="Northwest Atlantic (US East Coast, Gulf of Mexico and Caribbean Sea)">
    <metadata inherited="true">
        <serviceName>all</serviceName>
        <dataType>Grid</dataType>
        <dataFormat>netCDF</dataFormat>
    </metadata>
    <featureCollection name="NWA v1" featureType="FMRC" harvest="true" path="nwa/v1">
        <collection spec="/thredds_data/nwa_v1/nwa_.*\.nc$" />
        <update startup="true" rescan="0 5 * * * ? *" trigger="allow" />
        <fmrcConfig regularize="false" datasetTypes="Best" />
        <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" enhance="true" />
    </featureCollection>
</dataset>

</catalog>
lesserwhirls commented 5 years ago

What do each of the individual netCDF files look like? How are they laid out on disk? If you update the <fmrcConfig> datasetTypes attribute to include Files, I should be able to see the individual granules remotely.

jbzambon commented 5 years ago

Replaced "Best" with "Files" in <fmrcConfig> datasetTypes...

$ diff catalog.xml catalog.xml_orig 
32c32
<         <fmrcConfig regularize="false" datasetTypes="Files" />
---
>         <fmrcConfig regularize="false" datasetTypes="Best" />

Currently, there is only one file in the directory that's being sampled. Here's the ncdump of that file...

ncdump -h /thredds_data/nwa_v1/nwa_20190729.nc 
netcdf nwa_20190729 {
dimensions:
    time = UNLIMITED ; // (21 currently)
    lon = 4200 ;
    lat = 4100 ;
variables:
    float lon(lon, lat) ;
        lon:long_name = "longitude" ;
        lon:units = "degree_east" ;
    float lat(lon, lat) ;
        lat:long_name = "latitude" ;
        lat:units = "degree_north" ;
    float mask(lon, lat) ;
        mask:long_name = "mask" ;
        mask:flag_values = 0., 1. ;
        mask:flag_meanings = "land water" ;
        mask:coordinates = "lon lat" ;
    float time(time) ;
        time:long_name = "time since initialization" ;
        time:units = "seconds since 2000-01-01 00:00:00" ;
        time:calendar = "gregorian" ;
        time:field = "time, scalar, series" ;
    float sst(time, lon, lat) ;
        sst:long_name = "sea_surface_temperature" ;
        sst:units = "Kelvin" ;
        sst:time = "time" ;
        sst:coordinates = "lon lat time" ;
        sst:field = "temperature, scalar, series" ;
        sst:_FillValue = 1.e+37f ;
    float chl(time, lon, lat) ;
        chl:long_name = "mass_concentration_of_chlorophyll_in_sea_water" ;
        chl:units = "kg m-3" ;
        chl:time = "time" ;
        chl:coordinates = "lon lat time" ;
        chl:field = "chl, scalar, series" ;
        chl:_FillValue = 1.e+37f ;
    float gsst(time, lon, lat) ;
        gsst:long_name = "gradient sea_surface_temperature" ;
        gsst:units = "Kelvin meter-1" ;
        gsst:time = "time" ;
        gsst:coordinates = "lon lat time" ;
        gsst:field = "temperature, scalar, series" ;
        gsst:_FillValue = 1.e+37f ;
    float gchl(time, lon, lat) ;
        gchl:long_name = "gradient mass_concentration_of_chlorophyll_in_sea_water" ;
        gchl:units = "kg m-4" ;
        gchl:time = "time" ;
        gchl:coordinates = "lon lat time" ;
        gchl:field = "chl, scalar, series" ;
        gchl:_FillValue = 1.e+37f ;
    float u(time, lon, lat) ;
        u:long_name = "eastward_sea_water_velocity" ;
        u:units = "meter second-1" ;
        u:time = "time" ;
        u:coordinates = "lon lat time" ;
        u:field = "u, scalar, series" ;
        u:_FillValue = 1.e+37f ;
    float v(time, lon, lat) ;
        v:long_name = "northward_sea_water_velocity" ;
        v:units = "meter second-1" ;
        v:time = "time" ;
        v:coordinates = "lon lat time" ;
        v:field = "v, scalar, series" ;
        v:_FillValue = 1.e+37f ;

// global attributes:
        :Conventions = "CF-1.0" ;
        :title = "NWA Domain" ;
        :_CoordinateModelRunDate = "2019-07-29T00:00:00Z" ;
}

Thanks again!

rsignell-usgs commented 5 years ago

I wonder whether @jonblower has ideas here...

lesserwhirls commented 5 years ago

Looks like a good old fashion bug, which appears to have been introduced while getting rid of a one line if statement.

cleanup fail

Specifically here:

https://github.com/Unidata/netcdf-java/blob/b935573eda490c5a4dd301a6cb14a0e0f00c32d9/cdm/src/main/java/ucar/nc2/ft/fmrc/FmrcDataset.java#L283-L287

A one line if statement to control a log message turned into a full if / else statement by accident. The net effect is that the data from the lat/lon variables, because they contain more than 10 million points, are not being cached (thus all values were zero in the virtual FMRC dataset).

That's an easy fix, and I can backport it to v4.6 as well.

lesserwhirls commented 5 years ago

Transferring issue to https://github.com/Unidata/netcdf-java

jbzambon commented 5 years ago

Seems to be working great, thanks!