Reading-eScience-Centre / ncwms

ncWMS - A Web Map Service for displaying environmental data over the web

GetCapabilities Bloat #3

Closed aj0415 closed 7 years ago

aj0415 commented 7 years ago

I have a lot of SLDs, one for each parameter in a dataset; however, they are all listed under every layer in the GetCapabilities rather than only under the layer each one is associated with. This results in a very large GetCapabilities. Is there any way to change this or reduce the size of the GetCapabilities?

guygriffiths commented 7 years ago

By SLDs, do you mean SLD template styles or actual SLDs which you link to? If you mean template styles, then GetCapabilities will show all supported styles for each layer - if they are detected as being supported by a particular layer, then they will appear in the capabilities document - there is no way to avoid this.

However, this has just highlighted a bug. If an SLD template points to a specific layer (by having a concrete value in <se:Name>) then it will appear as a style for any layer which supports all of the other template values. I'll look into fixing this.

You can however reduce the size of the capabilities document by reducing the number of advertised palettes, using the advertisedPalettes context parameter (see web.xml for the default setting).
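For reference, that setting is an ordinary servlet context parameter in web.xml; a minimal sketch looks like the following, where the comma-separated palette names are just examples rather than the shipped default:

<context-param>
    <!-- Restrict which palettes are advertised for each layer in GetCapabilities -->
    <param-name>advertisedPalettes</param-name>
    <param-value>default,x-Rainbow</param-value>
</context-param>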

aj0415 commented 7 years ago

I was speaking about the bug you found but was not sure if it was a bug or intentional. I have a concrete value in se:Name in each SLD template that points to a specific layer and only want that specific SLD template to show up for that single layer.

mcechini commented 7 years ago

As a consumer of @aj0415's WMS service, we end up seeing a HUUUUGE GetCapabilities with every possible style associated with every layer. It sounds like the bug you're describing is the one we're seeing.

guygriffiths commented 7 years ago

@aj0415 - OK, I'll look into fixing that and let you know when I have something working. @mcechini - What are you using the capabilities document for? If you're using it with a generic WMS client then you'll have to wait for the fix, but if you're just trying to extract specific information, then some of the GetMetadata methods may be better than parsing the entire capabilities document (as a general rule, not just on this occasion). See https://reading-escience-centre.gitbooks.io/ncwms-user-guide/content/04-usage.html#extensions for more details (or just ask!)
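As an illustration only (the server URL and layer name below are placeholders, and the exact parameter names should be checked against the linked user guide), a layerDetails request looks something like:

http://yourserver/ncWMS2/wms?request=GetMetadata&item=layerDetails&layerName=mydataset/myvariable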

mcechini commented 7 years ago

@guygriffiths We're parsing the GC to get the time dimension values to determine whether a new date is available. We already know which layers we're interested in and which style we will be using. Neither the layerDetails nor the timesteps method seems to give us that exact information. I suppose we could make some inferences from them, but we would still need the full GC response.

Regardless, @aj0415's service is one of many that we interact with as a generic WMS service, so we would probably not dig into a custom request that is not in the OGC spec.

Note that there is some effort within OGC to suggest how to request a filtered GC (i.e. "Show me a GC for layers matching ..."), which is also of great interest to me as a WMS consumer and WMTS provider.

guygriffiths commented 7 years ago

OK, sounds like GetCapabilities is necessary then. Out of interest are you aware that you can add the URL parameter DATASET to a GetCapabilities request to only request layers in a given dataset? It's not part of the OGC spec and it may not be any use (e.g. if you're interested in all of the layers), but I thought it worth mentioning. I'm not sure it's documented, but I'll add it to the docs ready for the next release.
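As an illustration (the server URL and dataset ID here are placeholders), such a request looks like:

http://yourserver/ncWMS2/wms?SERVICE=WMS&VERSION=1.3.0&REQUEST=GetCapabilities&DATASET=mydataset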

Interesting about the filtered GetCapabilities. I think we'd definitely implement something like that if it became part of the spec - I'll keep an eye out for developments.

mcechini commented 7 years ago

I wasn't aware of that, but you're right that since it's not in the OGC spec, we probably wouldn't use it now. We only load the GC once per execution, so if it takes a bit to download, that's ok. But this current bug, combined with @aj0415's layers list, takes ~10 minutes. That's multiple cups of coffee of waiting.

guygriffiths commented 7 years ago

OK, so I've fixed this issue so that if you define a concrete layer name in the SLD template, it will only apply to that layer. I've also changed the default behaviour of GetCapabilities to only advertise a single palette per layer, and reduced the size of the comment so that instead of listing all available palettes, it shows a link to the GetMetadata/layerDetails method where the palettes can be extracted. I think that's about as much as I can reduce the size of the capabilities document by. I'll do a release tomorrow with these latest changes.

guygriffiths commented 7 years ago

Closing this since I've just released a version with these changes. If you have any more similar problems, feel free to reopen it.

aj0415 commented 7 years ago

The GetCapabilities is much shorter now, but it is not including all the layers. Is this default behaviour or a config setting? If so, how can I change it?

guygriffiths commented 7 years ago

Can you give me more information about which layers are missing? Is it specific named layers? If so, what sort of data are they, and what does the corresponding SLD template look like?

aj0415 commented 7 years ago

It looks like only 3 datasets and their information are showing up in the GC when there should be 22 datasets. The information that is there looks correct; it just looks like it's been truncated.

guygriffiths commented 7 years ago

OK, at what point has it been truncated? Is it after a layer, or halfway through? Can you either link to the server, or send me the resultant capabilities document?

aj0415 commented 7 years ago

getcapabilities.txt

aj0415 commented 7 years ago

Looks like it cuts it off after a layer

guygriffiths commented 7 years ago

Can you post the XML file? That seems to be a text document with formatting strings inserted.

aj0415 commented 7 years ago

Doesn't seem to allow me to download the file for some reason

guygriffiths commented 7 years ago

OK, can you just copy and paste the output into a plain text file? I think the one before was rich text or similar.

aj0415 commented 7 years ago

getcapabilities.txt

guygriffiths commented 7 years ago

That's just the extracted text content. I also need all of the XML tags etc. before I can try and figure out what's going on. It will start:

<WMS_Capabilities xmlns="http://www.opengis.net/wms" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:edal="http://reading-escience-centre.github.io/edal-java/wms" version="1.3.0" updateSequence="2016-11-01T14:10:29.739Z" xsi:schemaLocation="http://www.opengis.net/wms http://schemas.opengis.net/wms/1.3.0/capabilities_1_3_0.xsd">
<Service>
<Name>WMS</Name>

aj0415 commented 7 years ago

I just now saw, after sitting on the GC page for some time, that it returns an error saying it rendered up to the error and that it was expecting a ';'

guygriffiths commented 7 years ago

That's helpful to know, but I really would need to see what the output is at the point where it fails before I can attempt to fix it.

aj0415 commented 7 years ago

Any idea how I could get that? It won't let me save the page or download the linked file.

aj0415 commented 7 years ago

Here is the error: This page contains the following errors:

error on line 1571 at column 162: EntityRef: expecting ';' Below is a rendering of the page up to the first error.

mcechini commented 7 years ago

Try curl'ing it from the terminal and pipe the output to a file.
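For example (substituting the real server URL), something along these lines from a terminal:

curl -o getcapabilities.xml "http://yourserver/ncWMS2/wms?SERVICE=WMS&VERSION=1.3.0&REQUEST=GetCapabilities"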

guygriffiths commented 7 years ago

Can you view the page source? You should be able to copy/paste it.

aj0415 commented 7 years ago

ASDC_GetCapabilties.txt

guygriffiths commented 7 years ago

Thanks. It seems to be cutting out right in the middle of a line, which is rather peculiar.

The GetCapabilities request uses a template and fills in the missing values - I was expecting it to run into difficulties trying to process a value which is unavailable/malformed, but that's not the case. It is failing right in the middle of a literal string. I suspect that this is most likely because the output stream is not fully written, and the real error is elsewhere, but it could be a different issue with your servlet container.

Do you see any messages in the Tomcat log when you call GetCapabilities? If not, can you try with this version: http://www.personal.rdg.ac.uk/~qx901922/ncWMS2.war and post the output again? It doesn't fix anything, but it should hopefully ensure that the entire buffer gets flushed so that we can see what's actually causing this problem.

aj0415 commented 7 years ago

Updated_GC.txt

aj0415 commented 7 years ago

Now I get the following error, if it's any use to you:

This page contains the following errors:

error on line 609 at column 38: Extra content at the end of the document Below is a rendering of the page up to the first error.

guygriffiths commented 7 years ago

That second example is failing at a different place to the previous one. However, both of the examples fail on the same dataset (CERES_EBAF_TOA2.8). I assume that this dataset was being used in the version you had working? If you disable this dataset, does it work? If you could narrow the error down to a specific dataset it would be very helpful.

If it turns out to only fail with a specific dataset, would you be able to provide me with a copy of the data which I could load into my instance of ncWMS to analyse?

aj0415 commented 7 years ago

Yes, this dataset was working in the previous versions. I tried removing them and discovered the page was never fully loading. If you wait ~3.5 minutes it appends additional data to the GC and continues to load, and at about 7 minutes it finally lists the times for a parameter in a dataset after the EBAF datasets. It never does list the times available for the EBAF dataset; however, it did in previous versions, where it was complete and loaded within seconds. Not sure why the GC is taking so long. If I disable the datasets with hourly granules it completely loads in seconds, but there are still no times for EBAF.

So it looks like 2 issues: having hourly granules for a dataset seems to be too much, and EBAF times are not loading but used to in previous versions. Below is the link to the NcML for one of the EBAF datasets: https://opendap.larc.nasa.gov/opendap/.GIBS/CERES/EBAF/TOA_Edition2.8/CERES_EBAF-TOA_Edition2.8.ncml.html

aj0415 commented 7 years ago

wms_without_hourly.txt

This is what is returned when the hourly datasets are disabled. Everything looks to be there, and the only style listed is the one that applies to the layer. There are just no times for the EBAF datasets.

I'm guessing the time dimension for the hourly datasets is too much because it has to list every hourly timestamp for over 15 years for each layer (roughly 15 × 365 × 24 ≈ 131,000 entries per layer), and there are 7 layers.

mcechini commented 7 years ago

I think we ran into this before. Remind me, can you not just provide "start/end/PT1H" for the hourly product time dimension values?

guygriffiths commented 7 years ago

OK, it looks like the EBAF datasets are the ones causing an issue. Have you got a sample you could provide me with? It should include multiple timesteps, maybe a few days' worth, depending on the size of the data.

@mcechini - Yes, that's what should happen. I haven't made any changes to the class which does this though, so I'm not sure why it isn't working.
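For reference, the abbreviated form is a single start/end/period entry in the time dimension of the capabilities document rather than an explicit list of every timestep, along these lines (the dates and default value are purely illustrative):

<Dimension name="time" units="ISO8601" default="2016-11-01T00:00:00.000Z">2000-03-01T00:00:00.000Z/2016-11-01T00:00:00.000Z/PT1H</Dimension>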

aj0415 commented 7 years ago

The link above is the aggregated NcML. Will that not work? Here it is again https://opendap.larc.nasa.gov/opendap/.GIBS/CERES/EBAF/TOA_Edition2.8/CERES_EBAF-TOA_Edition2.8.ncml.html

guygriffiths commented 7 years ago

Sorry, I missed that link. However, when I add the OPeNDAP endpoint to ncWMS, I get a bunch of errors when trying to read the dataset:

2016-11-03 14:00:56 WARN  DODSGrid:72 - DODSGrid cant find dimension = <time>
2016-11-03 14:00:56 WARN  DODSGrid:72 - DODSGrid cant find dimension = <time>
...
2016-11-03 14:00:56 WARN  DODSGrid:72 - DODSGrid cant find dimension = <time>
2016-11-03 14:00:56 ERROR DODSNetcdfFile:1221 -  DODS Unlimited_Dimension = time not found on dods://opendap.larc.nasa.gov:443/opendap/.GIBS/CERES/EBAF/TOA_Edition2.8/CERES_EBAF-TOA_Edition2.8.ncml

If I try and download it as NetCDF, I get the error: "fileout.netcdf - Failed to define variable time: NetCDF: String match to name in use"

However, when I add it through OPeNDAP, it appears in the capabilities document with no time dimension. It also appears in the main Godiva3 menu, although when clicked on it throws an error.

So it appears that something strange is going on with the time dimension for this dataset. Are you accessing through OPeNDAP? Do you have a copy of the underlying NetCDF file(s) you could provide?

aj0415 commented 7 years ago

Here is the underlying NetCDF file: https://opendap.larc.nasa.gov/opendap/CERES/EBAF/TOA_Edition2.8/CERES_EBAF-TOA_Edition2.8_200003-201605.nc.html The EBAF product is just one large NetCDF file that gets appended to every couple of months with new data. It is odd that the file loads fine but then gives errors. It also worked in previous versions.

aj0415 commented 7 years ago

Also, in the NcML file I am replacing the values for time with new values to comply with the consumers' needs. I am doing this simply with the tags and placing the values inside them. Not sure if this is messing the new version up or not.

aj0415 commented 7 years ago

Also, if I download the NetCDF file it works in Panoply, but if I try to add it to Panoply as a remote dataset from OPeNDAP I get a "Cannot read DataDDS" error.

aj0415 commented 7 years ago

Sorry for all the posts. It looks like the issue comes from changing the time values in the NcML. If I point directly to the OPeNDAP link of the original NetCDF file, everything works fine. In the NcML I remove the original time variable and then make another time variable with the new values, and that seems to be what is not agreeing with the new version. However, it works with the older versions...
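(For illustration only, the remove-and-re-add pattern described above corresponds to NcML roughly like the following; the variable type, units attribute and the handful of values shown are placeholders, not the actual file contents:)

<netcdf location="/CERES/EBAF/TOA_Edition2.8/CERES_EBAF-TOA_Edition2.8_200003-201605.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <!-- drop the original time variable, then re-declare it with the new values -->
    <remove type="variable" name="time"/>
    <variable name="time" type="double" shape="time">
        <attribute name="units" value="days since 2000-03-01"/>
        <values>0 31 61 92</values>
    </variable>
</netcdf>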

guygriffiths commented 7 years ago

OK, that's really helpful. It could be something to do with the latest NetCDF/OPeNDAP libraries (although we changed to the newest version in 2.2.2, which I think was working for you?), but I'm not 100% certain.

Can you post the contents of your NcML file here? I'll see if I can find a way around the issue, either by playing with the NcML, or by talking to the guys at Unidata who write the libraries we use.

aj0415 commented 7 years ago

Below is the contents of the NcML file: ebaf_toa_ncml.txt

guygriffiths commented 7 years ago

I think that OPeNDAP is fairly permissive with NcML files - if I use that locally, there are a couple of issues with it and it is not plottable at all in ncWMS. You should not need to remove the time variable and add it again - just redefining the values will work. You should also include the NcML namespace. Changing it to the below works on my local system, give it a try in your OPeNDAP server and see how it goes:

<netcdf location="/CERES/EBAF/TOA_Edition2.8/CERES_EBAF-TOA_Edition2.8_200003-201605.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <variable name="time">
        <values>0 31 61 92 122 153 184 214 245 275 306 337 365 396 426 457 487 518 549 579 610 640 671 702 730 761 791 822 852 883 914 944 975 1005 1036 1067 1095 1126 1156 1187 1217 1248 1279 1309 1340 1370 1401 1432 1461 1492 1522 1553 1583 1614 1645 1675 1706 1736 1767 1798 1826 1857 1887 1918 1948 1979 2010 2040 2071 2101 2132 2163 2191 2222 2252 2283 2313 2344 2375 2405 2436 2466 2497 2528 2556 2587 2617 2648 2678 2709 2740 2770 2801 2831 2862 2893 2922 2953 2983 3014 3044 3075 3106 3136 3167 3197 3228 3259 3287 3318 3348 3379 3409 3440 3471 3501 3532 3562 3593 3624 3652 3683 3713 3744 3774 3805 3836 3866 3897 3927 3958 3989 4017 4048 4078 4109 4139 4170 4201 4231 4262 4292 4323 4354 4383 4414 4444 4475 4505 4536 4567 4597 4628 4658 4689 4720 4748 4779 4809 4840 4870 4901 4932 4962 4993 5023 5054 5085 5113 5144 5174 5205 5235 5266 5297 5327 5358 5388 5419 5450 5478 5509 5539 5570 5600 5631 5662 5692 5723 5753 5784 5815 5844 5875 5905</values>
    </variable>
</netcdf>

aj0415 commented 7 years ago

Sadly I do have to remove the variable to change the values. This is the message I get when using your example when trying to change the values in place:

"3 NCMLModule ParseError: at *.ncml line=3: This version of the NCML Module cannot change the values of an existing variable! However, we got element for variable= at scope=time"

It was my understanding that our Hyrax OPeNDAP instance is very limited in what we can do with NcML. I'm not sure of an alternate way to do this with our current setup...

guygriffiths commented 7 years ago

So I guess this is an issue between Hyrax OPeNDAP and the NetCDF libraries we use to read the data over OPeNDAP. I don't think that there's anything I can do on the ncWMS side of things. Do you need to keep the data with its original time values? If not, it may be better to change them in place (e.g. using ncap2 - see https://sourceforge.net/p/nco/discussion/9829/thread/dd953366/ for an example) rather than wrapping with NcML.

aj0415 commented 7 years ago

I will look into using NCO or just using Python to do it. I cannot modify the original, so I will have to make a copy, and whenever a new version is released with additional data this command would have to be triggered and run immediately.

As far as the hourly data not allowing the GC to load in a timely manner and listing out every hourly observation for every layer, is there a fix for that?

guygriffiths commented 7 years ago

Can you post a link to the hourly data? It should use the abbreviated format, but I guess it is not doing so. Can you confirm that visualisation etc. works, and it is just the capabilities document which is failing to load?

aj0415 commented 7 years ago

Yes, it's just the GC that is failing. The images look good. Here is the link to the hourly dataset: https://opendap.larc.nasa.gov/opendap/hyrax/.GIBS/GIBS_CERES_Data/SSF/Terra-FM1-MODIS_Edition4A/contents.html

Here is the NcML:

CER_SSF_Terra-FM1-MODIS_Edition4A.ncml.txt

guygriffiths commented 7 years ago

What is the OPeNDAP endpoint you are pointing ncWMS at? Presumably it's something like https://opendap.larc.nasa.gov/opendap/hyrax/.GIBS/GIBS_CERES_Data/SSF/Terra-FM1-MODIS_Edition4A.ncml (but not exactly that, since it doesn't work)?