Reading-eScience-Centre / edal-java

Environmental Data Abstraction Layer libraries
Other
39 stars 30 forks source link

Performance issues with GetMap with a transformation #150

Closed tdrwenski closed 1 year ago

tdrwenski commented 1 year ago

Greetings! Some TDS users (using version TDS 5.4/ edal-java 1.5.0) have run into a performance issue with WMS and I am hoping you can help me understand where it might be coming from and if it's possible to fix or work around.

Godiva3 is working great on their dataset, but the GetMap requests their web app is doing are very slow (~20 seconds on their production server and ~6 seconds on their dev server) for a 20 kB map. In addition, they say the performance was fine when they used TDS 4/ ncWMS 1 with the same requests.

I believe the differences between the Godiva3 and their web app are:

I also poked around in the code a bit and I see that Domain2DMapper::forGrid is called and it has a cache, but the bounding box must be an exact match for this cache element to be used. I don't think the cache is helping at all in their web app, as any pan or zoom changes the bounding box in the GetMap request. I can imagine that requesting tiles with fixed bounding boxes is therefore one way they could improve performance. However, since performance was fine with TDS 4/ ncWMS 1 even without tiling, I am wondering if there were some changes are between ncWMS 1 and edal-java that might have caused a performance regression.

I would appreciate any insight or advice you could give regarding this situation and if it's possible to fix or work around. I am happy to try to provide a reproduction path or more information if you think it would be useful!

ghansham commented 1 year ago

I agree with you. There is a performance drop from ncwms 1 to 2.

On Wed, 18 Jan, 2023, 23:30 Tara Drwenski, @.***> wrote:

Greetings! Some TDS users (using version TDS 5.4/ edal-java 1.5.0) have run into a performance issue with WMS and I am hoping you can help me understand where it might be coming from and if it's possible to fix or work around.

Godiva3 is working great on their dataset, but the GetMap requests their web app is doing are very slow (~20 seconds on their production server and ~6 seconds on their dev server) for a 20 kB map. In addition, they say the performance was fine when they used TDS 4/ ncWMS 1 with the same requests.

I believe the differences between the Godiva3 and their web app are:

  • Godiva3 tiles its requests, resulting in smaller requests and their app does not.
  • Their web app requests a difference CRS code than the data is stored in (EPSG:3857) so a transformation has to be performed, which is not the case for Godiva3.

I also poked around in the code a bit and I see that Domain2DMapper::forGrid is called and it has a cache, but the bounding box must be an exact match for this cache element to be used. I don't think the cache is helping at all in their web app, as any pan or zoom changes the bounding box in the GetMap request. I can imagine that requesting tiles with fixed bounding boxes is therefore one way they could improve performance. However, since performance was fine with TDS 4/ ncWMS 1 even without tiling, I am wondering if there were some changes are between ncWMS 1 and edal-java that might have caused a performance regression.

I would appreciate any insight or advice you could give regarding this situation and if it's possible to fix or work around. I am happy to try to provide a reproduction path or more information if you think it would be useful!

— Reply to this email directly, view it on GitHub https://github.com/Reading-eScience-Centre/edal-java/issues/150, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYXFJAS37MS3365EFJN3L3WTAVTZANCNFSM6AAAAAAT7MGXIQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

ghansham commented 1 year ago

If you have a system with reasonable memory, you can try to switch reading strategy from scanline to bounding box. it should reduce reading latency.

Another point is ncwms1 used java native arrays for storing variable data. In ncwms2, value3d, value4d classes are created which are wrapper over the native arrays. And they do explicit array bound checks in each dimension when storing or accessing every data value.

Regards Ghansham

On Thu, 19 Jan, 2023, 07:01 Ghansham Sangar, @.***> wrote:

I agree with you. There is a performance drop from ncwms 1 to 2.

On Wed, 18 Jan, 2023, 23:30 Tara Drwenski, @.***> wrote:

Greetings! Some TDS users (using version TDS 5.4/ edal-java 1.5.0) have run into a performance issue with WMS and I am hoping you can help me understand where it might be coming from and if it's possible to fix or work around.

Godiva3 is working great on their dataset, but the GetMap requests their web app is doing are very slow (~20 seconds on their production server and ~6 seconds on their dev server) for a 20 kB map. In addition, they say the performance was fine when they used TDS 4/ ncWMS 1 with the same requests.

I believe the differences between the Godiva3 and their web app are:

  • Godiva3 tiles its requests, resulting in smaller requests and their app does not.
  • Their web app requests a difference CRS code than the data is stored in (EPSG:3857) so a transformation has to be performed, which is not the case for Godiva3.

I also poked around in the code a bit and I see that Domain2DMapper::forGrid is called and it has a cache, but the bounding box must be an exact match for this cache element to be used. I don't think the cache is helping at all in their web app, as any pan or zoom changes the bounding box in the GetMap request. I can imagine that requesting tiles with fixed bounding boxes is therefore one way they could improve performance. However, since performance was fine with TDS 4/ ncWMS 1 even without tiling, I am wondering if there were some changes are between ncWMS 1 and edal-java that might have caused a performance regression.

I would appreciate any insight or advice you could give regarding this situation and if it's possible to fix or work around. I am happy to try to provide a reproduction path or more information if you think it would be useful!

— Reply to this email directly, view it on GitHub https://github.com/Reading-eScience-Centre/edal-java/issues/150, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYXFJAS37MS3365EFJN3L3WTAVTZANCNFSM6AAAAAAT7MGXIQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

jonblower commented 1 year ago

Another point is ncwms1 used java native arrays for storing variable data. In ncwms2, value3d, value4d classes are created which are wrapper over the native arrays. And they do explicit array bound checks in each dimension when storing or accessing every data value.

I suspect this is the cause of the performance hit, but it's difficult for me to test. Any thoughts, @guygriffiths?

ghansham commented 1 year ago

And also for setting values at a location setter methods of these wrapper classes are called. Just an observation.

On Thu, 19 Jan, 2023, 15:23 Jon Blower, @.***> wrote:

Another point is ncwms1 used java native arrays for storing variable data. In ncwms2, value3d, value4d classes are created which are wrapper over the native arrays. And they do explicit array bound checks in each dimension when storing or accessing every data value.

I suspect this is the cause of the performance hit, but it's difficult for me to test. Any thoughts, @guygriffiths https://github.com/guygriffiths?

— Reply to this email directly, view it on GitHub https://github.com/Reading-eScience-Centre/edal-java/issues/150#issuecomment-1396705116, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYXFJE3BBLW4JD6KUPJDADWTEFI5ANCNFSM6AAAAAAT7MGXIQ . You are receiving this because you commented.Message ID: @.***>

tdrwenski commented 1 year ago

Thank you for your responses!

If you have a system with reasonable memory, you can try to switch reading strategy from scanline to bounding box. it should reduce reading latency.

Is this configurable at the moment? I see that scanline strategy is indeed being used currently and is set from getOptimumDataReadingStrategy since the file is a netCDF file.

ghansham commented 1 year ago

If you can force it and recompile and regenerate the war and deploy and test it.

On Thu, 19 Jan, 2023, 21:27 Tara Drwenski, @.***> wrote:

Thank you for your responses!

If you have a system with reasonable memory, you can try to switch reading strategy from scanline to bounding box. it should reduce reading latency.

Is this configurable at the moment? I see that scanline strategy is indeed being used currently and is set from getOptimumDataReadingStrategy since the file is a netCDF file.

— Reply to this email directly, view it on GitHub https://github.com/Reading-eScience-Centre/edal-java/issues/150#issuecomment-1397205559, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYXFJCVCP23QR43B6KGB3DWTFP6HANCNFSM6AAAAAAT7MGXIQ . You are receiving this because you commented.Message ID: @.***>

ghansham commented 1 year ago

Data reading strategy can be set to boundingbox if datasets are small in case they are netcdf or hdf4. The large dataset test can be introduced in this case as well.

Or if a configuration item can be introduced similar to dataset readimg class.

Regards Ghansham

Regards Ghansham

On Thu, 19 Jan, 2023, 23:58 Ghansham Sangar, @.***> wrote:

If you can force it and recompile and regenerate the war and deploy and test it.

On Thu, 19 Jan, 2023, 21:27 Tara Drwenski, @.***> wrote:

Thank you for your responses!

If you have a system with reasonable memory, you can try to switch reading strategy from scanline to bounding box. it should reduce reading latency.

Is this configurable at the moment? I see that scanline strategy is indeed being used currently and is set from getOptimumDataReadingStrategy since the file is a netCDF file.

— Reply to this email directly, view it on GitHub https://github.com/Reading-eScience-Centre/edal-java/issues/150#issuecomment-1397205559, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYXFJCVCP23QR43B6KGB3DWTFP6HANCNFSM6AAAAAAT7MGXIQ . You are receiving this because you commented.Message ID: @.***>

ghansham commented 1 year ago

One more point that I have observed is regarding this:

https://github.com/Unidata/netcdf-java/issues/1027

This issue may also play spoil sport which there is ncj-5.x but not in 4.x.

And we dont have an ncwms2 build with tds-4.x.

I had shared this issue With @Guy Griffiths @.***> also.

regards Ghansham

On Fri, Jan 20, 2023 at 6:20 AM Ghansham Sangar @.***> wrote:

Data reading strategy can be set to boundingbox if datasets are small in case they are netcdf or hdf4. The large dataset test can be introduced in this case as well.

Or if a configuration item can be introduced similar to dataset readimg class.

Regards Ghansham

Regards Ghansham

On Thu, 19 Jan, 2023, 23:58 Ghansham Sangar, @.***> wrote:

If you can force it and recompile and regenerate the war and deploy and test it.

On Thu, 19 Jan, 2023, 21:27 Tara Drwenski, @.***> wrote:

Thank you for your responses!

If you have a system with reasonable memory, you can try to switch reading strategy from scanline to bounding box. it should reduce reading latency.

Is this configurable at the moment? I see that scanline strategy is indeed being used currently and is set from getOptimumDataReadingStrategy since the file is a netCDF file.

— Reply to this email directly, view it on GitHub https://github.com/Reading-eScience-Centre/edal-java/issues/150#issuecomment-1397205559, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYXFJCVCP23QR43B6KGB3DWTFP6HANCNFSM6AAAAAAT7MGXIQ . You are receiving this because you commented.Message ID: @.***>

tdrwenski commented 1 year ago

Thanks for your time and for all the ideas!

If you have a system with reasonable memory, you can try to switch reading strategy from scanline to bounding box. it should reduce reading latency.

We hard coded the bounding box strategy like you suggested and tested but unfortunately it did not seem to affect performance.

One more point that I have observed is regarding this: Unidata/netcdf-java#1027 This issue may also play spoil sport which there is ncj-5.x but not in 4.x.

We are currently working on a fix that might help with this issue!

One more thought, is it possible that the caching could be improved somewhere? Not sure if this is the most relevant cache for performance here, but as I mentioned, a small change in the requested bounding box causes a cache miss in the domainMapperCache.

ghansham commented 1 year ago

Is it possible to share the ncml of the dataset? Is the dataset compressed?

Regards Ghansham

On Fri, 20 Jan, 2023, 22:22 Tara Drwenski, @.***> wrote:

Thanks for your time and for all the ideas!

If you have a system with reasonable memory, you can try to switch reading strategy from scanline to bounding box. it should reduce reading latency.

We hard coded the bounding box strategy like you suggested and tested but unfortunately it did not seem to affect performance.

One more point that I have observed is regarding this: Unidata/netcdf-java#1027 https://github.com/Unidata/netcdf-java/issues/1027 This issue may also play spoil sport which there is ncj-5.x but not in 4.x. … <#m-2187808429865766995>

We are currently working on a fix that might help with this issue!

One more thought, is it possible that the caching could be improved somewhere? Not sure if this is the most relevant cache for performance here, but as I mentioned, a small change in the requested bounding box causes a cache miss in the domainMapperCache.

— Reply to this email directly, view it on GitHub https://github.com/Reading-eScience-Centre/edal-java/issues/150#issuecomment-1398659635, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYXFJGI3VIV4UE2FQGFXT3WTK7EXANCNFSM6AAAAAAT7MGXIQ . You are receiving this because you commented.Message ID: @.***>

ghansham commented 1 year ago

Have you checked the time spent in DataReadingStrategy class for boundingbox and scanline?

What is the machine configuration you are using?

FYI I used jdk8 and I saw improvement in performance. What is the dataset size?

On Fri, 20 Jan, 2023, 23:21 Ghansham Sangar, @.***> wrote:

Is it possible to share the ncml of the dataset? Is the dataset compressed?

Regards Ghansham

On Fri, 20 Jan, 2023, 22:22 Tara Drwenski, @.***> wrote:

Thanks for your time and for all the ideas!

If you have a system with reasonable memory, you can try to switch reading strategy from scanline to bounding box. it should reduce reading latency.

We hard coded the bounding box strategy like you suggested and tested but unfortunately it did not seem to affect performance.

One more point that I have observed is regarding this: Unidata/netcdf-java#1027 https://github.com/Unidata/netcdf-java/issues/1027 This issue may also play spoil sport which there is ncj-5.x but not in 4.x. … <#m_375583473962294108m-2187808429865766995_>

We are currently working on a fix that might help with this issue!

One more thought, is it possible that the caching could be improved somewhere? Not sure if this is the most relevant cache for performance here, but as I mentioned, a small change in the requested bounding box causes a cache miss in the domainMapperCache.

— Reply to this email directly, view it on GitHub https://github.com/Reading-eScience-Centre/edal-java/issues/150#issuecomment-1398659635, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYXFJGI3VIV4UE2FQGFXT3WTK7EXANCNFSM6AAAAAAT7MGXIQ . You are receiving this because you commented.Message ID: @.***>

ghansham commented 1 year ago

If the variables are using add_offset and scale_factors, then netcdf -java issue will play a major role. The solution lies in checking if the value is missing, don't apply scale factor, simply set in to Nan as scale offset application is a costly operation. FYI, Ncj-5 upgrades uint16 to int32 and similarly other unsigned data types.

On Fri, 20 Jan, 2023, 23:26 Ghansham Sangar, @.***> wrote:

Have you checked the time spent in DataReadingStrategy class for boundingbox and scanline?

What is the machine configuration you are using?

FYI I used jdk8 and I saw improvement in performance. What is the dataset size?

On Fri, 20 Jan, 2023, 23:21 Ghansham Sangar, @.***> wrote:

Is it possible to share the ncml of the dataset? Is the dataset compressed?

Regards Ghansham

On Fri, 20 Jan, 2023, 22:22 Tara Drwenski, @.***> wrote:

Thanks for your time and for all the ideas!

If you have a system with reasonable memory, you can try to switch reading strategy from scanline to bounding box. it should reduce reading latency.

We hard coded the bounding box strategy like you suggested and tested but unfortunately it did not seem to affect performance.

One more point that I have observed is regarding this: Unidata/netcdf-java#1027 https://github.com/Unidata/netcdf-java/issues/1027 This issue may also play spoil sport which there is ncj-5.x but not in 4.x. … <#m_1765894194212342359_m_375583473962294108m-2187808429865766995_>

We are currently working on a fix that might help with this issue!

One more thought, is it possible that the caching could be improved somewhere? Not sure if this is the most relevant cache for performance here, but as I mentioned, a small change in the requested bounding box causes a cache miss in the domainMapperCache.

— Reply to this email directly, view it on GitHub https://github.com/Reading-eScience-Centre/edal-java/issues/150#issuecomment-1398659635, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYXFJGI3VIV4UE2FQGFXT3WTK7EXANCNFSM6AAAAAAT7MGXIQ . You are receiving this because you commented.Message ID: @.***>

tdrwenski commented 1 year ago

The dataset we have been testing on is available here. It is a 477 MB netcdf-4 file. There is no scale/offset but there is a missing value/ fill value. Here is an example of a variable:

float tasmin(time, lat, lon) ;
    tasmin:missing_value = -999.f ;
    tasmin:_FillValue = -999.f ;
    string tasmin:long_name = "Max temperature" ;
    string tasmin:units = "deg_C" ;

Unfortunately, I am not seeing the same performance issues locally though I have tried to use the same TDS and WMS configuration as the user reports, so perhaps the hardware could be important. So regarding the bounding box vs scanline performance, I have not done any detailed time measurements as I am not able to reproduce it locally. I gave a build using the bounding box to the user to test and they saw approximately the same timings for WMS requests as before. They are using Ubuntu 20.04 and OpenJDK 11.

As I mentioned, my colleague is working on a fix that may improve performance of enhancements (scale, offset, missing value, and fill value), so maybe that will help in this case as well. I am not sure if I understand how that explains the performance being fine with one CRS but not with another though.

Here are examples of two TDS requests we are comparing the performance between:

(1) https://regclim.ceoas.oregonstate.edu:8443/thredds/wms/nccv2_maca_maps/rcp45/2025-2049_vs_1981-2010/BNU-ESM_rcp45_2025-2049_vs_1981-2010_anomalies.nc?width=1135&height=600&format=image%2Fpng&request=GetMap&service=WMS&styles=raster%2Fpsu-viridis&transparent=true&version=1.1.1&layers=tasmin&time=2001-01-03T00:00:00.000Z&colorscalerange=0.6227%2C4.694&numcolorbands=20&belowmincolor=extend&abovemaxcolor=extend&SRS=EPSG%3A4326&BBOX=-167,12,-12,57

(2) https://regclim.ceoas.oregonstate.edu:8443/thredds/wms/nccv2_maca_maps/rcp45/2025-2049_vs_1981-2010/BNU-ESM_rcp45_2025-2049_vs_1981-2010_anomalies.nc?width=1135&height=600&format=image%2Fpng&request=GetMap&service=WMS&styles=raster%2Fpsu-viridis&transparent=true&version=1.1.1&layers=tasmin&time=2001-01-03T00:00:00.000Z&colorscalerange=0.6227%2C4.694&numcolorbands=20&belowmincolor=extend&abovemaxcolor=extend&SRS=EPSG%3A3857&bbox=-10577331%2C2159574%2C-5555801%2C5113765

The only difference between (1) and (2) are the EPSG codes and the bounding boxes. Both responses are around 20 kB. (1) Takes ~0.5 seconds and (2) takes ~15 seconds. Repeat requests are faster because of the cache, but any change in the bounding box causes slow performance again.

Thanks again for your time and let me know if you would like more information.

tdrwenski commented 1 year ago

Good news-- we have discovered the performance issue, so I will close this.

The apache sis library that edal-java uses has some computationally intensive asserts that get hit a lot with certain WMS requests. The solution is simply to remove the JVM "-ea" option (enable asserts), which should have never been used in production anyways.

ghansham commented 1 year ago

I am using tomcat 8.5. Do you mean to say the Catalina.sh should not have "-ea" option?

Regards Ghansham

On Fri, 2 Jun, 2023, 20:14 Tara Drwenski, @.***> wrote:

Good news-- we have discovered the performance issue, so I will close this.

The apache sis library that edal-java uses has some computationally intensive asserts that get hit a lot with certain WMS requests. The solution is simply to remove the JVM "-ea" option (enable asserts), which should have never been used in production anyways.

— Reply to this email directly, view it on GitHub https://github.com/Reading-eScience-Centre/edal-java/issues/150#issuecomment-1573854756, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYXFJDJKQ6RPEYISDHNGQDXJH36HANCNFSM6AAAAAAT7MGXIQ . You are receiving this because you commented.Message ID: @.***>

tdrwenski commented 1 year ago

Yes-- we removed the -ea from the JAVA_OPTS in our $TOMCAT_HOME/bin/setenv.sh (still shown in the TDS docs here but we are about to update that).

ghansham commented 1 year ago

In one of your mail, you mentioned that changing the CRS changes the TAT from 0.5 to 15 seconds. What are the figures after the update?

On Fri, 2 Jun, 2023, 23:00 Tara Drwenski, @.***> wrote:

Yes-- we removed the -ea from the JAVA_OPTS in our $TOMCAT_HOME/bin/setenv.sh (still shown in the TDS docs here https://docs.unidata.ucar.edu/tds/5.4/userguide/running_tomcat.html but we are about to update that).

— Reply to this email directly, view it on GitHub https://github.com/Reading-eScience-Centre/edal-java/issues/150#issuecomment-1574079714, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYXFJB5FSEWEG34HYWHGRDXJIPJVANCNFSM6AAAAAAT7MGXIQ . You are receiving this because you commented.Message ID: @.***>

tdrwenski commented 1 year ago

Hi, I believe for a request with a different CRS than the data is stored in-- after -ea is removed its about ~2 seconds on a cache miss (new bounding box requested) and ~1.5 second on a cache hit (same bounding box requested a second time) on our thredds dev server (for instance this query). With the -ea option it was ~15-20 seconds on a cache miss.

ghansham commented 1 year ago

Too good. It's a curvilinear dataset.

On Mon, 5 Jun, 2023, 19:46 Tara Drwenski, @.***> wrote:

Hi, I believe for a request with a different CRS than the data is stored in-- after -ea is removed its about ~2 seconds on a cache miss (new bounding box requested) and ~1.5 second on a cache hit (same bounding box requested a second time) on our thredds dev server (for instance this query https://thredds-dev.unidata.ucar.edu/thredds/wms/satellite/goes/east/grb/ABI/CONUS/Channel01/20230530/OR_ABI-L1b-RadC-M6C01_G16_s20231500001172_e20231500003545_c20231500003580.nc?TRANSPARENT=TRUE&STYLES=default-scalar%2Fx-Rainbow&LAYERS=Rad&COLORSCALERANGE=-50%2C50&NUMCOLORBANDS=20&ABOVEMAXCOLOR=extend&BELOWMINCOLOR=extend&BGCOLOR=extend&LOGSCALE=false&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&SRS=EPSG%3A4326&BBOX=-180,-90,160,81&WIDTH=1135&HEIGHT=600&FORMAT=image%2Fpng )

— Reply to this email directly, view it on GitHub https://github.com/Reading-eScience-Centre/edal-java/issues/150#issuecomment-1576893122, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYXFJH2YW3G5VASI4CR23LXJXS23ANCNFSM6AAAAAAT7MGXIQ . You are receiving this because you commented.Message ID: @.***>