billyz313 opened this issue 1 year ago
Hello, we had a user encounter a similar issue a few months ago, and after some investigation we found that they were able to improve performance by removing "-ea" from the JAVA_OPTS in their Tomcat setenv.sh. Can you test that to see if it helps with the performance issues you're seeing?
@mnlerman Hi Megan, thank you for the suggestion. I just tried that and it didn't seem to make any difference. Here are the options I have set; maybe there is something else that needs adjusting and I'm just missing it?
```sh
NORMAL="-Xmx16384m -Xms512m -server -XX:+UseParallelGC -Djava.awt.headless=true"
HEAP_DUMP="-XX:+HeapDumpOnOutOfMemoryError"
HEADLESS="-Djava.awt.headless=true"
JAVA_OPTS="$CONTENT_ROOT $NORMAL $HEAP_DUMP $HEADLESS $JAVA_PREFS_ROOTS"
```
I do realize I have headless set twice. I was having an issue with THREDDS crashing when I would zoom the layer on the map, and I saw that adding headless might help. I added it in the NORMAL variable and the crashing stopped. Not long after that I realized it was already there in the HEADLESS variable. I'm not sure how it stopped the crashing, or if it was just a coincidence.
Still looking for suggestions. We're concerned because support for V4.6 has ended and we can't get V5 to produce anywhere near the same speed. At this point we're being forced to look into other options, which I would prefer not to have to do. Any help would be greatly appreciated.
@billyz313 have you tested performance with the latest snapshot release of the TDS?
@haileyajohnson No, we are using 5.4, but maybe I can convince them to give it a shot. I should just be able to replace the .war file, and all the configs and custom palettes will be read in with no issue?
@haileyajohnson I upgraded to V5.5 this morning to see if it helped. Unfortunately there seems to be no improvement. There's a huge lag compared to V4.6 when using the WMS endpoint, which is what we use to animate the data on the map.
Thanks for testing that out.
Can you send a bit more info to help us try to reproduce your issue? If possible, can you give us your config files and a way to access some example data?
@tdrwenski Thank you for taking the time to try to assist. I'll start with a live example of the issue. I have one production layer pointing at THREDDS v5.5; all of the rest point to the 4.6 version. The application is located at:
https://climateserv.servirglobal.net/map
To see the issue you will need to:
1) Click the white layer stack icon at the top left in the green and white panel.
2) Click in the "filter layers..."
3) Type ndvi
4) Check the USGS eMODIS NDVI East Africa box to turn it on. (East Africa is the only one pointing to V5.5)
5) The initial load is slow compared to v4.6, but we could probably live with that if that were the only issue.
6) Click the play button on the time dimension control at the bottom of the map to animate the layer. Notice that it takes forever to load the steps to animate.
7) Click the pause button (or the % loading if it's still loading)
8) Uncheck East Africa. Check Central Asia (it's about the same size)
9) Click the play button. Notice there is a much shorter lag for the initial load of the animation, and it continues to animate with no issue just as it should.
So, now that you can see the issue, I will try to get you what you need to reproduce it on the backend for testing. Let me start off by saying the performance issue affects every dataset.
I am attaching the THREDDS config files (usr_local_thredds.zip). If you need more than this, please let me know.
When we use the WMS, we use the virtual aggregation, which is located at:
https://csthredds.servirglobal.net/thredds/wms/Agg/emodis-ndvi_eastafrica_250m_10dy.nc4?service=WMS&version=1.3.0&request=GetCapabilities (V4.6)
https://threddsx.servirglobal.net/thredds/wms/Agg/emodis-ndvi_eastafrica_250m_10dy.nc4?service=WMS&version=1.3.0&request=GetCapabilities (V5.5)
About getting the data: I'm not sure of the best way to get it to you, as we have several TB. The THREDDS endpoints for the dataset we were just testing are:
https://csthredds.servirglobal.net/thredds/catalog/climateserv/emodis-ndvi/eastafrica/250m/10dy/catalog.html (V4.6)
https://threddsx.servirglobal.net/thredds/catalog/climateserv/all/emodis-ndvi/eastafrica/250m/10dy/catalog.html (V5.5)
I did try downloading, though, and the downloaded file is just a touch smaller, which means something is not exactly the same (not sure if that makes a difference).
We have a Generalized ETL that we could help you set up to download the data to your test system, which would give you the data exactly as we have it in THREDDS.
Another option is that I could drop a handful of nc files in a Google Drive that you could grab.
I also just exposed the data directly from our server, if that's easier: https://eandvi.servirglobal.net
Let me know what you would prefer.
Thank you for sending the extra info! I see how slow it is on your server and I am seeing similar performance issues locally using your datasets.
It seems slow with other services like NCSS as well, so I don't think it's only WMS that's slow. It looks like this is related to a performance issue we are working on fixing for version 5.5: enhancements (such as scale, offset, and fill value) are not handled well in the current version and can cause performance issues for large datasets.
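To make it concrete, here is a minimal numpy sketch of the unpacking that any CF-aware reader has to apply to every value it serves (the numbers are made up, and this is not the actual TDS code path):

```python
import numpy as np

# Packed byte values as stored on disk (made-up numbers; 127 is _FillValue).
packed = np.array([13, 127, 92], dtype=np.int8)
scale_factor, add_offset, fill = 0.01, 0.0, np.int8(127)

# Per the CF conventions, a reader unpacks every value before serving it:
unpacked = np.where(packed == fill, np.nan,
                    packed * scale_factor + add_offset)
print(unpacked)  # [0.13  nan  0.92]
```

For a large grid, that arithmetic runs over every pixel of every tile, which is why handling it inefficiently shows up so clearly at WMS scale.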
We will keep you updated on our progress on this!
Thank you. Just a quick question if you know: will disabling everything except WMS help performance at all? I'm guessing it wouldn't, but we only actively use the WMS for ClimateSERV, so I think turning the rest of the options off is probably a good idea in general.
> Will disabling everything except WMS help performance at all? I'm guessing it wouldn't, but we only actively use the WMS for ClimateSERV, so I think turning the rest of the options off is probably a good idea in general.
Unfortunately, I don't think that will help at all. The only thing I think would help is to not have enhancements (scale, offset, fill value) in your data, which probably isn't a feasible workaround. I hope we will have a fix for you soon!
Ahhh, I see what you mean. The stuff in the actual data. I'll have to look into that; we use those values for the calculations we do, but I wonder if they could be removed for the THREDDS data.
@tdrwenski So when we're creating the NetCDF, it's this encoding that needs to be changed?
```python
ds[self.etl_parent_pipeline_instance.dataset.dataset_nc4_variable_name].encoding = {
    '_FillValue': np.int8(127),
    'missing_value': np.int8(127),
    'dtype': np.dtype('int8'),
    'scale_factor': 0.01,
    'add_offset': 0.0,
    'chunksizes': (1, 256, 256)
}
```
Do we just remove _FillValue, scale_factor, and add_offset?
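For example, would storing the data unpacked be what you mean? Just a sketch on my end, assuming the rest of our pipeline could tolerate the larger files:

```python
import numpy as np

# Hypothetical alternative encoding: unpacked float32 with no packing
# attributes, using NaN directly for missing values (files would be
# roughly 4x the size of the int8 version).
ds[self.etl_parent_pipeline_instance.dataset.dataset_nc4_variable_name].encoding = {
    'dtype': np.dtype('float32'),
    'chunksizes': (1, 256, 256)
}
```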
I don't think there is any easy way to turn them off unless you change the data itself. For instance, with ncdump on one of your data files I see:
```
variables:
    double latitude(latitude) ;
        latitude:_FillValue = NaN ;
        latitude:long_name = "latitude" ;
        latitude:units = "degrees_north" ;
        latitude:axis = "Y" ;
    int time(time) ;
        time:long_name = "time" ;
        time:axis = "T" ;
        time:bounds = "time_bnds" ;
        time:units = "seconds since 1970-01-01T00:00:00+00:00" ;
        time:calendar = "proleptic_gregorian" ;
    double longitude(longitude) ;
        longitude:_FillValue = NaN ;
        longitude:long_name = "longitude" ;
        longitude:units = "degrees_east" ;
        longitude:axis = "X" ;
    byte ndvi(time, latitude, longitude) ;
        ndvi:_FillValue = 127b ;
        ndvi:long_name = "ndvi" ;
        ndvi:units = "unitless" ;
        ndvi:comment = "Maximum value composite over dekad defined by time_bnds" ;
        ndvi:add_offset = 0. ;
        ndvi:scale_factor = 0.01 ;
        ndvi:missing_value = 127b ;
    int time_bnds(time, nbnds) ;
        time_bnds:long_name = "time_bounds" ;
```
The attributes like `add_offset`, `scale_factor`, `missing_value`, and `_FillValue` are the enhancements I am referring to. If you are creating the netCDF files yourself, you could test whether things are faster without those attributes. Otherwise you can use tools like ncgen/ncdump to test removing them, but I guess that would not be an actual workaround, only useful for testing.
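If it's easier than an ncgen/ncdump round trip, a Python/xarray sketch along these lines should produce a test file without those attributes (the file names here are placeholders):

```python
import xarray as xr

# Open without applying scale/offset so the packed int8 values pass
# through untouched, then strip the enhancement attributes.
ds = xr.open_dataset("emodis-ndvi_sample.nc4", mask_and_scale=False)
var = ds["ndvi"]
for attr in ("add_offset", "scale_factor", "missing_value", "_FillValue"):
    var.attrs.pop(attr, None)      # drop the attribute itself...
    var.encoding.pop(attr, None)   # ...and keep it out of the rewrite
ds.to_netcdf("emodis-ndvi_sample_noenhance.nc4")
```

Again, only useful for checking whether the enhancements are the bottleneck, since the resulting file serves raw byte values.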
Hi @tdrwenski, I was just checking back to see if y'all were able to publish a fix for this yet. Also, I did look into removing the enhancements, but the rest of the system will not work without them.
Hi, we are still working on this performance issue. We will try to let you know if we have an update!
Hi @billyz313 , the latest snapshot of the TDS includes some performance improvements for datasets with enhancements that may help with your issues, though there may still be other problems causing slowdowns. If you get a chance to check it out, we'd be interested to hear if it helps at all.
@haileyajohnson Thank you! I will see if we can get it deployed asap! Is it the 5.5 from https://downloads.unidata.ucar.edu/tds/ ?
@haileyajohnson We upgraded to 5.5 and unfortunately there is no change in the performance.
Yes, we too are facing the same issue... 4.6 was a lot faster than 5.5.
We have a few more performance fixes that I think may help you. You can download the latest snapshot here: https://downloads.unidata.ucar.edu/tds/5.5/thredds-5.5-SNAPSHOT.war
Let us know if that helps or not!
Hi @billyz313, have you had a chance to test performance with the latest snapshot yet?
@tdrwenski Yes, we installed the new 5.5 snapshot you mentioned, and it didn't seem to have any effect on the performance. Some of the folks thought it got slower, but I think it's about the same. So we're still running 4.6 in production...
@billyz313 we used your data as a benchmark for our performance improvements, so it's surprising that it hasn't fixed your issues (not that we're doubting you, we can see that it's slow). In our own tests, performance serving your data via WMS was at least twice as fast with the recent changes... Is it possible that your network could be blocking/scanning/slowing something down? There are definitely things we could have overlooked here, but it would be good to verify that it's not a network config issue.
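One quick way to check, assuming you have Python with requests available, is to time the same GetMap request against both servers from a machine outside your network (the layer name, bbox, and time below are placeholders; take real values from your GetCapabilities):

```python
import time
import requests

# Illustrative GetMap parameters; replace with real values from GetCapabilities.
params = {
    "service": "WMS", "version": "1.3.0", "request": "GetMap",
    "layers": "ndvi", "styles": "", "crs": "CRS:84",
    "bbox": "20,-12,52,24", "width": "256", "height": "256",
    "format": "image/png", "time": "2023-01-01T00:00:00Z",
}
for base in (
    "https://csthredds.servirglobal.net/thredds/wms/Agg/emodis-ndvi_eastafrica_250m_10dy.nc4",  # V4.6
    "https://threddsx.servirglobal.net/thredds/wms/Agg/emodis-ndvi_eastafrica_250m_10dy.nc4",   # V5.5
):
    start = time.perf_counter()
    resp = requests.get(base, params=params, timeout=300)
    print(base.split("/")[2], resp.status_code, f"{time.perf_counter() - start:.1f}s")
```

If the v5.5 server is slow from outside your network too, that would point back at the server rather than a network config issue.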
@haileyajohnson Do you think the virtual aggregation could be causing the delay?
Very large joinExisting aggregations can be slow on the first request, since all files in the aggregation must be opened to get the coordinate info. We have an aggregation cache that persists this info, so it should be much faster on the second request. I don't believe I saw this behavior on your server, however; it seems consistently slow.
You may want to compare the performance of a single unaggregated file vs. an aggregation if you want to be sure it's not the aggregation causing the slowdown. If you do think your performance issues are related to the aggregation, there are a couple of things you could try. The aggregation cache is scoured daily, but you can turn this off by setting the scour period to -1 sec (see here). If each file in your joinExisting aggregation has one time value, you can also try using a dateFormatMark to extract the value from the file name, so that the file won't need to be opened to get this info.
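Roughly, those two settings look like this (untested against your exact setup; the scan location and date pattern are made-up examples):

```xml
<!-- threddsConfig.xml: disable scouring of the aggregation cache -->
<AggregationCache>
  <scour>-1 sec</scour>
</AggregationCache>

<!-- NcML joinExisting: read the time value from the file name so each
     file does not need to be opened (prefix/pattern are examples) -->
<aggregation dimName="time" type="joinExisting">
  <scan location="/data/emodis-ndvi/eastafrica/250m/10dy/" suffix=".nc4"
        dateFormatMark="emodis-ndvi_#yyyyMMdd" />
</aggregation>
```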
@tdrwenski We're considering switching our files from NetCDF to Zarr. The 5.x version supports Zarr, right? And is the WMS tiling a lot faster through THREDDS using a Zarr file, or is the tiling speed similar to NetCDF?
@billyz313 unfortunately Zarr support is still very much in beta in the TDS library, but you're welcome to try it out.
@haileyajohnson thank you, I think we should throw a few files in, cross our fingers, and see what it does :)
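Assuming xarray with the zarr package installed, a first test on our end could be as simple as this (file names are just examples):

```python
import xarray as xr

# Convert one existing file to a Zarr store for a quick test.
ds = xr.open_dataset("emodis-ndvi_eastafrica_sample.nc4")
ds.to_zarr("emodis-ndvi_eastafrica_sample.zarr", mode="w")
```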
We are still running V4.6 for production because 5.4 is many times slower. We want to move production to 5.4 but simply can't.
We are using Ubuntu 22.04.2 LTS, OpenJDK 64-Bit Server VM Temurin-11.0.18+10 (build 11.0.18+10, mixed mode), and Python 3.10.
To reproduce, I add a WMS layer to a Leaflet map for v4.6 (https://csthredds.servirglobal.net/thredds/climateserv_aggregated.html?dataset=emodis-ndvi_eastafrica_250m_10dy), and on a different Leaflet map I add v5.4 (https://threddsx.servirglobal.net/thredds/catalog/climateserv_aggregated.html?dataset=emodis-ndvi_eastafrica_250m_10dy).
I also have them added as L.timeDimension.layer.wms layers, because we animate the layers. This is where it really makes a lot of difference: animation using THREDDS 5.4 currently seems impossible due to the lag of the responses.
Any help and suggestions are welcome.