This looks good. I've modified it slightly so that it uses the DataType size (from your code), and compares against a multiple of the maximum available memory. This will make it to the next release of ncWMS2, and should hopefully be in a subsequent TDS release.
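(For readers following along, the check amounts to something like the sketch below. This is illustrative only - the element count, method name, and the particular multiple of maximum memory are my assumptions, not the committed code; DataType.getSize() is netCDF-Java's bytes-per-value accessor.)

```java
import ucar.ma2.DataType;

// Illustrative sketch: estimate the bytes a read would need from the
// element count and the per-value size of the variable's DataType,
// then compare against a fraction of the JVM's maximum memory.
static boolean shouldUseScanline(long elementCount, DataType dataType) {
    long estimatedBytes = elementCount * dataType.getSize(); // bytes per value
    long maxMemory = Runtime.getRuntime().maxMemory();       // JVM heap ceiling
    return estimatedBytes > maxMemory / 2; // the real multiple is an assumption
}
```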
Thank you Guy. While testing this I also noticed two very minor numeric overflow bugs, in DerivedStaggeredGrid.size() and RectilinearGridImpl.size(): one or both of the int operands needs to be cast to long before multiplying, e.g. return (long) xAxis.size() * (long) yAxis.size();
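For anyone curious, here is a minimal standalone demonstration of why the cast matters (the axis sizes below are made up for illustration):

```java
public class SizeOverflowDemo {
    public static void main(String[] args) {
        int xSize = 100_000; // e.g. xAxis.size()
        int ySize = 100_000; // e.g. yAxis.size()

        // Without the cast, the multiplication is done in 32-bit int
        // arithmetic and wraps around before being widened to long.
        long broken = xSize * ySize;       // 1410065408, not 10 billion
        long fixed = (long) xSize * ySize; // 10000000000, as intended

        System.out.println("broken = " + broken);
        System.out.println("fixed  = " + fixed);
    }
}
```

Casting one operand is enough, since the other is then promoted to long before the multiplication.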
Great, thanks, I've fixed that one too.
So after some testing, it turns out that this is having a very detrimental effect on displaying data from large datasets - SCANLINE is a lot slower for compressed data, and this change is picking SCANLINE for datasets which really don't need it.
I've changed the code so that only the size of the horizontal grid is taken into account. That's all that DataReadingStrategy applies to anyway, so this should give a more realistic estimate of the amount of data which needs to be read, and should only choose SCANLINE in cases where it's really necessary to avoid OutOfMemoryExceptions. Once I've confirmed that it's all working properly, would you mind testing with your dataset to make sure that SCANLINE is still chosen?
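(A rough sketch of the revised heuristic, for anyone reading along - the method name, enum, and threshold below are my own paraphrase, not the actual edal-java code:)

```java
enum DataReadingStrategy { BOUNDING_BOX, SCANLINE }

// Sketch: base the decision on the horizontal grid alone, since that is
// all DataReadingStrategy applies to; time and depth dimensions no longer
// inflate the estimate and push datasets into SCANLINE unnecessarily.
static DataReadingStrategy chooseStrategy(long xSize, long ySize, int bytesPerValue) {
    long horizontalGridBytes = xSize * ySize * bytesPerValue;
    long maxMemory = Runtime.getRuntime().maxMemory();
    // Placeholder threshold: only use SCANLINE when reading one horizontal
    // slice could plausibly exhaust the heap.
    return horizontalGridBytes > maxMemory / 4
            ? DataReadingStrategy.SCANLINE
            : DataReadingStrategy.BOUNDING_BOX;
}
```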
Hi Guy - do you have a compiled ncwms jar containing your change that will work with thredds 4.6? If so, I'd be really interested in testing it; we see similar issues on a TDS that uses the scanline reading modification. I'm not a Java developer, so grabbing a compiled jar is easiest - otherwise I'll try to compile one. Thanks!
Thanks again Guy. I'll backport your patch into 4.6 for Adam and test the current master branch.
Just catching up here - what's the best way to go about testing this patch? Is it part of any edal-java release yet (wondering if I should leap ahead to TDS 5 at this point)? And/or where can I grab a compiled ncwms.jar file containing the patch for TDS 4.x? Thanks
@adamsteer - Yes, this has made it into all recent edal-java releases, and so should be available in the latest TDS 5 builds. @PeterWarren would be better placed to tell you whether it is in any 4.x version of TDS.
We are using thredds (ncwms currently, but soon to be edal-java) to render WMS layers of large (64 GB) NetCDF-4 files. To avoid hitting out-of-memory errors we need to ensure the netCDF reading strategy is set to SCANLINE. Currently, the reading strategy chooser (getOptimumDataReadingStrategy) only selects SCANLINE if the file type is "netCDF" or "HDF4". Our files are "NetCDF-4", so the chooser falls back to the BOUNDING_BOX reading strategy and thredds quickly exhausts even very large memory allocations.
To avoid this we have patched our ncwms (thredds 4.6) to look for "NetCDF-4" type files and force them into SCANLINE mode; a sketch of the idea is below. We would now like to find a more permanent solution for thredds 5.0 and onwards.
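(Roughly, the workaround looks like the following - a sketch, not the exact patch; getFileTypeId() is netCDF-Java's file-type accessor and the method body here is simplified:)

```java
import ucar.nc2.dataset.NetcdfDataset;

// Simplified sketch of the thredds 4.6 workaround: treat "NetCDF-4"
// the same way as "netCDF" and "HDF4", so these files get the
// SCANLINE strategy instead of falling back to BOUNDING_BOX.
static DataReadingStrategy getOptimumDataReadingStrategy(NetcdfDataset nc) {
    String fileType = nc.getFileTypeId();
    if ("netCDF".equals(fileType) || "HDF4".equals(fileType)
            || "NetCDF-4".equals(fileType)) {
        return DataReadingStrategy.SCANLINE;
    }
    return DataReadingStrategy.BOUNDING_BOX;
}
```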
I have two proposed solutions:
1. Simply add "NetCDF-4" to the list of file types for which getOptimumDataReadingStrategy selects SCANLINE.
2. Have the chooser estimate the total size of the dataset and select SCANLINE whenever that estimate is too large relative to the available memory.
(1) is trivial, so I won't provide any code for it. I had a go at implementing (2) (attached below). I assumed that all NetcdfDatasets can be considered gridded datasets - I'm not sure whether that's safe - and I calculated the size of the dataset by taking the product of all dimensions, as sketched below.
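(The size calculation boils down to something like this - a sketch assuming the netCDF-Java Dimension API, with illustrative names:)

```java
import ucar.nc2.Dimension;
import ucar.nc2.dataset.NetcdfDataset;

// Sketch: estimate the total number of values in the dataset as the
// product of all dimension lengths, accumulating in a long so that
// large grids don't overflow int arithmetic.
static long estimateDatasetSize(NetcdfDataset nc) {
    long totalElements = 1L;
    for (Dimension dim : nc.getDimensions()) {
        totalElements *= dim.getLength();
    }
    return totalElements;
}
```

Note that this counts every dimension (time, depth, x, y), which - as discussed above - over-estimates the memory needed for a single horizontal read.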
Please let me know what you think. NetCDF-4ReadingStratPatch.zip
Thanks