GeoscienceAustralia / geophys_utils

A collection of utilities for discovering, accessing and using geophysics data via web services or from netCDF files
Apache License 2.0

speed of point data access #8

Open markjessell opened 3 years ago

markjessell commented 3 years ago

Hi, I have a notebook where I can draw a line across the geology map and it extracts the gravity and magnetic profiles. It all works, which is amazing; however, the data extraction step is quite slow.

For the following code:

sample_values = np.array(netcdf_grid_utils.get_value_at_coords(sample_points, 'EPSG:4326', max_bytes=100))

it takes 16 seconds to extract 200 gravity points, around 90 seconds to extract 200 magnetic points, and nearly 5 minutes to extract 2000 magnetic points from these two sources:

"https://dapds00.nci.org.au//thredds/dodsC/iv65/Geoscience_Australia_Geophysics_Reference_Data_Collection/national_geophysical_compilations/Gravmap2019/Gravmap2019-grid-grv_cscba_1vd.nc"

"https://dapds00.nci.org.au/thredds/dodsC/iv65/Geoscience_Australia_Geophysics_Reference_Data_Collection/national_geophysical_compilations/Magmap2019/Magmap2019-grid-tmi-Cellsize40m-AWAGS_MAG_2019.nc"

So my question is whether there is a better (faster) way of extracting this information.

cheers

Mark

RichardScottOZ commented 3 years ago

You could try this with cloud-optimised GeoTIFF versions of these, Mark. I'm considering this currently.

RichardScottOZ commented 3 years ago

Or something like you would do with Zarr: https://medium.com/pangeo/fake-it-until-you-make-it-reading-goes-netcdf4-data-on-aws-s3-as-zarr-for-rapid-data-access-61e33f8fe685

markjessell commented 3 years ago

It seems to be a problem specific to sampling points: if I simply access the grid within the bounding box defined by the line, it completes in a few seconds, so I will download the grid and sample it myself.
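The workaround above (one bounding-box read, then local sampling) can be sketched with plain NumPy. This assumes a north-up grid and a simple geotransform; the origin and cell-size values are illustrative, not taken from the real Gravmap/Magmap grids.

```python
import numpy as np

def sample_grid(grid, x_origin, y_origin, cell_size, points):
    """Nearest-neighbour sample of (x, y) points from a north-up grid.

    grid[0, 0] is the cell whose top-left corner is (x_origin, y_origin).
    """
    pts = np.asarray(points, dtype=float)
    # Convert map coordinates to array indices (y decreases down the rows).
    cols = np.floor((pts[:, 0] - x_origin) / cell_size).astype(int)
    rows = np.floor((y_origin - pts[:, 1]) / cell_size).astype(int)
    return grid[rows, cols]

# Demo: a 4x4 grid with 1-unit cells, top-left corner at (0, 4).
grid = np.arange(16, dtype=float).reshape(4, 4)
values = sample_grid(grid, 0.0, 4.0, 1.0, [(0.5, 3.5), (3.5, 0.5)])
print(values)
```

After the single bounding-box read, sampling is just array indexing, which is why this path is so much faster than 2000 individual remote point requests.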

richardt94 commented 2 years ago

Hi Mark,

I realise you posted this issue quite some time ago now but I've had a go at making this faster. If you're still interested in speeding up this type of data access, please see the pull request above and let me know if it helps. You will likely need to set the max_bytes argument in your function call to be larger (I had good results with 50 MB) to take advantage of the improvements.

RichardScottOZ commented 1 year ago

@richardt94 any idea of the limitations of size/area/time span for this type of approach?