**Open** · mpiannucci opened this issue 2 years ago
How about caching the weights on first access for a level?
Yeah, I think that's a good compromise. I am worried about when multiple tiles from a single level are requested concurrently, so we probably need a mutex in there to avoid clashing. Might try that tonight.
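The mutex idea could look something like this minimal sketch (pure Python; `compute_weights` is a hypothetical stand-in for the expensive xESMF weight generation, not a real API):

```python
import threading

# Hypothetical sketch: cache regridding weights per zoom level, guarded
# by a lock so concurrent tile requests for the same level don't both
# pay the expensive weight-generation cost.
_weight_cache = {}
_cache_lock = threading.Lock()

def get_weights(level, compute_weights):
    """Return cached weights for `level`, computing them on first access.

    `compute_weights` stands in for the expensive xESMF call.
    """
    with _cache_lock:
        if level not in _weight_cache:
            _weight_cache[level] = compute_weights(level)
        return _weight_cache[level]
```

One caveat: holding a single lock during computation blocks requests for *other* levels too; a per-level lock would avoid that if it becomes a problem.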
So in trying this last night: xESMF is very slow to generate weights for the regridder once you get past zoom level 1. It wasn't as bad when we were originally regridding only to the tile extents.
Need to rethink the best way to manage this; ncWMS seems to be able to do this very fast, so I will use that as inspiration.
Is it actually delaying calculation? It might need to have an explicitly instantiated dask cluster in scope before it will defer calculation.
Not sure. I have another idea using just reprojection that I am going to test.
I think @abkfenris is correct -- without a dask scheduler set up the calculation will not be lazy.
We can get away with just reprojecting with rioxarray, and it's a lot faster than regridding with xESMF.
^^ but that doesn't solve the longer-term issues with lazy loading, because rioxarray needs the whole dataset to reproject at the moment.
So it's good enough to get data to the Zarr pyramid endpoints as a POC, but it's not the long-term solution.
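For reference, the tile geometry side of this is simple math. A hedged sketch: `tile_bounds` below is real, runnable code for the XYZ tiling scheme in EPSG:3857, while the rioxarray calls in the trailing comment are the assumed API and are untested here:

```python
import math

# Half-width of the web-mercator world, in metres.
WEB_MERCATOR_EXTENT = 20037508.342789244

def tile_bounds(z, x, y):
    """Bounds (minx, miny, maxx, maxy) of an XYZ tile in EPSG:3857."""
    size = 2 * WEB_MERCATOR_EXTENT / (2 ** z)
    minx = -WEB_MERCATOR_EXTENT + x * size
    maxy = WEB_MERCATOR_EXTENT - y * size
    return (minx, maxy - size, minx + size, maxy)

# Then, roughly (assumed rioxarray API, requires rioxarray installed):
#   mercator = ds.rio.reproject("EPSG:3857")
#   tile = mercator.rio.clip_box(*tile_bounds(z, x, y))
```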
For checking if there are any dask clients that have been implicitly created, `distributed.client._global_client_index` should give a dictionary of any dask clients currently in scope. Via https://github.com/pangeo-forge/pangeo-forge-recipes/pull/350/files#r861207572
So in testing, the bottleneck for using xESMF is generating the weights, which cannot be done in parallel (with dask or otherwise) when using xESMF. The actual regridding is fast enough once the weights are generated.
Are grid weights (from model grid to some “common” coordinate systems) something that should be precomputed and stored adjacent to the underlying dataset? Is that possible?
— James Munroe, Department of Physics and Physical Oceanography, Memorial University of Newfoundland
While we could pre-compute weights ourselves, it would be nice to not have to define a single specific sidecar file. I'd still lean towards adding them to the cache.
How about adding a `datasets/{dataset_id}/tree/cache` route that fires off a background task to build up the weights and cache them? Then we could either try to generate them on the fly, or return an error with a message to hit that endpoint if the weights aren't cached.
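That flow could be sketched roughly like this (all names here are hypothetical; in practice the first function would run as a background task, e.g. via FastAPI's `BackgroundTasks`, behind the proposed route):

```python
# Hypothetical module-level weight cache shared by both functions.
_weights = {}

def prime_cache(dataset_id, levels, compute_weights):
    """Background-task body: build and cache weights for each zoom level.

    `compute_weights` stands in for the expensive xESMF weight generation.
    """
    for level in levels:
        _weights[(dataset_id, level)] = compute_weights(dataset_id, level)

def get_tile_weights(dataset_id, level):
    """Fetch cached weights, or tell the caller to hit the cache route."""
    try:
        return _weights[(dataset_id, level)]
    except KeyError:
        raise RuntimeError(
            f"weights not cached; hit datasets/{dataset_id}/tree/cache first"
        )
```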
Also, on the cache side of things, we might want to explore overriding the current `xpublish.get_cache()`. The current cache store is a dict, so it's confined to a single process (we might also want to explore running under gunicorn to enable multiple accesses), but `cachey.Cache`'s data store is pluggable. We could try something like `redis_collections.Dict` or other `MutableMapping`s that could be swapped in its place.
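The pluggable-store pattern is just "anything implementing `MutableMapping`". A toy, runnable illustration (the class below is a stand-in; in practice a Redis-backed mapping like `redis_collections.Dict` would fill this role so multiple gunicorn workers share one cache):

```python
from collections.abc import MutableMapping

class DictStore(MutableMapping):
    """Toy stand-in for a shared store such as redis_collections.Dict."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

# Hypothetically, a Redis-backed mapping with this interface would be
# handed to cachey in place of its default dict data store.
```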
The /image/tile API uses xESMF for regridding on the fly. This means the weights are recomputed every single time the /tile API is called, which is absolutely terrible across the board.
The initial idea is to precompute the weights for every level, then only apply the grid, reproject, and clip the tile. We should probably just cache the regridded values, but I'm trying to keep things as idempotent as possible.
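One way the precompute-per-level idea could be structured (all names hypothetical; `make_regridder` stands in for constructing an `xesmf.Regridder`, whose weights are then reused for every tile at that level):

```python
class TileServer:
    """Sketch: build each level's regridder once, reuse it per tile."""

    def __init__(self, make_regridder):
        self._make_regridder = make_regridder
        self._regridders = {}  # level -> regridder with weights baked in

    def tile(self, ds, level, x, y):
        if level not in self._regridders:  # expensive step, done once
            self._regridders[level] = self._make_regridder(level)
        regridded = self._regridders[level](ds)  # fast: weights reused
        # A real implementation would then reproject and clip `regridded`
        # to the (x, y) tile extent before rendering.
        return regridded
```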