Open rhugonnet opened 1 year ago
Also, any advice on how I should set up large-scale requests (all glaciers worldwide)? What max resources should I aim for per request for the best performance? (I also added a point in #343 regarding the unit of max resources.)
(That should give me guidance on how to split my requests in smaller bits! :smile:)
@rhugonnet The issue is that the call to icesat2.init resets the max resources back to the default. The reason for this is historical. When we first started and ICESat-2 was our only mission, the icesat2.init function initialized all of the parameters that could be configured for the client. If you look at the argument list for that function, you can see that the max_resources parameter has a default setting that gets applied if it isn't provided. This is the only init function that behaves this way: the SWOT and GEDI init functions added since then (along with those of any other missions we add in the future) do not have this argument. We kept icesat2.init this way to minimize the changes needed for people who had scripts that used this function.
So in your script above, if you flip the order of these two calls (so that icesat2.init runs first and earthdata.set_max_resources runs after it):
# Set max resources
earthdata.set_max_resources(5000)
# Configure ICESat-2 API
icesat2.init("slideruleearth.io")
then that should take care of the problem. Alternatively, you could also add max_resources to the icesat2.init call.
# Configure ICESat-2 API
icesat2.init("slideruleearth.io", max_resources=5000)
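To make the ordering issue concrete, here is a toy model of the behaviour described above. It is not SlideRule's code: `set_max_resources`, `init`, and `current_max_resources` are hypothetical stand-ins written for illustration, and the default of 300 is simply the value reported in this thread.

```python
# Toy stand-in for a client module; NOT SlideRule's actual implementation.
DEFAULT_MAX_RESOURCES = 300  # default value reported in this thread

_settings = {"max_resources": DEFAULT_MAX_RESOURCES}


def set_max_resources(value):
    """Hypothetical analogue of earthdata.set_max_resources."""
    _settings["max_resources"] = value


def init(url, max_resources=DEFAULT_MAX_RESOURCES):
    """Hypothetical analogue of icesat2.init: always (re)applies max_resources."""
    _settings["url"] = url
    _settings["max_resources"] = max_resources


def current_max_resources():
    """Return the value a request would currently be limited by."""
    return _settings["max_resources"]


# Order from the original script: init overwrites the explicit setting.
set_max_resources(5000)
init("slideruleearth.io")
overwritten = current_max_resources()

# Flipped order: init runs first, so the explicit setting sticks.
init("slideruleearth.io")
set_max_resources(5000)
flipped = current_max_resources()

# Alternative: pass the value to init directly.
init("slideruleearth.io", max_resources=5000)
via_init = current_max_resources()

print(overwritten, flipped, via_init)
```

In this model the first sequence ends up back at the default, while the other two keep the value of 5000.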
As for using the shapefile - yes, in the code you have, the convex hull is being generated and used to subset to the area of interest. If you want to preserve the features of the shapefile, you need to use the "raster" option in the request parameters.
# Specify region of interest from shapefile
poly_fn = '/home/atom/data/inventory_products/RGI/00_rgi60_neighb_renamed/11_rgi60_CentralEurope/region_11_rgi60_CentralEurope.shp'
region = sliderule.toregion(poly_fn)  # NOTE: removed ["poly"] so the full region dict is kept
parms = {
    "poly": region["poly"],
    "raster": region["raster"],  # ADD THIS LINE HERE
    "srt": icesat2.SRT_LAND,
    "cnf": icesat2.CNF_SURFACE_HIGH,
    "ats": 20.0,
    "cnt": 10,
    "len": 200.0,
    "res": 100.0,
    "maxi": 1
}
This will send a GeoJSON representation of the shapefile to the servers, which burn a raster from the GeoJSON and use that raster as an inclusion mask. The call to sliderule.toregion(poly_fn) takes a cellsize parameter, which specifies the pixel size of the raster. It is in degrees and defaults to 0.01 degrees.
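Since cellsize is given in degrees, it can help to estimate what a cell covers on the ground and how large the mask grid would be. The sketch below uses only the standard library and a spherical-Earth approximation; the helper names, the constant, and the example bounding box are assumptions made for illustration, not part of the SlideRule client.

```python
import math

# Approximate length of one degree of latitude (spherical-Earth assumption).
METERS_PER_DEGREE_LAT = 111_320.0


def cell_size_meters(cellsize_deg, latitude_deg):
    """Approximate (east_west, north_south) ground size of one cell, in meters."""
    north_south = cellsize_deg * METERS_PER_DEGREE_LAT
    # Degrees of longitude shrink with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south


def raster_shape(min_lon, min_lat, max_lon, max_lat, cellsize_deg):
    """Number of (rows, cols) needed to cover a bounding box at this cell size."""
    # round() guards against floating-point noise before taking the ceiling.
    cols = math.ceil(round((max_lon - min_lon) / cellsize_deg, 9))
    rows = math.ceil(round((max_lat - min_lat) / cellsize_deg, 9))
    return rows, cols


# Illustrative bounding box only (roughly Alps-sized, not taken from the shapefile).
ew, ns = cell_size_meters(0.01, 46.0)
rows, cols = raster_shape(5.0, 44.0, 16.0, 48.0, 0.01)
print(f"cell ~ {ew:.0f} m x {ns:.0f} m, grid {rows} x {cols}")
```

At the default of 0.01 degrees a cell is roughly 1.1 km north-south, which is coarse compared with small glaciers, so a smaller cellsize may be worth testing at the cost of a larger mask.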
This functionality has not gotten a lot of use and still needs some work, so please give us feedback as you go on how we can make it better. One thing we know about (and it is on our short list to work on) is that using the shapefile this way (i.e. burning a raster for an inclusion mask) slows down the subsetting substantially, so please expect significantly longer runs. We have some ideas on how to make this faster but haven't had the time to implement them yet. Your using this functionality is good motivation to get on it.
Lastly, for large processing runs like this, you should definitely use one of the private clusters. I'd recommend using the uw private cluster. The reason is twofold: (1) a request this large will consume all the resources of the public cluster and make it unavailable to others while it is being processed; (2) with a private cluster we can scale up to a much higher number of nodes and give you a much faster response.
If you haven't done so already, you can create an account on https://ps.slideruleearth.io to get started. Here is a link to our write-up on how to use a private cluster: https://slideruleearth.io/web/rtd/user_guide/Private-Clusters.html.
If you have any questions, please let me know.
Thanks a lot for all the info @jpswinski! :smiley:
Moving forward with this.
I'm adding the init order, the two ways of setting max_resources, and the polygon behaviour to #343 as a reminder to clarify them in the docs for other users, if they aren't already there! (I've probably missed where they are described in some cases!)
Also: to find "outdated examples" that fail during CI, we could activate doctests for the SlideRule Python client :smile:
For example, putting these lines into a pyproject.toml in clients/python/ will run them with pytest:
[tool.pytest.ini_options]
addopts = "--doctest-modules"
testpaths = [
    "tests",
    "sliderule"
]
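As an illustration of what `--doctest-modules` would pick up, here is a minimal sketch of a function whose docstring doubles as a test. The helper `split_into_batches` is hypothetical and not part of the SlideRule client; any function with `>>>` examples in its docstring would be collected the same way.

```python
import doctest


def split_into_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items.

    >>> split_into_batches([1, 2, 3, 4, 5], 2)
    [[1, 2], [3, 4], [5]]
    >>> split_into_batches([], 3)
    []
    """
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


if __name__ == "__main__":
    # Runs the docstring examples of this module directly, without pytest.
    print(doctest.testmod())
```

If the function's behaviour changes and the docstring example is not updated, the doctest fails, which is how outdated examples would surface in CI.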
More generally, the pyproject.toml file is also now recommended for setuptools instead of setup.py, as multiple tool definitions can be shared in there. See for example: https://www.reddit.com/r/learnpython/comments/yqq551/pyprojecttoml_setupcfg_setuppy_whats_the/. Usage example here: https://github.com/pypa/sampleproject/blob/main/pyproject.toml.
Most projects keep their setup.py/setup.cfg for backward compatibility.
What do you think @jpswinski?
Hey @jpswinski,
I'm running the following code adapted from the first notebook:
which fails with the default max resources of 300 instead of the 5000 that I set:
Am I doing anything wrong?
The region is just a big shapefile with all glacier polygons in the European Alps, which I guess gets converted into a convex hull of all dissolved features? (I couldn't find info on this in the docs, so I'm adding it to #343.) Here it is to reproduce the behaviour on your side!