Open jrbourbeau opened 1 week ago
Are these benchmarks running in a stable cloud / region? It'd be nice to find a dataset in the same region if possible (to cut down on egress costs and speed up the I/O portion of the benchmark, which probably isn't relevant for dask).
(edit: I see the cluster_kwargs answers that, nice).
On stackstac vs. odc-stac, the main things to be aware of are the groupby stage, which ensures that all of the pixels from the same time end up in the same pixel plane (where "same time" is configurable, so that a scene captured a few seconds later can be considered the same if you want), and the resolution= parameter.
I'll give this workload a shot today or tomorrow and will report back.
Are these benchmarks running in a stable cloud / region?
Right now this is running in westeurope on Azure, which should be where the underlying data is stored, but we can run in any region on AWS, GCP, or Azure.
I'll give this workload a shot today or tomorrow and will report back.
That'd be great. I'm happy to chat generally about this. Also, let me know if you need access to a Coiled workspace that's configured for Azure.
Okay, so here's a notebook (https://gist.github.com/jrbourbeau/900b602d19fe8087cafc0490b5c26f68) that runs the same computation using odc.stac. Here's the specific odc.stac.load call:
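import odc.stac
import planetary_computer

# `items` is the list of STAC items found earlier in the notebook.
resolution = 10  # base resolution, in units of the output CRS
SHRINK = 4  # coarsen the output by this factor
resolution = resolution * SHRINK
ds = odc.stac.load(
    items,
    chunks={},  # passing chunks makes the load lazy (dask-backed)
    patch_url=planetary_computer.sign,  # sign asset URLs for Planetary Computer
    resolution=resolution,
    crs="EPSG:3857",
    groupby="solar_day",  # merge scenes from the same solar day into one time step
)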
where I use things like groupby="solar_day", which I saw used in a couple of examples I found. This seems to produce a much smaller graph and is more performant in general.
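For intuition on why grouping shrinks the graph, here's a rough, standalone sketch of the idea. It is not odc-stac's actual implementation: the `solar_day` and `group_by_solar_day` helpers, the one-hour-per-15-degrees offset, and the scene tuples are all assumptions made up for illustration. The point is only that scenes captured seconds apart collapse into a single time step, so there are fewer time slices to build tasks for.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def solar_day(utc_time, longitude_deg):
    """Approximate local solar date: shift UTC by 1 hour per 15 degrees of longitude."""
    offset = timedelta(hours=longitude_deg / 15.0)
    return (utc_time + offset).date()


def group_by_solar_day(scenes):
    """Group (scene_id, utc_time, longitude_deg) tuples by approximate solar day."""
    groups = defaultdict(list)
    for scene_id, utc_time, longitude_deg in scenes:
        groups[solar_day(utc_time, longitude_deg)].append(scene_id)
    return dict(groups)


# Hypothetical scenes: two captured seconds apart on one pass, one five days later.
scenes = [
    ("scene-a", datetime(2020, 6, 1, 10, 30, 0, tzinfo=timezone.utc), 10.0),
    ("scene-b", datetime(2020, 6, 1, 10, 30, 14, tzinfo=timezone.utc), 10.0),
    ("scene-c", datetime(2020, 6, 6, 10, 30, 0, tzinfo=timezone.utc), 10.0),
]

groups = group_by_solar_day(scenes)
print(len(scenes), "scenes ->", len(groups), "time steps")
```

Without any grouping, each of the three scenes would get its own time slice; with it, the first two share one.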
xref https://github.com/coiled/benchmarks/issues/1548