GeoscienceAustralia / dea-notebooks

Repository for Digital Earth Australia Jupyter Notebooks: tools and workflows for geospatial analysis with Open Data Cube and Xarray
https://docs.dea.ga.gov.au/notebooks/
Apache License 2.0

Add threads flag to parallel apply #1171

Closed alexgleith closed 7 months ago

alexgleith commented 7 months ago

Proposed changes

The parallel_apply function uses a process pool to run work in parallel, which is not always the best approach. This PR keeps the process pool as the default but adds an option to use threads instead.
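As a rough illustration of the idea (not the actual dea-notebooks implementation), a flag can switch between a process pool and a thread pool from Python's standard `concurrent.futures` module. The helper name `apply_in_parallel`, the flag name `use_threads`, and the `square` function are assumptions made up for this sketch.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def apply_in_parallel(func, items, use_threads=False):
    """Hypothetical helper: apply `func` to every item using a worker pool."""
    # Processes remain the default; threads are opt-in via the flag.
    executor_cls = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    with executor_cls() as executor:
        # executor.map returns results in the same order as the inputs.
        return list(executor.map(func, items))


def square(x):
    # Defined at module level so a process pool can pickle it.
    return x * x


if __name__ == "__main__":
    print(apply_in_parallel(square, range(5), use_threads=True))
```

Both executor classes share the same interface, so the flag only changes which pool is created and the calling code stays the same.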


alexgleith commented 7 months ago

Added the last modified date.

I don't think this is used in Notebooks. It is used in Coastlines and other places.

From more testing, I'm not sure it makes a massive difference, but I think it's worth having as an option still.

robbibt commented 7 months ago

This is fantastic @alexgleith! As someone from a non-ICT background I don't really have as good an understanding of this stuff as I should. Any chance you could add one additional sentence to the docstring explaining why/when using threads vs processes might be a better option? (Just something not super technical to help beginner users make a more informed choice.)

alexgleith commented 7 months ago

No worries, @robbibt. Done.

The longer explanation from my (still kind of lay person) view is that a process is a whole new operation, whereas a thread is running in the existing operation. Think of it as like spawning a new machine (process) to do some work, instead of doing another piece of work on the existing machine (thread).

A process is very much separated, and inter-process communication is hard, whereas threads share a single process, and so can share memory and communicate directly that way.

Being "thread safe" is important too, to stop tasks stomping on each other's memory. I think most of what this function is being used for will be thread safe. Most stuff in Python is thread safe these days, but it does need consideration.
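To make the shared-memory point concrete, here is a minimal sketch, assuming a made-up counter workload rather than anything from parallel_apply itself. All threads update the same dictionary in the same process, and a `threading.Lock` keeps the updates from interfering with each other. The names `counter`, `add_many`, and `run` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One dictionary, visible to every thread in this process.
counter = {"value": 0}
lock = threading.Lock()


def add_many(n):
    for _ in range(n):
        # The lock ensures only one thread at a time performs the
        # read-modify-write on the shared counter.
        with lock:
            counter["value"] += 1


def run(workers=4, n=10_000):
    counter["value"] = 0
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # Consume the iterator so any worker exception is raised here.
        list(executor.map(add_many, [n] * workers))
    return counter["value"]


if __name__ == "__main__":
    print(run())
```

With a process pool, each worker would get its own copy of `counter`, so the parent's value would not change without explicit inter-process communication.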

My $0.02!

alexgleith commented 7 months ago

Ok cool!

I can't merge it in. I have no power here 😆

No rush. Maybe @omad has an opinion still.