digitalearthpacific / dep-coastlines

GNU General Public License v3.0
3 stars 1 forks source link

Alex messing around #11

Closed alexgleith closed 1 year ago

alexgleith commented 1 year ago

Hey Jesse

Sorry that this is a little opinionated. I'm just learning the repo and got diving into a low priority area, but hopefully it's not wasted.

So this should be two PRs too, but in short:

  1. Removes the git submodules and replaces the dea-tools one with pip installed DEA Tools. And the other dep-tools repo is just installed using git in the Dockerfile. This simplifies the environment for this repo and still includes those projects.
  2. I refactored the AOI process. This was really just exploratory, but I think I sped it up a bit too.

There's a Makefile with commands in it as documentation and easy shorthand for running things.

Please feel free to push back on anything you don't agree with!

jessjaco commented 1 year ago

Hey Alex, thanks for all this.

I would like to move towards removing the submodules, but we may have to strategize a bit.

I added the dea-tools because when I started work on the coastlines there was active development on some of the functionality I was using (contours_preprocess I think) and the function signatures were in flux. As you can imagine it was causing seemingly random errors so I used it as a way to freeze the version. That and, as I think I have seen you note on the odc slack, the pip installed version is/was pretty out of date. So maybe we could install directly from github using the same hash and drop the submodule?

As I mentioned yesterday, much of this repo is still in active development and I had been changing a lot of dep-tools stuff at the same time I was working here. So the dep-tools submodule is more for ease of development sake. I'll give some thought to how much of that is going to continue, but maybe we can drop that too.

As for the AOI, I think you're on the right track but there are some things to note. First, @sopac had developed a separate aoi representation which is available here. I did a side-by-side comparison and as far as I could tell it matched up perfectly with what aoi.py produced, so I didn't consider it further. BUT, I think we should have an authoritative version - whether created by code here or not. Also I wanted the coastline repo to stand alone, but I (somewhat embarrassingly) should point out that I have the same aoi creation code in a couple of other places. It's in dep-tools and also in my fork of the "main" repo which I had been using when creating evi and wofs here. There's a longer story of the logic (or lack thereof) but the bottom line is we need a canonical version and a decision on where to put it. I welcome your input and further discussion on that! I think as a group we may just need to decide how some of these repositories function. Should we have a main repo with notebooks / demo code (say DigitalEarthPacific), something like dep-tools with "common" code, and then a repo for each product (e.g. coastlines or mangroves)? That may help guide where the AOI code goes. I know the dea folks are dealing with some of this for their notebooks repo right now.

Oh, and one other thing about the AOI - you may have noticed in places we read the aoi.gpkg / pathrows directly from blob storage (e.g. from https://deppcpublicstorage.blob.core.windows.net/output/aoi/aoi.gpkg), so it'd be great if any make, etc. workflow automated that upload / versioning. (As to why they are being read from az, I didn't want to store the data directly in the repo but moreover kbatch is finicky and can only deal with up to one small directory when running).

Finally, I should have made it clearer what create_grid.py is doing. It's not just selecting pathrows which intersect with coastline - it's intersecting a buffer of the coastline with the pathrows (see the output here). This sort of format is the primary basis for the way we've been scaling things via (somewhat long-winded but getting better) run_processor. Briefly, everything is processed a scene at a time, and the coastal buffer or land is there as a first clip of the data when reading (see e.g. this). This may be part of a longer discussion on specifics of scaling and I've gone on a bit long already, but happy to continue.

alexgleith commented 1 year ago

Hey @jessjaco

So maybe we could install directly from github using the same hash and drop the submodule?

Yeah, this is a good idea I think, for the Docker image. I'll reach out to DEA and see if they can cut a release, so there's a newer stable version of dea-tools published.

So the dep-tools submodule is more for ease of development sake.

My way of working for local dev would be to have it checked out separately and to install it with pip install -e so that I can change it if I need to. Then the Dockerfile uses a pure "release", so that it's reproducible. (The Docker Image should be built in CI too, somewhere... but that can come later.)

But I don't want to be opinionated about your dev environment though, so how about I close this PR, but leave the branch up.

we need a canonical version

For DE Africa, I created deafrica-extent and socialised with decision makers and techs, so we all agreed on the AOI. This was then used to extract a list of MGRS and PATHROW IDs, which we use as the master list to work off.

So I agree we should develop a canonical version. I'll create a backlog item... and a backlog to put it in so we can come back to it.

it's intersecting a buffer of the coastline with the pathrows

I see, that's different from what I understood.

The DE Africa Coastlines process uses a JSON file with the pathrows of interest in it, screenshot below. Maybe it would be good if you can walk me through where we're at next week one day? I'm quite interested to know how we're running it using kbatch, as I haven't used this before. We used Argo Workflows in DE Africa, which was fantastic.

image
jessjaco commented 1 year ago

I think we can do the submodule drop. The workflow you describe is essentially what I've been doing, and kbatch needs to pull directly from github when running anyway.

The docker image/file setup will be a little more tricky. I can explain when we meet but ultimately we may have to balance strict reproducibility with other considerations.

alexgleith commented 1 year ago

There are unstable releases of dea-tools, which can be installed easily: https://pypi.org/project/dea-tools/#history

I'll untangle this PR this week and break it into two.

On Fri, 30 Jun 2023 at 01:37, Jesse Anderson @.***> wrote:

I think we can do the submodule drop. The workflow you describe is essentially what I've been doing, and kbatch needs to pull directly from github when running anyway.

The docker image/file setup will be a little more tricky. I can explain when we meet but ultimately we may have to balance strict reproducibility with other considerations.

— Reply to this email directly, view it on GitHub https://github.com/digitalearthpacific/dep-coastlines/pull/11#issuecomment-1613426370, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA2JIXJUMU7IH3OTKFB62RLXNWOJ5ANCNFSM6AAAAAAZWUXQPA . You are receiving this because you authored the thread.Message ID: @.***>

-- Alex Leith m: 0419189050

jessjaco commented 1 year ago

Plz see my latest commits, I removed the submodules and fixed the dockerfile. Also see #12, I will describe fixes I made there when I get a chance