PermafrostDiscoveryGateway / pdg-portal

Design and mockup documents for the PDG portal
Apache License 2.0

Add annual lake data summary statistics from Ingmar Nitze's lake dataset #86

Open julietcohen opened 3 months ago

julietcohen commented 3 months ago

The AI working group (Wenwen's team) is creating summary dataset(s) from Ingmar Nitze's lake data. Their first iteration focuses on the time-series dataset, counting the number of lakes (polygon centroids) that fall within each 30x30 meter grid cell per year. Kat Matson, one of the Google fellows, produced a first version and passed on two GeoPackage files, which have been uploaded to datateam at /var/data/submission/pdg/nitze_lake_change/lakes_summary_AIgroup/. Each filename indicates that file's CRS, which Kat said is the only difference between the two.
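
For reference, the counting step described above can be sketched in plain Python. This is a minimal illustration, not Kat's code. It assumes lake polygons are simple rings of (x, y) vertices in a metric projected CRS such as EPSG:3995, and that the 30 m grid is anchored at the CRS origin. The grid alignment actually used isn't stated, and the helper names (polygon_centroid, cell_index, count_centroids) are hypothetical. A production version would more likely use the centroid property of a GeoPandas GeoSeries.

```python
from collections import Counter
from math import floor

CELL_SIZE_M = 30.0  # grid cell edge length, in projected CRS units (metres)


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon ring [(x, y), ...] (shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    # centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3.0 * area2), cy / (3.0 * area2)


def cell_index(x, y, size=CELL_SIZE_M):
    """Column/row of the grid cell containing (x, y); assumes the grid starts at the CRS origin."""
    return floor(x / size), floor(y / size)


def count_centroids(polygons, size=CELL_SIZE_M):
    """Count polygon centroids per grid cell. Only non-empty cells appear in the result."""
    return Counter(cell_index(*polygon_centroid(p), size) for p in polygons)


lakes = [
    [(0, 0), (20, 0), (20, 20), (0, 20)],              # centroid (10, 10)   -> cell (0, 0)
    [(40, 5), (50, 5), (50, 15), (40, 15)],            # centroid (45, 10)   -> cell (1, 0)
    [(-30, -30), (-10, -30), (-10, -10), (-30, -10)],  # centroid (-20, -20) -> cell (-1, -1)
]
counts = count_centroids(lakes)
```

Using floor rather than int() keeps cells consistent on both sides of the origin, since int() truncates toward zero and would merge cells -1 and 0.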

Kat's description:

I've created files for the number of lakes in each 30x30 meter square in a google drive folder at https://drive.google.com/drive/folders/1gijGUoesd0VHN8Y09fBfUI2-WqXWWmO-?usp=sharing (it's in my Woodwell account's personal drive for now because this isn't how we'd permanently share it, but I can move it to a shared location if needed). Both of the files in there are geopackage files of a GeoDataFrame with a single 'num_lakes' column with the number of lakes with a centroid in the region. With how small the regions are, the vast majority would have had 0 lakes, so to save space I only added boxes that have at least one lake (and with how small they are, I'd be surprised if there's more than 1 lake in any of them). There are two versions of the same counts generated from the 2021 lake data, one with coordinates in EPSG:3995 (matching the lake data) and for the other I converted it to EPSG:4326. Let me know if there's something that looks wrong with the data or if you need anything else!
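
Because only boxes with at least one lake were written out, any cell missing from the files should be read as zero lakes. The sketch below shows that sparse layout: turning {cell: count} pairs into (bounds, num_lakes) rows and looking up counts with an implicit zero. It assumes the same hypothetical origin-anchored 30 m grid as the counting step, and the function names are illustrative only. The bounds here are in EPSG:3995 metres; after conversion to EPSG:4326 the same boxes are generally no longer axis-aligned squares.

```python
from collections import Counter

CELL_SIZE_M = 30.0  # hypothetical grid cell size in EPSG:3995 metres


def cell_bounds(ix, iy, size=CELL_SIZE_M):
    """(minx, miny, maxx, maxy) of grid cell (ix, iy), assuming a grid anchored at the CRS origin."""
    return (ix * size, iy * size, (ix + 1) * size, (iy + 1) * size)


def to_rows(counts, size=CELL_SIZE_M):
    """(bounds, num_lakes) rows for non-empty cells only, mirroring the sparse GeoPackage layout."""
    return [(cell_bounds(ix, iy, size), n) for (ix, iy), n in sorted(counts.items()) if n > 0]


def num_lakes_at(counts, ix, iy):
    """Cells that were not written out are implicitly zero."""
    return counts.get((ix, iy), 0)


counts = Counter({(0, 0): 1, (1, 0): 1, (5, 7): 0})
rows = to_rows(counts)  # the zero-count cell (5, 7) is dropped
max_lakes_per_cell = max(n for _, n in rows)  # Kat expects this to be 1
```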

julietcohen commented 3 months ago

It will be interesting to compare how processing Kat's data differs from feeding Ingmar's lake data into the workflow and running the custom statistic centroids_per_pixel defined in viz-staging. I could run both Kat's 2021 sample and Ingmar's 2021 lake data (one of the time-series files I parsed from his original data by year) and compare the results.
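
One way to make that comparison concrete is to diff the two sets of counts cell by cell. The sketch below assumes both outputs have already been reduced to sparse {cell: count} mappings on the same grid. That is an assumption: the pixels produced by the workflow may not line up with Kat's 30 m boxes without resampling. The function name compare_counts and the sample values are hypothetical.

```python
def compare_counts(a, b):
    """Compare two sparse {cell: count} mappings, treating absent cells as zero.

    Returns (only_in_a, only_in_b, differing), where differing maps
    cell -> (count_in_a, count_in_b) for cells non-empty in both but unequal.
    """
    only_a = {c for c, n in a.items() if n and not b.get(c, 0)}
    only_b = {c for c, n in b.items() if n and not a.get(c, 0)}
    differing = {
        c: (a.get(c, 0), b.get(c, 0))
        for c in set(a) | set(b)
        if a.get(c, 0) != b.get(c, 0) and c not in only_a and c not in only_b
    }
    return only_a, only_b, differing


kat = {(0, 0): 1, (1, 0): 1, (2, 2): 1}        # hypothetical counts from Kat's file
workflow = {(0, 0): 1, (1, 0): 2, (3, 3): 1}   # hypothetical counts from the workflow
only_kat, only_workflow, differing = compare_counts(kat, workflow)
```

Cells that appear in only one output point to a grid-alignment or centroid difference, while cells with unequal non-zero counts point to a difference in how lakes are assigned to cells.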

julietcohen commented 3 months ago

The first draft of this dataset is up on the demo portal.

Zoomed out, this data is very informative! Zoomed in, the geometries are less informative: they appear to be a few grid cells surrounding the centroid of the polygon (or perhaps a few randomly chosen adjacent grid cells within the lake polygon).
