PermafrostDiscoveryGateway / pdg-portal

Design and mockup documents for the PDG portal
Apache License 2.0

Add the Pan-Arctic Catchment Database layer #29

Open robyngit opened 1 year ago

robyngit commented 1 year ago

Paper: The Pan-Arctic Catchment Database (ARCADE) (pre-print)
Data: via Dataverse, currently "under review" and not available to download
DOI: https://doi.org/10.5194/essd-2022-269

robyngit commented 1 year ago

Note: Anna has contacted Niek and Gustaf to request access to the data

robyngit commented 1 year ago

@julietcohen is this one the same as https://github.com/PermafrostDiscoveryGateway/pdg-portal/issues/36?

julietcohen commented 1 year ago

@robyngit Yes, it is, oops! Let's retain this issue, and I'll close the other one. Now that the paper has been published and the datasets have been archived, here is more information (copied over from the duplicate issue):

julietcohen commented 1 year ago

Initial Dataset Exploration

See datateam: /home/jcohen/PDG_ARCADE_layer/data_explore.ipynb

ARCADE_v1_36_1km.shp

Plot: [image]

ARCADE_v1_37_1km.shp

Plot: [image]

So cool!

Using geopandas difference() reveals their geometry columns are the same:

# element-wise difference in both directions (rows are matched by index)
diff1 = data_36['geometry'].difference(data_37['geometry'])
diff2 = data_37['geometry'].difference(data_36['geometry'])
# both output GeoSeries contain only one unique value: POLYGON EMPTY
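
The same comparison can be tried without the ARCADE shapefiles. The sketch below is only an illustration: the toy GeoSeries `a` and `b` are made up and stand in for `data_36['geometry']` and `data_37['geometry']`, and the `geom_equals` check is an alternative that was not part of the notebook as far as this thread shows.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-ins for the two geometry columns (not the ARCADE data).
a = gpd.GeoSeries([box(0, 0, 1, 1), box(2, 2, 3, 3)])
b = gpd.GeoSeries([box(0, 0, 1, 1), box(2, 2, 3, 3)])

# Row-wise difference in both directions. An empty result for a row means
# no part of one geometry lies outside its counterpart.
diff_ab = a.difference(b)
diff_ba = b.difference(a)
both_empty = bool(diff_ab.is_empty.all() and diff_ba.is_empty.all())

# Alternative check (assumption: not what the notebook used):
# row-wise topological equality.
all_equal = bool(a.geom_equals(b).all())

print(both_empty, all_equal)
```

Checking `is_empty` on both differences avoids having to inspect the printed geometries by eye.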

robyngit commented 4 months ago

Niek Jesse Speetjens reached out today to ensure that we're aware that the paper (originally pre-print) is now published and the data are publicly available. He also mentioned that he's considering coming up with a second version of this dataset at some point.

julietcohen commented 4 months ago

@robyngit Thanks for responding to Niek's email and connecting us

I took another quick look at this dataset to see if I have any questions before I email Niek back.

julietcohen commented 4 months ago

I did find the metadata that describes the attributes, in the Excel files S1_ARCADE_v1_37_1km.xlsx and S1_ARCADE_v1_36_1km.xlsx, which can be downloaded from the metadata page linked above. The metadata includes the label, unit, and a short description for each attribute, and the attributes are separated into categories, with one tab per category in the Excel workbook. I like the clear way the ADC documents attributes better.

julietcohen commented 4 months ago

This dataset has the same issue as the Circum-Arctic permafrost and ground ice dataset: some polygons cross the antimeridian, and they become distorted in the exact same way (they wrap the opposite way around the earth) when the data is transformed from its original CRS (in this case, EPSG:6931) into EPSG:4326, which is the CRS of the TMS used by the viz workflow. I confirmed that when these polygons are removed before transforming, the remaining polygons transform cleanly. As part of my email response to Niek, I said:

So before we can visualize your data, I will do some preprocessing:

  • splitting all multipolygon geometries into singular geometries (this is easy with GeoPandas explode)
  • removing any rows with NaN values for attributes of interest
  • splitting the polygons that intersect the antimeridian

These first two preprocessing steps are common to many datasets we visualize, so I'm working towards integrating them into our generalized workflow. The last step is trickier. I have been working on code to split the polygons at the antimeridian while the data is still in its original CRS, to avoid distorting any of them in the CRS transformation. I will test my code on these polygons to see if that allows a clean transformation. Alternatively, removing these polygons altogether (instead of splitting them) allows all other polygons to be processed and visualized, but Anna expressed that her preference is to retain all data from the input dataset before visualization. This makes sense to me as well! However, if you have a version of this dataset in another CRS, such as EPSG:4326, please let me know, as that would be a simpler solution to getting your data on the portal.
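
The first two preprocessing steps listed in the email might look roughly like this in GeoPandas. This is a minimal sketch on made-up data, not the workflow's actual code: the column name `attr` and the toy rectangles are assumptions for illustration only.

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import MultiPolygon, box

# Hypothetical toy data: one multipolygon, and one row with a missing attribute.
gdf = gpd.GeoDataFrame(
    {"attr": [1.0, np.nan, 3.0]},
    geometry=[
        MultiPolygon([box(0, 0, 1, 1), box(2, 2, 3, 3)]),
        box(4, 4, 5, 5),
        box(6, 6, 7, 7),
    ],
)

# Step 1: split multipolygons into single polygons, one row per part.
# Attribute values are repeated for every part of the original row.
singles = gdf.explode(index_parts=False).reset_index(drop=True)

# Step 2: drop rows whose attribute of interest is NaN.
clean = singles.dropna(subset=["attr"]).reset_index(drop=True)

print(len(gdf), len(singles), len(clean))
```

Exploding before dropping NaN rows or after gives the same result here, since `explode` copies the attributes to each part unchanged.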

In the other dataset's issue, I documented code that I wrote to split the polygons at the antimeridian, but it did not work for that dataset. I can see if modifying that code will work on this dataset, and hopefully I can eventually integrate a generalized fix for this into viz-staging.
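
For reference, one possible way to flag and split polygons at the antimeridian while still in the projected CRS is sketched below. It is not the code mentioned above and has not been tested on ARCADE. It assumes that in EPSG:6931 (a north-polar Lambert azimuthal equal-area projection with central meridian 0°) the 180° meridian runs along the positive y-axis from the pole, which is worth verifying with pyproj before relying on it. The rectangles, the `label` column, and the `ANTIMERIDIAN` constant are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import LineString, box
from shapely.ops import split

# Assumption: in EPSG:6931 the 180° meridian is the ray x = 0, y >= 0.
# 1e7 m is longer than the projected distance from the pole to the equator.
ANTIMERIDIAN = LineString([(0.0, 0.0), (0.0, 1.0e7)])

# Hypothetical toy polygons in EPSG:6931 metres (no CRS is set on the frame).
gdf = gpd.GeoDataFrame(
    {"label": ["crossing", "clear"]},
    geometry=[
        box(-1000, 2_000_000, 1000, 2_002_000),  # straddles x = 0
        box(5000, 2_000_000, 7000, 2_002_000),   # entirely on one side
    ],
)

# Flag polygons that intersect the antimeridian (this also flags polygons
# that merely touch it).
crosses = gdf.intersects(ANTIMERIDIAN)

# Split every polygon at the line; polygons the line does not cut
# come back as a single unchanged part.
rows = []
for label, geom in zip(gdf["label"], gdf.geometry):
    for part in split(geom, ANTIMERIDIAN).geoms:
        rows.append({"label": label, "geometry": part})
split_gdf = gpd.GeoDataFrame(rows, geometry="geometry")

print(crosses.tolist(), len(split_gdf))
```

Splitting only moves the problem to the cut edge: vertices that sit exactly on the line may still need care when reprojecting to EPSG:4326, so this is a starting point rather than a complete fix.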