robyngit opened this issue 2 years ago
Note: Anna has contacted Niek and Gustaf to request access to the data
@julietcohen is this one the same as https://github.com/PermafrostDiscoveryGateway/pdg-portal/issues/36?
@robyngit Yes, it is, oops! Let's retain this issue, and I'll close the other one. Now that the paper has been published and the datasets have been archived, here is more information (copied over from the duplicate issue):
See datateam: /home/jcohen/PDG_ARCADE_layer/data_explore.ipynb
The `geometry` column contains two geometry types: some are `POLYGON` and some are `MULTIPOLYGON`.
The `geometry` column also contains two geometry types.
So cool!
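The mix of `POLYGON` and `MULTIPOLYGON` rows described above can be tallied with GeoPandas' `geom_type`. This is a minimal sketch: `geometry_type_counts` is a hypothetical helper, and GeoPandas reports the type names in title case (`Polygon`, `MultiPolygon`).

```python
import geopandas as gpd


def geometry_type_counts(gdf: gpd.GeoDataFrame) -> dict:
    """Count how many rows hold each geometry type."""
    # geom_type returns one string per row, such as "Polygon" or "MultiPolygon"
    return gdf.geom_type.value_counts().to_dict()
```

Running it on each of the two files would be one way to confirm that both contain single-part and multipart polygons.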
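Using GeoPandas `difference()` reveals that their geometry columns are the same:

```python
# data_36 and data_37 are the GeoDataFrames loaded from the two ARCADE files
# element-wise difference, aligned on the row index
diff1 = data_36['geometry'].difference(data_37['geometry'])
diff2 = data_37['geometry'].difference(data_36['geometry'])
# both output GeoSeries contain only one unique value: POLYGON EMPTY
```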
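The observation that both differences are `POLYGON EMPTY` can be turned into a single boolean check. This is a minimal sketch, assuming two GeoSeries whose rows line up by index; `same_geometries` is a hypothetical helper. GeoPandas' own `GeoSeries.geom_equals` is an alternative that tests topological equality row by row.

```python
import geopandas as gpd


def same_geometries(a: gpd.GeoSeries, b: gpd.GeoSeries) -> bool:
    """True if every pair of index-aligned geometries covers the same area."""
    # An empty difference in BOTH directions means neither geometry
    # extends beyond the other, i.e. they cover the same region.
    a_minus_b_empty = a.difference(b).is_empty.all()
    b_minus_a_empty = b.difference(a).is_empty.all()
    return bool(a_minus_b_empty and b_minus_a_empty)
```

Checking both directions matters: a one-way difference is also empty when the first geometry is merely contained in the second.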
Niek Jesse Speetjens reached out today to make sure we're aware that the paper (originally a pre-print) is now published and the data are publicly available. He also mentioned that he's considering creating a second version of this dataset at some point.
@robyngit Thanks for responding to Niek's email and connecting us.
I took another quick look at this dataset to see if I have any questions before I email Niek back.
- Attributes include `pf_frac`, `area_km2`, `iwp_frac`, `soilt_23`, `soilt_24`, and `terrain_0`. Unfortunately, I do not see any attribute descriptions in the metadata. I'll reach out to Niek to let him know I don't know where to find them.
- `terrain_#` ones
- `pf_frac` or `area_km2`, since both seem to be relevant to the PDG and we know the units of both:
  - `pf_frac` is in the range [0, 1] and has 1 NA value
  - `area_km2` is in the range [1.004, 3117158.519] and has no NA values

I did find the metadata that describes the attributes, in the Excel files `S1_ARCADE_v1_37_1km.xlsx` and `S1_ARCADE_v1_36_1km.xlsx`, which can be downloaded from the metadata page linked above. For each attribute, the metadata includes the `label`, `unit`, and a short `description`, and the attributes are grouped into categories that appear as tabs in the Excel sheet. I like the clear way the ADC documents attributes better.
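The ranges and NA counts listed above for `pf_frac` and `area_km2` are the kind of summary pandas produces directly. This is a minimal sketch; `summarize` is a hypothetical helper, and the column names are the only thing taken from the dataset.

```python
import pandas as pd


def summarize(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Min, max, and NA count for each column of interest."""
    subset = df[columns]
    return pd.DataFrame({
        "min": subset.min(),           # NaN values are skipped by default
        "max": subset.max(),
        "n_na": subset.isna().sum(),   # count of missing values per column
    })
```

Because a GeoDataFrame is a pandas DataFrame subclass, the same call works on the data as loaded with `geopandas.read_file`, e.g. `summarize(gdf, ["pf_frac", "area_km2"])`.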
This dataset has the same issue as the Circum-Arctic permafrost and ground ice dataset: some polygons cross the antimeridian, and they become distorted in exactly the same way (wrapping the opposite way around the Earth) when the data is transformed from its original CRS (EPSG:6931 in this case) into EPSG:4326, the CRS of the TMS used by the viz workflow. I confirmed that when these polygons are removed before transforming, the remaining polygons transform smoothly. As part of my email response to Niek, I said:
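One way to see why these polygons wrap is to look at how longitude is recovered in a north-polar azimuthal projection. This is a minimal sketch, assuming EPSG:6931 follows the standard north-polar Lambert azimuthal equal-area convention (0° meridian along -y, 180° along +y), which is worth double-checking against the EPSG definition. Longitude is then `atan2(x, -y)`, so two vertices just either side of the antimeridian come out near +180° and -180°, and the edge joining them in EPSG:4326 runs almost 360° the long way around. `polar_longitude` and `wraps_after_transform` are hypothetical helpers, and the flag is a heuristic, not the viz workflow's method.

```python
import math
from typing import List, Tuple


def polar_longitude(x: float, y: float) -> float:
    """Longitude in degrees of a point in a north-polar azimuthal CRS.

    Assumes the 0° meridian points along -y and 180° along +y (assumed
    here to match EPSG:6931); the distance from the pole does not matter.
    """
    return math.degrees(math.atan2(x, -y))


def wraps_after_transform(ring: List[Tuple[float, float]]) -> bool:
    """Heuristic: True if any edge jumps more than 180° in longitude,
    i.e. it would be drawn the long way around in EPSG:4326."""
    lons = [polar_longitude(x, y) for x, y in ring]
    # compare each vertex with the next, closing the ring at the end
    return any(abs(b - a) > 180 for a, b in zip(lons, lons[1:] + lons[:1]))
```

A polygon that only crosses the prime meridian (x changes sign where y < 0) produces longitudes near -0° and +0°, so it is not flagged.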
So before we can visualize your data, I will do some preprocessing:
- splitting all multipolygon geometries into single-part geometries (this is easy with GeoPandas `explode()`)
- removing any rows with NaN values for attributes of interest
- splitting the polygons that intersect the antimeridian
The first two preprocessing steps are common to many datasets we visualize, so I'm working towards integrating them into our generalized workflow. The last step is trickier. I have been working on code to split the polygons at the antimeridian while the data is still in its original CRS, to avoid distorting any of them in the CRS transformation. I will test my code on these polygons to see whether it allows a clean transformation. Alternatively, removing these polygons altogether (instead of splitting them) allows all other polygons to be processed and visualized, but Anna has expressed a preference for retaining all data from the input dataset before visualization. This makes sense to me as well! However, if you have a version of this dataset in another CRS, such as EPSG:4326, please let me know, as that would be a simpler way to get your data onto the portal.
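The first two preprocessing steps from the email can be sketched with GeoPandas. This is a minimal sketch assuming GeoPandas 0.10 or later (for the `index_parts` argument of `explode`); `preprocess` and `attributes` are hypothetical names, and the antimeridian split is deliberately left out because it needs its own handling.

```python
import geopandas as gpd


def preprocess(gdf: gpd.GeoDataFrame, attributes: list) -> gpd.GeoDataFrame:
    """Explode multipart geometries, then drop rows missing any attribute
    of interest. The antimeridian split is NOT handled here."""
    # one row per polygon part; each part keeps its parent row's attributes
    exploded = gdf.explode(index_parts=False).reset_index(drop=True)
    # drop rows where any of the listed attribute columns is NaN/None
    return exploded.dropna(subset=attributes).reset_index(drop=True)
```

Exploding before dropping NA rows means every part of a multipolygon inherits the same attribute values, so the NA filter removes or keeps all parts together.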
In the other dataset's issue, I documented code I wrote to split polygons at the antimeridian, but it did not work for that dataset. I can see whether modifying that code works for this dataset, and hopefully I can eventually integrate a generalized fix for this into `viz-staging`.
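Splitting in the original CRS can be sketched without a geometry library, under the same assumption that the antimeridian in EPSG:6931 is the half-line x = 0, y > 0. This is a generic technique swapped in for illustration, not the code documented in the other issue or anything in `viz-staging`: the hypothetical helpers below take rings as open lists of (x, y) vertices, check whether any edge crosses that half-line, and if so clip the ring against the half-planes x >= 0 and x <= 0 using Sutherland-Hodgman clipping, giving one piece per side.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def crosses_antimeridian(ring: List[Point]) -> bool:
    """True if an edge crosses x = 0 at y > 0 (assumed 180° meridian)."""
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if (x1 < 0 <= x2) or (x2 < 0 <= x1):
            t = -x1 / (x2 - x1)            # fraction along the edge where x = 0
            if y1 + t * (y2 - y1) > 0:     # crossing on the +y (antimeridian) side
                return True
    return False


def clip_half_plane(ring: List[Point], keep_east: bool) -> List[Point]:
    """Sutherland-Hodgman clip of a ring to x >= 0 (keep_east) or x <= 0."""
    def inside(p: Point) -> bool:
        return p[0] >= 0 if keep_east else p[0] <= 0

    def cut(p: Point, q: Point) -> Point:
        t = p[0] / (p[0] - q[0])           # p and q lie on opposite sides of x = 0
        return (0.0, p[1] + t * (q[1] - p[1]))

    out: List[Point] = []
    n = len(ring)
    for i in range(n):
        cur, nxt = ring[i], ring[(i + 1) % n]
        if inside(cur):
            out.append(cur)
            if not inside(nxt):            # leaving the kept half-plane
                out.append(cut(cur, nxt))
        elif inside(nxt):                  # entering the kept half-plane
            out.append(cut(cur, nxt))
    return out


def split_at_antimeridian(ring: List[Point]) -> List[List[Point]]:
    """Return [ring] unchanged, or its east and west pieces if it crosses."""
    if not crosses_antimeridian(ring):
        return [ring]
    pieces = [clip_half_plane(ring, keep_east=True),
              clip_half_plane(ring, keep_east=False)]
    return [p for p in pieces if len(p) >= 3]
```

Two caveats: for a ring that crosses x = 0 more than once, Sutherland-Hodgman can leave zero-width bridges along the cut, which a proper polygon intersection (for example with shapely) avoids; and vertices lying exactly on x = 0, y > 0 may transform to either +180° or -180°, so they may need to be pinned to the side of the piece they belong to.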
Paper: The Pan-Arctic Catchment Database (ARCADE) (pre-print)
Data: via Dataverse, currently "under review" and not available to download
DOI: https://doi.org/10.5194/essd-2022-269