robyngit opened this issue 1 year ago
This is a great idea. I'm curious how that relates to the two other general permafrost layers we have. It seems like we could also show the Obu layers (soil temp and permafrost probability) on our imagery viewer.
I agree that this layer looks like it would be very helpful for users to understand the IWP layer! The overall data directory consists of 3 shapefiles:

- `permaice.shp` (21 MB, 13,671 rows), with NA counts per attribute:
  - `NUM_CODE`: 0 NA values
  - `COMBO`: 0 NA values
  - `RELICT`: 13,658 NA values
  - `EXTENT`: 6,480 NA values
  - `CONTENT`: 6,480 NA values
  - `LANDFORM`: 6,639 NA values
- `subsea.shp` (18 KB)
- `treeline.shp` (16 KB)

Plus a few other small `.byte` and `.hdr` files.

Notes:

- `permaice.shp`:
- `subsea.shp`:
- `treeline.shp`:
Update: Started staging the file `permaice.shp` yesterday with:
Staging has been ongoing on Datateam for almost a full day and is at >604,000 staged files. I will continue to let it run while I take care of other higher priority tasks.
Update:
Staging is complete for `permaice.shp` with 1,653,964 gpkg files, and the rasterization step has started.

I set up a parsl job on Datateam to execute rasterization and web-tiling in parallel, since rasterization was going very slowly without parallelization. After running over the weekend, rasters were produced for the staged tiles for z-11 (the highest z-level I set in the config), but a parsl error occurred during rasterization for z-level 10: `parsl.executors.high_throughput.errors.WorkerLost: Task failure due to loss of worker 5 on host datateam`
I started a new parsl job to pick up where the workflow left off: restarting z-10, then the lower z-levels, then web-tiling.
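For context, a minimal sketch of the kind of parsl setup this parallelization implies (the executor settings, function, and file names are assumptions, not the actual Datateam job):

```python
import parsl
from parsl import python_app
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

# Local high-throughput executor; settings here are placeholders
parsl.load(Config(executors=[
    HighThroughputExecutor(label="htex_local", provider=LocalProvider())
]))

@python_app
def rasterize_batch(paths):
    # Hypothetical wrapper: each task would run the viz-raster
    # rasterization step over its batch of staged tile files
    return len(paths)

batches = [["tile_a.gpkg"], ["tile_b.gpkg"]]  # hypothetical staged batches
futures = [rasterize_batch(b) for b in batches]
results = [f.result() for f in futures]  # block until all tasks finish
```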
Visualized the web-tiles for just z-11 and z-10 (since those are the only z-levels that I've been able to produce on Datateam so far without running into an OOM error, even with parallelization) using "coverage" for the statistic since this is just a first pass. I think it makes sense to instead use an attribute of the data, such as "EXTENT". The documentation explains what the codes for this attribute mean:
47% of the observations have `nan` for this attribute. These would have to be removed before staging, because the rasterization step cannot gracefully handle `nan` values yet.
Could also use the attribute "RELICT", which only contains "yes" or `nan`. 99% of those observations are `nan`, so that's probably not a good path forward. If we assume (or find in the documentation) that `nan` means "no" for this attribute, we could use it instead of removing all those observations, but it seems like it would result in a very uniform and uninteresting palette.
To do:

- remove the observations that have `nan` for the "extent" attribute
- use the statistic `coverage_category` with a pre-set range of values for that attribute: [1,4]
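A hedged sketch of what that statistic could look like in the workflow config (key names are assumed from the config conventions that appear elsewhere in this thread, such as `"aggregation_method"` and `"palette"`):

```json
"statistics": [
  {
    "name": "coverage_category",
    "property": "EXTENT",
    "aggregation_method": "max",
    "val_range": [1, 4]
  }
]
```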
Over the weekend, the script staged and rasterized all tiles for all z-levels with no OOM errors. However, web-tiling failed with `cannot convert float NaN to integer` for every raster.

Debugging revealed that the issue was within `viz-raster/pdgraster/WebImage.py`, when we convert a raster to a Python Imaging Library image. Within `to_image()`, we create a `no_data_mask` that represents the values within the image data that have no data (0 in this case), and we replace all those values with `np.nan`. See here. I was able to resolve the error by converting the `image_data` array to `float` right before we create the `no_data_mask`.
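A minimal sketch of the fix, using the variable names from the issue (the surrounding `to_image()` code is assumed):

```python
import numpy as np

# Example integer raster values; 0 is the no-data value in this case
image_data = np.array([[0, 1], [4, 0]])

# The added cast: integer arrays cannot hold NaN, which is what
# raised "cannot convert float NaN to integer"
image_data = image_data.astype(float)

no_data_mask = image_data == 0      # mask of no-data pixels
image_data[no_data_mask] = np.nan   # now valid on a float array
```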
There will be more aspects of the viz-workflow to tweak before this dataset is complete, such as the palette. Hopefully, this is one step closer to adjusting the workflow to enable styling the 3D tiles and web tiles based on an attribute of interest (see viz-workflow issue #9).
Using cesium to preview the web tiles, we see that the polygons that span certain Arctic latitudes are distorted to create rings around the North pole:
Perhaps the projection of the default TMS for the viz workflow distorts the geometries that cross the Arctic Circle (66 degrees), as well as the geometries that cross the latitude of the thinner, more northern ring.
These rings are not present in the input data to the visualization workflow. When we plot both the raw data and the cleaned input data, with the palette applied to the "EXTENT" attribute, we see a map without distortion. Here's the cleaned data with a viridis palette:
These rings would certainly make the output data confusing for users on the PDG portal.
After re-projecting the CRS of the cleaned geodataframe to the CRS of the TMS, the data looks like:
Because we need to use this TMS, a possible solution is to edit the input data of the viz workflow by converting the CRS of the geopackage beforehand, then masking it to only retain polygons that fall over land, then feeding that geodataframe into the viz workflow.
The invalid geometries displayed as rings around the Arctic Circle, shown above, resulted from the conversion to the CRS of the TMS during staging, and only for the geometries that intersected the antimeridian. There were 6 such geometries, and removing them before starting the workflow resulted in the following tiles:
The following script was used to process the data before executing the workflow.
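The script itself is not reproduced here, but based on the steps described, a hedged sketch might look like the following (file names are assumptions; the antimeridian test matches the line definition detailed later in this thread):

```python
import geopandas as gpd
from shapely.geometry import LineString

# Read the source shapefile and drop rows with no EXTENT value,
# since rasterization cannot yet handle nan values
perm = gpd.read_file("permaice.shp")
perm = perm.dropna(subset=["EXTENT"])

# Identify and remove the 6 polygons that intersect the antimeridian,
# approximated here as a line in the data's projected CRS (meters)
antimeridian = LineString([(0, -20_000_000), (0, 20_000_000)])
perm = perm[~perm.intersects(antimeridian)]

# Save the cleaned data as the geopackage fed into the viz workflow
perm.to_file("permaice_clean.gpkg", driver="GPKG")
```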
The final details to work out are:

- split the 6 polygons at the antimeridian rather than removing them, possibly with the `antimeridian` package
- assign a palette (by adjusting the config and re-executing just web-tiling) to properly represent all the values of the attribute of interest (the extent dummy coding)
Using both `"aggregation_method": "max"` (shown in the screenshot) and `"aggregation_method": "mean"` shows mostly web tiles of one color, with slight variation:
Using palette:

```json
"palette": [
  "#e41ba866",
  "#af18cd66"
]
```
Note: 66 = 40% alpha, which looks good considering we want users to overlay IWP on this layer and still see the terrain.
There should be 4 values that are represented:
The palette now displays all 4 colors. The data package on the ADC will have the one input file from Brown et al. 2022, the geopackages, and the rasters. The package must be published with a new DOI (`A2MG7FX35`) before we can move the layer onto production so that users can access the data/metadata. Per Matt's suggestion, the ADC metadata will include provenance triples indicating the derivation relationship with the National Snow and Ice Data Center.
Web tiles need to be re-processed with a blue palette, per Anna's request, because this dataset is usually visualized in blue by others.
Layer with the default 40% opacity and IWP overlaid:
When this layer was first uploaded to demo, the incorrect legend order revealed a bug, described in this issue. A fix in the XML was made so the value of each legend item determines the order (1,2,3,4), rather than the string label.
The data for this layer are archived at the ADC at `/var/data/10.18739/A2MG7FX35`, including the cleaned `permaice_clean.gpkg` that is fed into the viz workflow. The web tiles are in: `/var/data/tiles/10.18739/A2MG7FX35/`

To do:

- note `/var/data/10.18739/A2MG7FX35` in the abstract so users know where to find the data

I created a script that splits the polygons that cross the antimeridian.
Problem: After converting these split polygons to WGS84, they are still deformed in the same way (wrapping around the other side of the world). A potential reason could be that the way I split the polygons leaves their split side touching or sharing the antimeridian, so the polygons still cannot be transformed to WGS84 correctly. This may be resolved by:
Defining the antimeridian in the CRS of the input data is the most complicated part of this process. In WGS84, defining a line at the antimeridian is easy. However, the CRS of the input data is in units of meters. The CRS info is the following:

I tried converting the antimeridian line in WGS84 to the Lambert Azimuthal projection, but the values that are not 0 become `inf`. The closest solution I found so far, after trial and error of changing the coordinates and mapping, was to manually define the `LineString` with 2 coordinate values: `line_geometry = LineString([(0, -20000000), (0, 20000000)])`. The +/- 20,000,000 values are derived from mapping the polygons in the Lambert Azimuthal projection and checking the values on the x-axis (this line looks like the antimeridian when plotted, and it looks like it splits the polygons where I would visually want them split, based on where the antimeridian crosses them).
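As a sketch, splitting one crossing polygon at that manually defined line with shapely (the toy polygon here stands in for a real crossing geometry):

```python
from shapely.geometry import LineString, Polygon
from shapely.ops import split

# The antimeridian approximated in the Lambert Azimuthal CRS (meters)
line_geometry = LineString([(0, -20_000_000), (0, 20_000_000)])

# Toy polygon straddling the line, standing in for a crossing geometry
polygon = Polygon([(-1e6, 5e6), (1e6, 5e6), (1e6, 7e6), (-1e6, 7e6)])

# split() returns a GeometryCollection with one piece per side of the line
pieces = split(polygon, line_geometry)
print(len(pieces.geoms))  # 2
```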
The first part of the script reads in the input data, removes NA values from the attribute we are visualizing, and identifies the polygons that intersect the antimeridian.
The polygons visualized at this stage:
The next steps in the cleaning script create a line that represents the antimeridian and split the polygons there.
The polygons visualized at this stage:
Then we can test if this splitting worked for our intended purposes by converting the polygons at this stage to WGS84 and plotting the result in the same way.
They still wrap around the world the opposite way when converted to WGS84.
@julietcohen, just exploring the CRS from this data...
```python
perm.crs.name
# 'Sphere_ARC_INFO_Lambert_Azimuthal_Equal_Area'
perm.crs.to_epsg()
# None
```
This seems to be a projection without an EPSG code. If you try setting it to EPSG:9820 (`perm.set_crs(9820, allow_override=True)`), you get an error from the Proj library: `CRSError: Invalid projection: EPSG:9820: (Internal Proj Error: proj_create: crs not found)`. I'm not sure 9820 is the identifier for this projection anyways. In their paper, the authors say they used the Lambert Azimuthal Equal Area Polar Projection. Really the only information I could find about this projection is that it is identified as `SR-ORG:80`, and someone had the exact issue you did with it in geopandas (see: geopandas issue: "to_crs error for polygons crossing the -180 degree meridian"). They indicated that reprojecting in ArcMap works.

I'm not sure, but I have a feeling that some of the projection information needed is not handled properly by geopandas (or more accurately: by the PROJ library that geopandas uses under the hood). I wonder if it's possible to re-project in ArcMap first.
Also, I don't know if this might help in splitting the polygons, but there is some prime meridian info in the CRS object:
```python
perm.crs.prime_meridian
# PRIMEM["Greenwich",0, ANGLEUNIT["Degree",0.0174532925199433]]
```
Thanks for the insight, @robyngit! That helps clarify the confusing outputs and info (and lack of info) that I found online about this projection as well. I think you're right that there's information that geopandas can't handle, because an error is also returned from reprojecting an antimeridian line in WGS84 to the CRS of the input data.
I will look into your suggestion to reproject this shapefile in ArcMap.
Converting the original data to EPSG:4326 in QGIS did not work because the uploaded file does not have a known EPSG; the Lambert projection is labeled as a custom CRS. QGIS differs from ArcMap (although the specific differences are unknown to me), so there is a chance this may still work in ArcMap, but I do not have access to ArcMap at this moment (maybe I can get it through a coworker at NCEAS).
Instead of doing that CRS conversion for the data file with that software, I continued to try to split the polygons with a buffer at the antimeridian.
In order to split the polygons that cross the antimeridian, I tried to use `geopandas` and `shapely`. As noted earlier in this issue, simply splitting the polygons at a defined `LineString` for the antimeridian (in meters) did work in creating valid geometries that lay on either side of the meridian. However, when the geometries were then transformed into EPSG:4326, they were still deformed and wrapped the opposite way around the world, likely because they were not buffered away from the antimeridian, so the new split side of each polygon still intersected it. It could also be that my definition of the linestring for the antimeridian in meters was slightly off. I suggested that this might be resolved by splitting the polygons with a buffered antimeridian linestring.
Unfortunately, an error resulted: `GeometryTypeError: Splitting a Polygon with a Polygon is not supported`

I thought maybe QGIS could do this, but first wanted to try another programmatic approach.
Next I tried to split the polygons with the original (not buffered) antimeridian `LineString`, then buffer the polygons' split side after they are split. But I could not figure out how to buffer one side of a polygon. I experimented with `geopandas.buffer()`:
```python
for split_geom in split_polys.geoms:
    if split_geom.intersects(line.geometry.iloc[0]):
        # Note: shapely's single_sided option only applies to linear
        # geometries, which may be why the output never looks buffered
        buffered_split = split_geom.buffer(buffer_distance, single_sided=True)
        geoms_all.append(buffered_split)
```
The output does not look buffered when mapped, even when the buffer distance is huge.
While I have not 100% given up hope that `geopandas.buffer()` may be a programmatic solution, I tested whether QGIS can split polygons with a buffered `LineString`, and it can! I uploaded 2 spatial files, the polygons that cross the antimeridian and the antimeridian in units of meters, and buffered the antimeridian by 10,000 m. QGIS can then split the polygons where the buffered line (polygon) is, and export the split polygons as a spatial file in the same "custom" Lambert projection.
Converting these geoms to EPSG:4326 shows no wrapping around the world:
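For reference, a hedged geopandas sketch of the same buffered-split idea (`difference()` accepts a polygonal cutter, sidestepping the `GeometryTypeError` above; the file name is hypothetical):

```python
import geopandas as gpd
from shapely.geometry import LineString

# The 6 crossing polygons in the Lambert CRS (hypothetical file)
polys = gpd.read_file("antimeridian_polys.gpkg")

# Buffer the antimeridian line by 10,000 m, as in the QGIS test, then
# subtract the resulting strip from each polygon so that no split edge
# touches the antimeridian
antimeridian = LineString([(0, -20_000_000), (0, 20_000_000)])
polys["geometry"] = polys.geometry.difference(antimeridian.buffer(10_000))
```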
I am not sure how to define an object or a `LineString` as `PRIMEM["Greenwich",0, ANGLEUNIT["Degree",0.0174532925199433]]`, which Robyn found in the Lambert projection metadata. I tried a few approaches, including `line_geometry = polys.crs.prime_meridian`, but errors are returned like `TypeError: Input must be valid geometry objects: Greenwich`.
I have come up with a new cleaning script that does the following steps programmatically:
In order to integrate this approach as a generalized step in `viz-staging` before staging any input data that contains geometries that cross the antimeridian, I'll need to find a way to generalize the way the antimeridian is defined. In this script, I define it as a `LineString` by explicitly specifying the 2 coordinates. Ideally, the `LineString` would be defined by fetching the CRS of the input data and deriving the antimeridian from that. The following step would also buffer the `LineString` based on the minimum possible value for the `distance` argument of `buffer()`. I have not achieved this generalized approach yet, so this script will likely serve as the cleaning (pre-visualization) step for this dataset, and then for other datasets we can integrate this functionality into `viz-staging`.
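One possible route to that generalization, sketched below, is to project a densified lon=180 line from WGS84 into the input data's CRS with pyproj. This is untested speculation and may hit the same `inf` issue noted above for this particular custom CRS; the file name is hypothetical.

```python
import geopandas as gpd
from pyproj import Transformer
from shapely.geometry import LineString
from shapely.ops import transform

gdf = gpd.read_file("permaice_clean.gpkg")  # hypothetical input

# A densified antimeridian in WGS84; endpoints stop short of the poles
# to reduce the chance of infinite projected coordinates
wgs84_line = LineString([(180, lat / 10) for lat in range(-899, 900)])

# Project the line into the data's CRS, whatever it happens to be
project = Transformer.from_crs("EPSG:4326", gdf.crs, always_xy=True).transform
antimeridian_projected = transform(project, wgs84_line)
```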
I visualized the dataset that was cleaned with the antimeridian polygons split as described above and uploaded it to the demo portal. I think the split polygons look great. The data was processed with max z-level set to 10. Processing the dataset for the final time will be with a higher z-level. The screenshot below is visualized with 65% opacity.
Polygons with one of the 4 categories of the `EXTENT` value (not NA) are present below 45 degrees latitude in the original data from Brown et al. When I visualize the data in the viz workflow, the output tilesets seem to be cut off at 45 degrees north along their southern edge. I wonder why this is.
Here is the Brown et al. data visualized in QGIS, with a palette that shows dark blue as continuous extent (C), lighter blues for less continuous extents (D and I and S), and white for other (NA value).
In the PDG demo portal, the polygons are cut off at 50 degrees latitude when zoomed out and at 45 degrees when zoomed in; the lower polygons appear only when you zoom in. No matter the zoom level, a clear latitudinal line serves as a cutoff for the polygons.
Zooming in, we also see a few undesired vertical/longitudinal "stripes" that extend from 45 degrees further south, originating from the southernmost part of the polygons. These stripes closely resemble the bands we saw stretch around the world horizontally/latitudinally when the polygons that crossed the antimeridian became distorted after conversion to WGS84. Note that these vertical stripes only appear when very zoomed in; they are not present when zoomed out. Northern Washington state is pictured here.
There is potential for the limiting factor to be the bounding box created in viz-staging here. Perhaps the bottom of the bounding box is somehow hard-coded to a certain latitude.
In viz-staging, we use geopandas `total_bounds` here to determine the bounding box of the TMS. In a notebook, I imported the input data for the viz workflow (the gpkg output from the cleaning script), converted it to EPSG:4326, and ran `gdf.total_bounds`. The output, `array([-179.99997917, 27.42921194, 179.99997917, 83.6228101 ])`, has a miny of ~27, so it should include polygons that extend south of 45 degrees latitude. I think a good next step would be to start the viz workflow again and check the variables `grid` and `bounds` to confirm that the bbox is what we want.
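For reference, the bounds check amounts to the following (the path is hypothetical):

```python
import geopandas as gpd

# Input to the viz workflow: the gpkg output of the cleaning script
gdf = gpd.read_file("permaice_clean.gpkg").to_crs(4326)

# minx, miny, maxx, maxy across all geometries
print(gdf.total_bounds)
# array([-179.99997917,   27.42921194,  179.99997917,   83.6228101 ])
```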
I subset the cleaned data (output of the cleaning script) to just a handful of polygons (those that are ~30 degrees latitude) to prove that the viz workflow can process polygons that far south. Here they are in local cesium:
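A hedged sketch of that subsetting step (the latitude threshold and file names are assumptions):

```python
import geopandas as gpd

gdf = gpd.read_file("permaice_clean.gpkg")

# Keep the handful of southernmost polygons: centroids below ~35 deg N
# in WGS84, to test that the workflow can process polygons that far south
southern = gdf[gdf.centroid.to_crs(4326).y < 35]
southern.to_file("permaice_south_subset.gpkg", driver="GPKG")
```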
It was requested that we display a permafrost layer as the base layer when someone visits the PDG for the first time. The permafrost layer could be combined with the ice-wedge polygon map, so two layers would be presented as base layers.
Feedback collected by Moss & Andrew (the K12 teacher in Bethel) indicated that anyone coming to the PDG does not see/understand (right away) that the PDG focus is on permafrost.
The ideal permafrost layer would be the Circum-Arctic Map of Permafrost and Ground-Ice Conditions, Version 2.
According to the metadata, it looks to be a 24 MB shapefile.