I did a test run of this layer with the SACHI.shp file. Here are some notes on processing it:
I used the following config options, but we will probably want to create tiles at a higher resolution than z10 (perhaps z13):
```json
{
  "z_range": [0, 10],
  "statistics": [
    {
      "name": "coverage",
      "weight_by": "area",
      "property": "area_per_pixel_area",
      "aggregation_method": "sum",
      "resampling_method": "average",
      "val_range": [0, 1],
      "nodata_val": 0,
      "palette": "oryel",
      "nodata_color": "#ffffff00"
    }
  ]
}
```
Preview in Cesium:
Based on feedback from Annett (below), we should do the following with the next run:
We should also remember to re-create the layer when updated data is available next year.
Details:
If displaying as raster, I would suggest doing no resampling that averages, as this is discrete information (yes or no / classes). Otherwise you are introducing new information (on size: bright if narrow, dark if it is a larger polygon), which could be misunderstood as thematic content. I would suggest 'nearest neighbour' instead of 'bilinear'. In that case you could also differentiate types (road, building, other).
Note that we are just about to complete an updated version. It covers a slightly larger area and has additional classes (three road types, airstrips and reservoirs as extra/additional objects). But I assume that quality control will not be finished before the end of the year.
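Following that resampling suggestion, the corresponding change in our config would presumably be a one-line swap (a sketch in the same format as the config above; "nearest" is my assumed spelling of the option name):

```json
"resampling_method": "nearest"
```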
Annett has produced a new version of this dataset, now located on Datateam at: /var/data/submission/pdg/bartsch_infrastructure/SACHI_v2/
The old version of the data is now in /var/data/submission/pdg/bartsch_infrastructure/old_version/
Initial notes:
- The resampling method in viz-raster is set to nearest neighbor by default, but is overwritten by whatever is specified in the config; see from_rasters().
- SACHI_v2.shp has only geometries and a DN column, which has the unique values 11, 12, 13, 20, 30, 40, 50.
- The DN column has no NA values, but the geometry column has 94,924 NA (None) values. That's only ~3.4% of all rows.
- CRS of input data:
- all geometries:
- Data after converting to EPSG 4326:
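For reference, a minimal sketch (assuming geopandas; the file path is illustrative) of the checks summarized in these notes:

```python
import geopandas as gpd

data = gpd.read_file("SACHI_v2.shp")

print(sorted(data["DN"].unique()))   # [11, 12, 13, 20, 30, 40, 50]
print(data["DN"].isna().sum())       # 0 NA values in the DN column
print(data.geometry.isna().sum())    # 94,924 NA geometries (~3.4% of rows)
print(data.crs)                      # CRS of the input data

# reproject before staging
data_4326 = data.to_crs("EPSG:4326")
```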
I was able to process a first draft of outputs with the visualization workflow. Importantly, in order to produce the web tiles I needed to use the same modification to to_image()
in this pull request that I originally used to successfully produce web tiles for the permafrost and ground ice layer (see this comment in that issue). Both datasets raised the same error and failed to write any web tiles until I added a line in to_image()
that converts the image data to float. This gives me more confidence that this PR should be merged. A similarity between these 2 datasets is that both visualize categorical variables: in the permafrost dataset we visualize 4 categories of permafrost coverage, and in this infrastructure dataset we visualize 7 types of infrastructure.
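To illustrate the nature of that fix (a hedged sketch, not the actual viz-raster source; the function name here is hypothetical), the key change is casting the pixel array to float before rescaling to the 0-255 range:

```python
import numpy as np

def rescale_for_image(values, min_val, max_val):
    # The added line: cast to float so the subsequent arithmetic does not
    # truncate or fail on integer (categorical) pixel values.
    values = np.asarray(values).astype(float)
    return (values - min_val) * (255 / (max_val - min_val))
```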
On Zenodo, I was granted access to the older version of the dataset, which includes a README with attribute descriptions. I was hoping it would explain the infrastructure codes we see in this newer version of the dataset. Unfortunately, the attribute DN
is not described in the older README.
First draft of the data layer at low resolution (set to max z10):

Notes and to-do items:
- Determine what the values of the DN attribute represent.
- Merge the to_image() PR into the viz-raster develop branch, then main, and make a release before we can link to the version of the software that produced this dataset.
- Resolve the RuntimeWarning: divide by zero encountered in scalar divide (255 / (max_val - min_val)).
- Adding a val_range to the config results in more consistency in the palette for each infrastructure unit within the web tiles; the val_range includes the smallest possible infrastructure code (11) to the largest (50).
- Category: Infrastructure
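As a sketch, the val_range entry in the statistics config would then look like this (same format as the config fragment above):

```json
"val_range": [11, 50]
```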
Annett has provided the README for this dataset. It provides a code for each infrastructure type and is located in /var/data/submission/pdg/bartsch_infrastructure/
The dataset package on the ADC: https://arcticdata.io/catalog/view/urn%3Auuid%3A4e1ea0af-6f7c-4a7a-b69f-4e818e113c43
Pre-issued DOI: 10.18739/A21J97929
- Determine the release of viz-raster that was used to process this dataset so we can link to it in the metadata.
- viz-staging (v0.9.1) was used for this dataset, so no need to make a new release for this data package.

The final dataset has been processed and is archived on Datateam:
- /var/data/10.18739/A21J97929/ — the input dir contains the SACHI_v2 data file from Annett (just 1 shapefile and its extension files) and the SACHI_v2_clean.gpkg that was actually used as input to the viz workflow.
- /var/data/tiles/10.18739/A21J97929/ — the output web tiles.
viz-workflow v0.9.2 has been released (that is the version used to produce this data layer). The only thing blocking this layer from moving to production is that the ADC data package needs to be finished and published. I have been adding more metadata to this package over the past week, so it is on its way to being assigned its pre-issued DOI!
Justin from the ADC has helpfully taken over the rest of the metadata documentation for this dataset, starting this week.
Annett has requested that I change the palette of this dataset so that buildings are in red and water is in blue. I also noticed that there were MultiPolygons (only 2% of the geometries) in the input data from Annett. I re-cleaned this input data (still removing NA geoms like last time, with the additional cleaning step of exploding those MultiPolygon geoms) and will re-process the data with the viz workflow, applying the requested palette change as well. This is not a time-consuming operation on Delta.
This new cleaning script will replace the old one that is uploaded to the data package and the staged and geotiff tilesets will replace the ones currently in the pre-assigned DOI dir. I already discussed this with Justin.
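A minimal sketch of the updated cleaning steps (assuming geopandas; file paths are illustrative):

```python
import geopandas as gpd

data = gpd.read_file("SACHI_v2.shp")

# remove rows with NA geometries, as in the previous cleaning script
data = data[data.geometry.notna()]

# new step: explode MultiPolygon geometries into single Polygons
data = data.explode(index_parts=False).reset_index(drop=True)

data.to_file("SACHI_v2_clean.gpkg", driver="GPKG")
```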
For convenience, I am pasting the values for infrastructure code here (they are from the readme):
Annett also mentioned that she is going to update the dataset on Zenodo with the new version this week, and it already has a DOI there. Justin let her know that we will still use our pre-assigned DOI, since our version is effectively a new dataset, and I went into detail about our derived products. We will mention the Zenodo DOI in the dataset documentation on the ADC.
Delta decided that Python doesn't exist anymore in any of my environments, so I have been troubleshooting that for the past hour. The next step would be to uninstall and reinstall VS Code, and to remove the known hosts for Delta. I already tried uninstalling and reinstalling all Python extensions.
Annett has updated the dataset based on my feedback regarding the geometries. She uploaded the new version of SACHI_v2 to Zenodo. She included the following info:
DOI 10.5281/zenodo.10160636

"I made a number of changes based on your feedback:
1) I had a closer look at the geometry properties. The version which you got had duplicates, and the overlap areas of the different Sentinel-2 source granules were not yet merged. This is now solved, and there are now also no features without attributes any more.
2) I extended the readme file for the metadata: S2_date1 to S2_date3 - dates of individual Sentinel-2 images used for averaging; S1_winter - year(s) of Sentinel-1 images used for averaging (months December and/or January)."
Justin and I made it clear that we would be giving the dataset a new DOI (the one I pre-created for this dataset) because our ADC version of the data package differs from the one she has on Zenodo, considering all our derived products.
Since she made changes to SACHI_v2, I will upload the new version to Datateam and reprocess the dataset with the viz workflow.
I reprocessed the infrastructure dataset with Annett's new version of the SACHI_v2 data as input to the viz workflow. Since she fixed the geometries, I only had to split the MultiPolygons before inputting the data into the viz workflow. I also updated the palette to represent buildings in red and water in blue as Annett requested, and removed yellow from the palette since we visualize the ice-wedge polygon layer in yellow and we anticipate that users will explore these layers together. All input and output files and the new pre-viz cleaning script have been uploaded to /var/data/10.18739/A21J97929. The output web tiles have been uploaded to /var/data/tiles/10.18739/A21J97929/
I updated the demo portal with this layer:
I was able to refine the assignment of the categorical palette so there is just 1 color attributed to 1 infrastructure type. I did this in a similar way to what I did for the permafrost and ground ice dataset. Instead of using the attribute DN
for the visualized tiles, I needed to create a new attribute, which I named palette_code
, that assigns the numbers 1 through 7 to each of the DN
values, because the numbers for the palette assignment seem to need to be evenly spaced in order for 1 color to correspond to 1 categorical value. The values of DN
are numerical but are not evenly spaced.
```python
import numpy as np

# add an attribute that codes the categorical DN attribute
# into evenly spaced numbers in order to assign the
# palette correctly to the categories in the web tiles:
conditions = [
    (data['DN'] == 11),
    (data['DN'] == 12),
    (data['DN'] == 13),
    (data['DN'] == 20),
    (data['DN'] == 30),
    (data['DN'] == 40),
    (data['DN'] == 50)
]
choices = [1, 2, 3, 4, 5, 6, 7]
data['palette_code'] = np.select(conditions, choices)
```
As a result, I see more diversity in the colors of the geometries, as Annett suggested there should be, and I integrated grey for the DN value 30 per her suggestion:
Taking this approach, I realize that in order for Annett's metadata for the DN values to match the numbers in the raster, I need to include both attributes (DN and palette_code) as bands in the rasters. This is because we will use the web tiles for palette_code as the layer we visualize on the portal, and users will want the option to use either band when they download the rasters for their own analysis.
One note is that I have only tried this dataset with a custom palette, with each of the 7 hex codes specifically assigned, rather than a pre-made palette with a name, because we wanted to include certain colors in a specific order and exclude certain colors that would too closely match other datasets on the PDG.
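To illustrate how both bands and the custom palette might sit in the config, a hedged sketch (the hex codes are placeholders chosen only to show red, blue, and grey with no yellow; the stat names follow the to-do list below, and the exact fields and colors actually used are not reproduced here):

```json
"statistics": [
  {
    "name": "infrastructure_code",
    "property": "DN",
    "resampling_method": "nearest",
    "val_range": [11, 50]
  },
  {
    "name": "palette_code",
    "property": "palette_code",
    "resampling_method": "nearest",
    "val_range": [1, 7],
    "palette": [
      "#cc0000", "#994c00", "#e07b39", "#808080",
      "#66cc66", "#9966cc", "#0066cc"
    ]
  }
]
```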
One more consideration for this dataset: the resolution of this data is not clear on the Zenodo page (this new Zenodo link is important because it points to version 2 of the dataset that Annett updated, not version 1, which was originally linked above when this ticket was first created). Sentinel data resolution can vary depending on the bands used. One way to find this may be to dive deeper into the methods, like a paper associated with this dataset, or to ask Annett. I have been using z-level 12 as the max, because that is ~32 m resolution at latitude ±31°.
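For reference, a quick sketch of the standard web-mercator ground-resolution arithmetic behind that estimate (256-pixel tiles assumed):

```python
import math

def ground_resolution_m(z, lat_deg, tile_px=256):
    # ground resolution = cos(latitude) * earth circumference / pixels across zoom level z
    return math.cos(math.radians(lat_deg)) * 2 * math.pi * 6378137 / (tile_px * 2**z)

print(round(ground_resolution_m(12, 31), 1))  # ~32.8 m/pixel
```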
Annett has approved the layer to be moved to production. The remaining to-do items, in order:
- Include both bands, infrastructure_code (DN) and palette_code, in the archived rasters.
- Remove the cn014 subdirectory from the staged dir (just moving everything within cn014 up one level, then removing that dir).
- Make the viz-workflow v0.9.3 release.
- After releasing viz-workflow v0.9.3 with the new config used for infrastructure layer processing, update that release version in the ADC package.
- Fix the &amp; strings that appear in the text when special characters are used.

@julietcohen In reading through your correspondence on this ticket, I saw that it originated as a vector dataset. The most accurate way for us to present it would be as a vector layer, rather than raster. Let's please discuss this, as I suspect it would provide a much better result. We have a number of vector layers in the portals now, mainly via a geoJSON conversion. I'm not sure if the dataset size would be prohibitive there, but let's discuss, please.
datateam.nceas.ucsb.edu:/home/pdg/data/bartsch_infrastructure
The data is associated with the following two papers:
The data are archived with Restricted Access on Zenodo