robyngit opened 1 year ago
We have discussed setting up a structured data access service on the backend of the ADC as an initial step towards integrating the plot viewer into portals. We outlined the structured data access service as the following in the Google.org proposal for the Discovery & analysis tools user interface working group:
Structured Data Access Service: Currently, model output data are in geospatial data formats, including GeoTIFF and GeoPackage, but are not optimized for delivery to analytical clients that want to access specific parameters over different spatial and temporal areas of interest. This task will create a data access service to restructure these datasets (e.g., using HDF5 or Zarr) to enable subsetting to arbitrary spatio-temporal windows, and aggregation across different resolutions, for delivery to the visualization plot viewer. (Team: Backend Engineer Fellow and Robyn, with input from Matt, Juliet)

- Deliverable 1: Documentation and requirements for a series of five use cases for data access for specific plot visualizations that should be supported by a data access service. Integrating a Content Delivery Network into the structured data access service would be useful to speed up delivery of web-based geospatial data for visualization tools and scientific access. This may involve hosting data products on Google-supported cloud storage that has fast external access and CDN functionality.
- Deliverable 2: Using a sample of several PDG datasets, design a data storage structure that allows database querying to produce the data needed for those five use cases, specifying the inputs and outputs for each.
- Deliverable 3: Prototype the Structured Data Access Service against entire PDG datasets.
Recently, Doug from Google.org requested a sample of several different tilesets that overlap the same small region on the PDG. He plans to use these samples for testing new palette features, but importantly this incentivized me to find the best way to do this before the bounding box and plot viewer tools are built. Essentially, my task is to find a region on the portal where several tilesets overlap, then retrieve the same handful of tiles (same z, x, and y filepaths) from each of the tilesets. Doug will likely find the rasters more useful than the vectors since he is working with palettes. For the region, I chose the north slope of AK around Utqiagvik where we have overlapping layers for infrastructure, permafrost extent, ice-wedge polygons, a local news story, and others. I tried 2 approaches to retrieve the filepaths of all the tiles that fall within the region:
1) Devtools and the `wget` command

Use devtools while zoomed into the region of interest on the map, with the layers of interest toggled on. Navigating to the Network tab and moving around the portal view a little refreshes the list of tiles that are retrieved from the backend. Clicking on one of the tiles in the left pane opens details, where you can preview the tile and see the full path to the tileset from which it is retrieved.

Instead of navigating to the individual tile URLs and downloading them one by one, Ian suggested copying the filepaths into a text document and using `wget -i tileset.txt` to download them all.
2) `morecantile`, `mercantile`, and the `wget` command

Use these libraries in conjunction: first find the filepaths that fall within a bbox around the ROI, then paste those filepaths into a text doc and use the `wget` command as above. I initially picked a lat and long pair that falls within the ROI, then input it into:
```python
mercantile.tile(lng, lat, zoom)
```
to retrieve the tile that contains that coordinate pair, then insert that output tile into:
```python
tms = morecantile.tms.get("WGS1984Quad")
tms.bounds(morecantile.Tile(x, y, z))
```
to get the bounds of that tile, then insert the output bounds and a z-level (9) into:
```python
tiles = mercantile.tiles(-156.822070, 71.266541, -156.533335, 71.344341, 9)
tiles_list = list(tiles)
```
And the output shows several tilepaths like:

```python
[Tile(x=32, y=108, z=9),
 Tile(x=32, y=109, z=9),
 Tile(x=33, y=108, z=9),
 Tile(x=33, y=109, z=9)]
```
which could be saved to a text file and then input into the `wget` command. But retrieving the bounds of the tile that contains the coord pair was zooming to a different region in Cesium, so I had more accuracy manually drawing a bbox on bbox finder and copying the bounds from there. There was still a problem, though: the output tiles from `mercantile.tiles` did not exist in the tilesets that cover this region when I looked within the infrastructure layer dirs, IWP layer dirs, etc. My first thought was that there were simply no polygons overlapping the specific tiles that the mercantile package pulled for the bbox. But the package should have pulled ALL tiles that fall within the bbox at that z-level, so I should have gotten a mix of tiles that did exist in our tilesets and tiles that did not (where no polygons overlap). So something else is off.
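One possible explanation (a hypothesis, not confirmed in this thread): `mercantile` always indexes tiles in the WebMercatorQuad scheme, while the tileset paths here appear to follow WGS1984Quad (the TMS fetched via morecantile above). The two schemes assign different x/y indices to the same ground area, so WebMercatorQuad indices like `x=32, y=108` would not exist in a WGS1984Quad directory tree. A pure-Python sketch of the WGS1984Quad indexing, which should give results equivalent to `morecantile.tms.get("WGS1984Quad").tiles(...)` (function names here are illustrative, not from any library):

```python
def wgs1984quad_tile(lon, lat, zoom):
    """(x, y) indices of the WGS1984Quad tile containing a lon/lat point.

    WGS1984Quad covers lon [-180, 180] and lat [-90, 90] with
    2**(zoom + 1) columns and 2**zoom rows; each tile spans
    180 / 2**zoom degrees, and row 0 starts at the north pole.
    """
    size = 180.0 / 2 ** zoom
    x = int((lon + 180.0) // size)
    y = int((90.0 - lat) // size)
    return x, y

def wgs1984quad_tiles(west, south, east, north, zoom):
    """All (x, y, z) tiles intersecting a bbox, analogous to tms.tiles()."""
    x0, y0 = wgs1984quad_tile(west, north, zoom)  # upper-left corner tile
    x1, y1 = wgs1984quad_tile(east, south, zoom)  # lower-right corner tile
    return [(x, y, zoom)
            for y in range(y0, y1 + 1)
            for x in range(x0, x1 + 1)]

# Same Utqiagvik bbox and z-level 9 as above:
print(wgs1984quad_tiles(-156.822070, 71.266541, -156.533335, 71.344341, 9))
# → [(65, 53, 9), (66, 53, 9)]
```

Note the x/y values differ entirely from the WebMercatorQuad tiles that `mercantile.tiles` returned, which would be consistent with those filepaths not existing in the tileset dirs.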
Another note on `wget` commands: Justin suggested that the best command to download all tiles from a specified subdir is:

```shell
wget -r -np -nH --cut-dirs=3 -R '\?C=' -R robots.txt https://arcticdata.io/data/10.18739/{DOI}/
```

with whatever subdir you want tacked onto the end.
Hopefully this is helpful moving forward as we design the bbox drawing tool and plot viewer.
To follow up on my quest to query several PDG tilesets for both GeoTIFF and PNG files for a small region, I used an approach similar to the first one outlined in the previous comment: a combination of devtools (to identify which z, x, and y tiles are of interest) and a Python script that downloads the files by iteratively executing `wget` commands. I uploaded these dataset samples to a Google Drive for Doug. The Python scripts are below.
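As a rough sketch of what such a download loop can look like (the base URL, extension, and tile indices here are hypothetical placeholders, not actual PDG dataset paths):

```python
import subprocess
from pathlib import Path

def tile_urls(base, tiles, ext):
    """Build one URL per (z, x, y) tile under a tileset base URL."""
    return [f"{base}/{z}/{x}/{y}.{ext}" for (z, x, y) in tiles]

def download_tiles(base, tiles, ext, out_dir="tiles"):
    """Fetch each tile with wget, saving everything into out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for url in tile_urls(base, tiles, ext):
        subprocess.run(["wget", "-q", "-P", out_dir, url], check=True)

# Hypothetical usage: the same z/x/y tiles from a PNG tileset
tiles = [(9, 65, 53), (9, 66, 53)]
print(tile_urls("https://example.org/pdg/tileset_png", tiles, "png"))
```

The same `tiles` list can be reused against a GeoTIFF tileset's base URL with a different extension, which is how the same small region can be pulled from several overlapping tilesets.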
This issue has two parts:
Ongoing dialogue on the Plot Viewer: A Summary
We have discussed various aspects of the plot viewer over a long period of time on the PDG team. I've attempted to compile points from the discussions below, but some may be missing. ⭐➡️ Feedback is welcome! ⬅️⭐
Existing mockups:
Examples highlighted by PDG team:
Importance of plot viewer:
Handling large datasets:
Potential datasets & variables for MVP:
Discussion around plotting the Lake Change layer:
Features identified as important in the PDG mini-workshop survey (Apr 2023):
Other topics discussed that we may need to consider: