NCEAS / metacatui

MetacatUI: A client-side web interface for DataONE data repositories
https://nceas.github.io/metacatui
Apache License 2.0

Design the MVP Plot Viewer for the PDG portal #2182

**Open** · robyngit opened this issue 1 year ago

robyngit commented 1 year ago

This issue has two parts:

  1. [ ] Determine the essential features, types of plots, and data formats the plot viewer will support in its initial release.
  2. [ ] Create mockups of the MVP version of the Plot Viewer, including how the view will be integrated into the PDG portal.

Ongoing dialogue on the Plot Viewer: A Summary

We have discussed various aspects of the plot viewer on the PDG team over a long period of time. I've attempted to compile points from those discussions below, but some may be missing. ⭐➡️ Feedback is welcome! ⬅️⭐

Existing mockups:

Examples highlighted by PDG team:

Importance of plot viewer:

Handling large datasets:

Potential datasets & variables for MVP:

Discussion around plotting the Lake Change layer:

Features identified as important in the PDG mini-workshop survey (Apr 2023):

Other topics discussed that we may need to consider:

julietcohen commented 6 months ago

We have discussed setting up a structured data access service on the backend of the ADC as an initial step towards integrating the plot viewer into portals. We outlined the structured data access service as follows in the Google.org proposal for the Discovery & analysis tools user interface working group:

> **Structured Data Access Service:** Currently, model output data are in geospatial data formats, including GeoTIFF and GeoPackage, but are not optimized for delivery to analytical clients that want to access specific parameters over different spatial and temporal areas of interest. This task will create a data access service to restructure these datasets (e.g., using HDF5 or Zarr) to enable subsetting to arbitrary spatio-temporal windows, and aggregation across different resolutions, for delivery to the visualization plot viewer. *(Team: Backend Engineer Fellow and Robyn, with input from Matt, Juliet)*
>
> Integrating a content-delivery network (CDN) into the structured data access service would be useful to speed up delivery of web-based geospatial data for visualization tools and scientific access. This may involve hosting data products on Google-supported cloud storage that has fast external access and CDN functionality.
>
> - **Deliverable 1:** Documentation and requirements for a series of five use cases for data access for specific plot visualizations that should be supported by a data access service
> - **Deliverable 2:** Using a sample of several PDG datasets, design a data storage structure that allows database querying to produce the data needed for those five use cases, specifying the inputs and outputs for each
> - **Deliverable 3:** Prototype the Structured Data Access Service against entire PDG datasets
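For a concrete sense of the access pattern this service is meant to enable, here is a minimal sketch of pulling a spatio-temporal window from a Zarr store with `xarray`. The store URL, variable name, and coordinate names are hypothetical placeholders, not an actual PDG product:

```python
# Minimal sketch: windowed access to a hypothetical PDG dataset that has
# been restructured as Zarr. Only the chunks overlapping the requested
# window need to be read, which is the motivation for restructuring.
import xarray as xr

# hypothetical store and variable names; not a real PDG product
ds = xr.open_zarr("https://example.org/pdg/lake_change.zarr")

subset = ds["permanent_water"].sel(
    time=slice("2019-01-01", "2021-12-31"),  # temporal window
    y=slice(71.35, 71.25),  # spatial window (assumes descending latitude)
    x=slice(-156.83, -156.53),
)

# aggregate over the window, e.g. a mean map for the plot viewer
summary = subset.mean(dim="time")
```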

julietcohen commented 6 months ago

Recently, Doug from Google.org requested a sample of several different tilesets that overlap the same small region on the PDG portal. He plans to use these samples for testing new palette features, but importantly, this prompted me to work out the best way to do this before the bounding box and plot viewer tools are built. Essentially, my task is to find a region on the portal where several tilesets overlap, then retrieve the same handful of tiles (same z, x, and y filepaths) from each of the tilesets. Doug will likely find the rasters more useful than the vectors since he is working with palettes. For the region, I chose the north slope of AK around Utqiagvik, where we have overlapping layers for infrastructure, permafrost extent, ice-wedge polygons, a local news story, and others. I tried two approaches to retrieve the filepaths of all the tiles that fall within the region:

1) **Devtools and `wget` command:** Use devtools while zoomed into the region of interest on the map, with the layers of interest toggled on. Navigating to the Network tab and panning around the portal view a little refreshes the list of tiles that are retrieved from the backend. Clicking on one of the tiles in the left pane opens up its details, where you can preview the tile and see the full path to the tileset from which it is retrieved:

*(screenshot: a tile preview and its full tileset path in the devtools Network tab)*

Instead of navigating to the individual tile URLs and downloading them one by one, Ian suggested copying and pasting the filepaths into a text document and using `wget -i tileset.txt` to download them all.
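A minimal sketch of that workflow, following the same `subprocess` + `wget` pattern used in the scripts later in this thread (the URLs below are placeholders, not real tile paths):

```python
# Sketch: collect tile URLs copied from devtools into a text file,
# then hand the whole list to wget at once. URLs are placeholders.
from subprocess import Popen

tile_urls = [
    "https://arcticdata.io/data/tiles/<...>/WGS1984Quad/9/32/108.png",
    "https://arcticdata.io/data/tiles/<...>/WGS1984Quad/9/32/109.png",
]

with open("tileset.txt", "w") as f:
    f.write("\n".join(tile_urls) + "\n")

# equivalent to running `wget -i tileset.txt` in a shell
Popen(["wget", "-i", "tileset.txt"]).wait()
```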

2) **`morecantile`, `mercantile`, and `wget` command:** Use these libraries in conjunction to first find the filepaths that fall within a bbox around the ROI, then paste those filepaths into a text doc and use the `wget` command. I initially picked a lat and long pair that falls in the ROI, then input that into:

```python
mercantile.tile(lng, lat, zoom)
```

to retrieve the tile that contains that coordinate pair, then inserted that output tile into:

```python
tms = morecantile.tms.get("WGS1984Quad")
tms.bounds(morecantile.Tile(x, y, z))
```

to get the bounds of that tile, then inserted the output bounds and a z-level (9) into:

```python
tiles = mercantile.tiles(-156.822070, 71.266541, -156.533335, 71.344341, 9)
tiles_list = list(tiles)
```

The output shows several tilepaths like:

```python
[Tile(x=32, y=108, z=9),
 Tile(x=32, y=109, z=9),
 Tile(x=33, y=108, z=9),
 Tile(x=33, y=109, z=9)]
```

which could be saved to a text file and then passed to the `wget` command. But retrieving the bounds of the tile that contains the coordinate pair was zooming to a different region in Cesium, so I got more accurate bounds by manually drawing a bbox on bbox finder and copying them from there. There was still a problem, though: the output tiles from `mercantile.tiles` did not exist in the tilesets that cover this region when I looked within the infrastructure layer dirs, IWP layer dirs, etc. My first thought was that no polygons overlapped the specific tiles that `mercantile` pulled as falling within the bbox. But the package should have pulled ALL tiles that fell within the bbox for that z-level, so I should have gotten a mix of tiles that did exist in our tilesets and tiles that did not (where there was no polygon overlap). So something else is off; see the note below.
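One thing worth checking here: `mercantile` only works in the WebMercatorQuad (XYZ) tiling scheme, while these tilesets are indexed in WGS1984Quad, so x/y indices from `mercantile.tiles()` would not necessarily line up with the paths on disk even for the correct bbox. A sketch of enumerating the bbox tiles entirely in WGS1984Quad instead, assuming morecantile's `tms.tiles()` helper (which mirrors `mercantile.tiles()`), using the Utqiagvik bbox from above:

```python
# Sketch: enumerate z=9 tiles intersecting a bbox directly in the
# WGS1984Quad scheme, avoiding any mixing with mercantile's
# WebMercatorQuad indexing.
import morecantile

tms = morecantile.tms.get("WGS1984Quad")

# bbox around Utqiagvik (west, south, east, north), from bbox finder
west, south, east, north = -156.822070, 71.266541, -156.533335, 71.344341

for tile in tms.tiles(west, south, east, north, [9]):
    # print in the z/x/y order used in the tileset filepaths
    print(f"{tile.z}/{tile.x}/{tile.y}")
```

If the indices this prints differ from the `mercantile.tiles()` output above, that scheme mismatch would explain the missing tiles.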

Another note on `wget` commands: Justin suggested that the best command to download all tiles from a specified subdir is:

```bash
wget -r -np -nH --cut-dirs=3 -R '\?C=' -R robots.txt https://arcticdata.io/data/10.18739/{DOI}/
```

with whatever subdir you want tacked onto the end. (`-r` recurses, `-np` keeps wget from ascending to the parent directory, `-nH` and `--cut-dirs=3` drop the hostname and the first three path segments from the saved paths, and the `-R` patterns reject the Apache index-sort links and `robots.txt`.)

Hopefully this is helpful moving forward as we design the bbox drawing tool and plot viewer.

julietcohen commented 6 months ago

To follow up on my quest to query several PDG tilesets for both GeoTIFF and PNG files for a small region, I used an approach similar to the first one outlined in the previous comment: a combination of devtools (to identify which z, x, and y tiles are of interest) and a Python script that downloads the files by iteratively executing `wget` commands. I uploaded these dataset samples to a Google Drive for Doug. The Python scripts are below.

**download_geotiff_tiles.py**

```python
# Download a subset of tiles (GeoTIFFs) for several PDG layers
# Steps:
# 1. on the PDG portal, toggle on a layer of interest and zoom in
# 2. open devtools & the Network tab to display tiles in view
#    (may need to pan around viewer to refresh list)
# 3. manually copy the end of the filepaths, just the z & x dirs
#    (see the Headers subtab) for ONLY the tiles from the data layer
#    (not the base layer tiles), and paste all in this script below
# 4. prepend the z and x dirs with the URL to the GeoTIFF dir with {DOI}
# 5. make a list of DOIs for all desired GeoTIFF tilesets in this script
# 6. execute the script

from subprocess import Popen

# list DOIs for PDG layers:
# IWP,
# Lake Size Time Series (bands for both Seasonal Water and Permanent Water),
# Infrastructure
DOIs = [
    "A2KW57K57/iwp_geotiff_high",
    "A28G8FK10/yr2021/geotiff",
    "A21J97929/geotiff",
]

for DOI in DOIs:
    print(f"Downloading files for DOI: {DOI}")
    # list tile URLs for z and x dirs (copied from devtools)
    URLs = [
        f"https://arcticdata.io/data/10.18739/{DOI}/WGS1984Quad/12/528/",
        f"https://arcticdata.io/data/10.18739/{DOI}/WGS1984Quad/12/527/",
        f"https://arcticdata.io/data/10.18739/{DOI}/WGS1984Quad/10/131/",
        f"https://arcticdata.io/data/10.18739/{DOI}/WGS1984Quad/12/529/",
        f"https://arcticdata.io/data/10.18739/{DOI}/WGS1984Quad/11/263/",
    ]
    for URL in URLs:
        print(f"Downloading files for URL: {URL}")
        # download all .tif tiles in the URL dir to the current working dir
        cmd = ["wget", "-r", "-np", "-nH", "-A", "*.tif",
               "--cut-dirs=2", "-R", "wget-log.*", URL]
        process = Popen(cmd)
        process.wait()

print("Script complete.")
```
**download_png_tiles.py**

```python
# Download a subset of tiles (PNGs) for several PDG layers
# Steps:
# 1. on the PDG portal, toggle on a layer of interest and zoom in
# 2. open devtools & the Network tab to display tiles in view
#    (may need to pan around viewer to refresh list)
# 3. manually copy the end of the filepaths, just the z & x dirs
#    (see the Headers subtab) for ONLY the tiles from the data layer
#    (not the base layer tiles), and paste all in this script below
# 4. prepend the z and x dirs with the URL to the PNG dir with {DOI}
# 5. make a list of DOIs for all desired PNG tilesets in this script
# 6. execute the script

from subprocess import Popen

# list DOIs for PDG layers:
# IWP,
# Lake Size Time Series (both Seasonal Water and Permanent Water),
# Infrastructure
DOIs = [
    "A2KW57K57",
    "A28G8FK10/yr2021/web_tiles/seasonal_water",
    "A28G8FK10/yr2021/web_tiles/permanent_water",
    "A21J97929/SACHI_v2/web_tiles/infrastructure_code",
]

for DOI in DOIs:
    print(f"Downloading files for DOI: {DOI}")
    # list tile URLs for z and x dirs (copied from devtools)
    URLs = [
        f"https://arcticdata.io/data/tiles/10.18739/{DOI}/WGS1984Quad/12/528/",
        f"https://arcticdata.io/data/tiles/10.18739/{DOI}/WGS1984Quad/12/527/",
        f"https://arcticdata.io/data/tiles/10.18739/{DOI}/WGS1984Quad/10/131/",
        f"https://arcticdata.io/data/tiles/10.18739/{DOI}/WGS1984Quad/12/529/",
        f"https://arcticdata.io/data/tiles/10.18739/{DOI}/WGS1984Quad/11/263/",
    ]
    for URL in URLs:
        print(f"Downloading files for URL: {URL}")
        # download all .png tiles in the URL dir to the current working dir
        cmd = ["wget", "-r", "-np", "-nH", "-A", "*.png",
               "--cut-dirs=3", "-R", "wget-log.*", URL]
        process = Popen(cmd)
        process.wait()

print("Script complete.")
```