bmcgaughey1 / USGSlidar

A collection of functions to browse lidar data collections, query collections for coverage for specific locations, and retrieve data covering locations.
Creative Commons Zero v1.0 Universal

Problems with USGS WESM and TESM tile index files #8

Closed: bmcgaughey1 closed this issue 1 year ago

bmcgaughey1 commented 2 years ago

As of May 2023, the issues with the index files seem to be resolved. I have not done extensive testing, but things seem to work OK.

The USGS TESM tile index has problems with some lidar projects. The problems are related to missing or bad geometries for some tiles and involve ~20 lidar projects (I haven't really checked non-lidar projects). The queryUSGSTileIndex() function works correctly when the project being queried has good tiles, but it may fail when there are missing or bad tile geometries within the project. USGS is aware of these problems but has been slow to correct them.
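One way to work around missing or bad tile geometries is to drop those tiles before intersecting the index with a location. The block below is a minimal sketch using the sf package. `drop_bad_tiles()` is a hypothetical helper (not part of USGSlidar and not necessarily what queryUSGSTileIndex() does internally), and it assumes the tile index has already been read into an sf object.

```r
library(sf)

# Hypothetical helper: keep only tiles whose geometry is present and valid.
# Missing geometries show up as empty geometries in sf; st_is_valid() can
# return NA when GEOS cannot evaluate a geometry, so NA is treated as bad.
drop_bad_tiles <- function(tiles) {
  empty <- sf::st_is_empty(tiles)
  valid <- sf::st_is_valid(tiles)
  keep <- !empty & !is.na(valid) & valid
  tiles[keep, ]
}
```

Dropping bad tiles means any area covered only by those tiles will not be found in a query, so this trades completeness for not failing outright.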

In addition, the TESM tile index does not include sufficient information to fully locate the tile (LAS/LAZ file) on the rockyweb server. The tile_id field in the TESM index is just a tile identifier and does not (usually) fully identify the file on the server. The logic used to create the actual file names seems to vary depending on the lidar project. For recent projects, the actual file names appear to be constructed using the project name and the year the point data were published. Older projects do not include information for the publication year in the file names.

While the WESM index contains an lpc_link field that is presumably the URL for the point files associated with a project, the URL is incomplete and does not include the actual folder containing the point files. Usually, the point files are in a folder named "laz" or "LAZ", but some projects have point files in a folder named "LAS". The lpc_link URLs are often missing the trailing "/", so code that uses them needs to check for the trailing "/" and add it if missing.
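The trailing "/" check and the folder-name variations can be handled with a couple of small base R helpers. This is a sketch only; `ensure_trailing_slash()` and `candidate_lpc_folders()` are hypothetical names, and the folder order ("laz", "LAZ", "LAS") simply follows the observations above rather than any documented USGS rule.

```r
# Hypothetical helper: append "/" to a WESM lpc_link if it is missing.
ensure_trailing_slash <- function(url) {
  ifelse(endsWith(url, "/"), url, paste0(url, "/"))
}

# Hypothetical helper: build the candidate point-file folder URLs for one
# project, most common folder name first.
candidate_lpc_folders <- function(lpc_link, folders = c("laz", "LAZ", "LAS")) {
  paste0(ensure_trailing_slash(lpc_link), folders, "/")
}
```

Deciding which candidate folder actually exists still requires a request to the rockyweb server, which is not shown here.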

Example: The data for Glacier Peak in Washington State is identified in WESM by:

Data for this area were collected in 2014-2015 and published in 2016. The point data files are actually located in "https://rockyweb.usgs.gov/vdelivery/Datasets/Staged/Elevation/LPC/Projects/USGS_LPC_WA_GlacierPeak_2014_LAS_2016/laz/" and individual file names look like this: "USGS_LPC_WA_GlacierPeak_2014_10TFU1514_LAS_2016.laz"

The record in the TESM tile index for the same tile contains the following:

For this project and tile, we can construct a URL as follows (R syntax using the lubridate package for the year() function): URL <- paste0(WESM$lpc_link, "laz/", "USGS_LPC_", WESM$workunit, "_", TESM$tile_id, "_LAS_", year(WESM$lpc_pub_date), ".laz")

For projects where lpc_pub_date is missing or set to NA, the URL may be as follows (not tested for all projects): URL <- paste0(WESM$lpc_link, "laz/", "USGS_LPC_", WESM$workunit, "_", TESM$tile_id, ".laz")
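The two URL patterns above can be combined into one function that also adds the missing trailing "/". This is a minimal sketch assuming scalar inputs taken from matching WESM and TESM records; `build_tile_url()` is a hypothetical name, and it uses base R's format() instead of lubridate::year() only to avoid the extra dependency. As noted above, the no-year pattern has not been checked against all projects.

```r
# Hypothetical helper: build the URL for one LAZ tile from WESM/TESM fields.
build_tile_url <- function(lpc_link, workunit, tile_id, lpc_pub_date, folder = "laz") {
  # lpc_link values are often missing the trailing "/"
  base <- if (endsWith(lpc_link, "/")) lpc_link else paste0(lpc_link, "/")
  pub_date <- as.Date(lpc_pub_date)
  if (is.na(pub_date)) {
    # Older projects: no publication year in the file name (not fully tested)
    file <- paste0("USGS_LPC_", workunit, "_", tile_id, ".laz")
  } else {
    # Recent projects: "_LAS_" plus the year the point data were published
    file <- paste0("USGS_LPC_", workunit, "_", tile_id, "_LAS_",
                   format(pub_date, "%Y"), ".laz")
  }
  paste0(base, folder, "/", file)
}
```

For the Glacier Peak tile, assuming a workunit of "WA_GlacierPeak_2014" and a placeholder publication date in 2016 (the exact date is not given above), `build_tile_url("https://rockyweb.usgs.gov/vdelivery/Datasets/Staged/Elevation/LPC/Projects/USGS_LPC_WA_GlacierPeak_2014_LAS_2016", "WA_GlacierPeak_2014", "10TFU1514", "2016-01-01")` returns "https://rockyweb.usgs.gov/vdelivery/Datasets/Staged/Elevation/LPC/Projects/USGS_LPC_WA_GlacierPeak_2014_LAS_2016/laz/USGS_LPC_WA_GlacierPeak_2014_10TFU1514_LAS_2016.laz".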

bmcgaughey1 commented 1 year ago

USGS seems to have cleaned up problems with the TESM index as of mid-2023 (probably earlier). I haven't done any testing recently but have used the index to query for data for several areas without problems. Previously, things would fail if an area was covered by any lidar projects where the tile information was bad. Not seeing this happen now.

bmcgaughey1 commented 1 year ago

Closing this for now. May reopen after more testing or if I run into problems with the index files.