NASA-PDS / validate

Validates PDS4 product labels, data and PDS3 Volumes
https://nasa-pds.github.io/validate/
Apache License 2.0

Add new functionality to validate that TIFF products used as observational products are uncompressed TIFFs #196

Closed: jordanpadams closed 3 years ago

jordanpadams commented 4 years ago

Once GeoTIFF is officially introduced into the standard, we need to validate that the TIFF being referenced is an actual uncompressed TIFF. GDAL drivers or some other method could be used to verify this.

jordanpadams commented 3 years ago

@thareUSGS do you have any test data for a compressed versus uncompressed GeoTIFF and associated PDS4 label so we can add this to validate at some point?

msbentley commented 3 years ago

Interesting - will you also be adding similar validation checks for CDFs (uncompressed, not fragmented etc.) or other formats?

thareUSGS commented 3 years ago

I explained a couple of methods to check a TIFF for validity ( https://pds.nasa.gov/datastandards/documents/archiving/ ). But probably the easiest method is to install GDAL 3.1+ and try a conversion to PDS4. When the conversion fails, GDAL will not list why it failed, but will state: ERROR 1: Source dataset is not compatible of a raw binary format

GeoTIFF test cases (from the LROC Mission Node):

  1. compatible TIFF (1 MB): http://pds.lroc.asu.edu/data/LRO-L-LROC-5-RDR-V1.0/LROLRC_2001/EXTRAS/BROWSE/WAC_GLOBAL/WAC_GLOBAL_E000N0000_004P.TIF

    gdal_translate -of PDS4 -co CREATE_LABEL_ONLY=YES .\WAC_GLOBAL_E000N0000_004P.tif .\WAC_GLOBAL_E000N0000_004P.xml
    Input file size is 1440, 720

    (only returns warnings)

BTW, GDAL has a neat trick where it accesses data from https, ftp and many cloud hosting services (AWS, Google, MS, ...). So without downloading, the above could be run simply using:

    gdal_translate -of PDS4 -co CREATE_LABEL_ONLY=YES /vsicurl/http://pds.lroc.asu.edu/data/LRO-L-LROC-5-RDR-V1.0/LROLRC_2001/EXTRAS/BROWSE/WAC_GLOBAL/WAC_GLOBAL_E000N0000_004P.TIF .\WAC_GLOBAL_E000N0000_004P.xml

  1. incompatible GeoTIFF (deflate compression, <1 MB): http://pds.lroc.asu.edu/data/LRO-L-LROC-5-RDR-V1.0/LROLRC_2001/EXTRAS/BROWSE/WAC_GLOBAL/WAC_GLOBAL_E000N0000_004P.MASK.TIF

    gdal_translate -of PDS4 -co CREATE_LABEL_ONLY=YES .\WAC_GLOBAL_E000N0000_004P.MASK.tif .\WAC_GLOBAL_E000N0000_004P.MASK.xml
    Input file size is 1440, 720
    ERROR 1: Source dataset is not compatible of a raw binary format
  2. incompatible GeoTIFF (tiled, 1.7MB): http://pds.lroc.asu.edu/data/LRO-L-LROC-5-RDR-V1.0/LROLRC_2001/EXTRAS/BROWSE/WAC_GLOBAL/WAC_GLOBAL_E000N0000_004P.PYR.TIF

    gdal_translate -of PDS4 -co CREATE_LABEL_ONLY=YES .\WAC_GLOBAL_E000N0000_004P.PYR.tif .\WAC_GLOBAL_E000N0000_004P.PYR.xml
    Input file size is 1440, 720
    ERROR 1: Source dataset is not compatible of a raw binary format
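
The three test cases above could be scripted along these lines. This is only a rough sketch, assuming `gdal_translate` (GDAL 3.1+) is on the PATH. The function names are hypothetical and not part of validate or GDAL. The command-line options and the error text are the ones quoted in this comment.

```python
import subprocess

# Message quoted in the thread for inputs GDAL cannot describe as raw binary.
INCOMPATIBLE_MSG = "Source dataset is not compatible of a raw binary format"


def gdal_source(path_or_url):
    """Prefix remote URLs with /vsicurl/ so GDAL reads them without a download."""
    if path_or_url.startswith(("http://", "https://")):
        return "/vsicurl/" + path_or_url
    return path_or_url


def build_command(source, label_path):
    """Build the label-only gdal_translate call used in the test cases."""
    return [
        "gdal_translate", "-of", "PDS4",
        "-co", "CREATE_LABEL_ONLY=YES",
        gdal_source(source), label_path,
    ]


def is_raw_compatible(gdal_output):
    """Return False when the output contains the incompatibility error."""
    return INCOMPATIBLE_MSG not in gdal_output


def check_tiff(source, label_path):
    """Run gdal_translate and report whether a PDS4 label could be created."""
    result = subprocess.run(
        build_command(source, label_path),
        capture_output=True, text=True, check=False,
    )
    # Treat a non-zero exit status or the known error message as a failure.
    output = result.stdout + result.stderr
    return result.returncode == 0 and is_raw_compatible(output)
```

Note that `check_tiff` does not distinguish an incompatible TIFF from other failures (for example a missing file); both simply return `False`.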

Currently, GDAL can also test a handful of other formats (unfortunately not CDF, but this could be added). Besides GeoTIFF, GDAL will try to test and create a PDS4 label next to an ENVI file/header, PDS3, ISIS3, VICAR, and FITS. https://github.com/OSGeo/gdal/blob/72e1a4d7c96e0381d2d335857697b5f8e1668450/autotest/gdrivers/pds4.py#L1316

More on GDAL's PDS4 capabilities: https://gdal.org/drivers/raster/pds4.html (need to add some updates from the last release). https://github.com/USGS-Astrogeology/GDAL_scripts/wiki

msbentley commented 3 years ago

I followed the links and hints in @thareUSGS's paper to write some code in Python to do some checks and spit out the datatypes, byte offsets etc. - very useful! I would guess we don't want to make GDAL a dependency of validate? But the basic checks should not be too onerous to integrate into validate, I would imagine.

Though I do wonder how far one needs to go checking external standards? I mean, all of the standard data checks should run anyway (checking values against datatypes, min/max, etc. declared in the label) regardless. I guess this should also catch compressed images since the array size and data type declared in the label will not match the number of bytes expected etc.?
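
The size argument can be made concrete with a small sketch. The function names are hypothetical, and the 8-bit, single-band layout used in the example is an assumption about the 1440 x 720 test image, not something stated in the thread.

```python
import math


def expected_bytes(shape, bytes_per_element):
    """Bytes needed to store an uncompressed array with the given axis lengths."""
    return math.prod(shape) * bytes_per_element


def array_fits_in_file(file_size, offset, shape, bytes_per_element):
    """True if the array declared in the label fits between offset and end of file."""
    return offset + expected_bytes(shape, bytes_per_element) <= file_size


# 720 lines x 1440 samples at 1 byte per sample:
print(expected_bytes((720, 1440), 1))  # 1036800
```

A file smaller than the declared array would fail this check. A file that is large enough but tiled would still pass it, so a size comparison alone does not cover every incompatible case.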

jordanpadams commented 3 years ago

@thareUSGS thanks!

@msbentley duh! If it is compressed, I can't imagine how you would describe it using PDS4, so that should pretty much cover it.

I will punt this for now, and hopefully when we get some example GeoTIFF data in the archive, that will help this be tested.

thareUSGS commented 3 years ago

We have several archives lined up to be released soon, which required the updates in 1.F.0.0 to support PDS4 GeoTIFFs. There are some pre-release "PDS4-like" labels, since they haven't been through review yet: https://astrogeology.usgs.gov/search/map/Enceladus/enceladus_cassini_iss_shapemodel_bland_2019/enceladus_2019pm_topography
They will also need a little touch-up once 1.F.0.0 is official.

@msbentley I would love to see that Python code. A simple TIFF header parser for some crucial bits would be helpful to have. Yes - I agree that having GDAL as a dependency of validate is probably not a great idea so your Python code could be a good place to start.
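
As a starting point for discussion, a header parser for "some crucial bits" might look like the sketch below. It is not @msbentley's code. The function names are hypothetical, it handles only classic (non-Big) TIFF, looks only at the first IFD, and checks just two things: the Compression tag (259, where 1 means uncompressed) and whether a TileWidth tag (322) is present.

```python
import struct

COMPRESSION = 259   # TIFF tag: 1 means no compression
TILE_WIDTH = 322    # TIFF tag: present only in tiled files


def read_first_ifd(data):
    """Return {tag: value} for the first IFD of a classic (non-Big) TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF: bad byte-order mark")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (magic %d)" % magic)
    (count,) = struct.unpack_from(order + "H", data, ifd_offset)
    tags = {}
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, ftype, n = struct.unpack_from(order + "HHI", data, entry)
        # Only inline single SHORT (3) or LONG (4) values are decoded here.
        if n == 1 and ftype in (3, 4):
            fmt = "H" if ftype == 3 else "I"
            (tags[tag],) = struct.unpack_from(order + fmt, data, entry + 8)
        else:
            tags[tag] = None  # present, but value not decoded in this sketch
    return tags


def is_uncompressed_striped(data):
    """True if the first image is uncompressed and not tiled."""
    tags = read_first_ifd(data)
    # Compression defaults to 1 (none) when the tag is absent.
    return tags.get(COMPRESSION, 1) == 1 and TILE_WIDTH not in tags
```

It could be called as `is_uncompressed_striped(open(path, "rb").read())`. For large files, reading only the header and the IFD would be preferable to loading the whole file.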