Closed maxfreu closed 2 years ago
Hi @maxfreu,
out of the box, it is currently not possible to achieve this.
However, if changing a few lines of code and re-compiling is an option for you: in src/cross-level/brick-cl.c, ll. 740ff, you can add or change the GDAL creation options and make a custom build.
Hope this helps, David
case _FMT_GTIFF_:
  driver = GDALGetDriverByName("GTiff");
  options = CSLSetNameValue(options, "COMPRESS", "LZW");
  options = CSLSetNameValue(options, "PREDICTOR", "2");
  options = CSLSetNameValue(options, "INTERLEAVE", "BAND");
  options = CSLSetNameValue(options, "BIGTIFF", "YES");
  if (brick->cx > 0){
    nchar = snprintf(xchunk, NPOW_08, "%d", brick->cx);
    if (nchar < 0 || nchar >= NPOW_08){
      printf("Buffer Overflow in assembling BLOCKXSIZE\n"); return FAILURE;}
    options = CSLSetNameValue(options, "BLOCKXSIZE", xchunk);
  }
  if (brick->cy > 0){
    nchar = snprintf(ychunk, NPOW_08, "%d", brick->cy);
    if (nchar < 0 || nchar >= NPOW_08){
      printf("Buffer Overflow in assembling BLOCKYSIZE\n"); return FAILURE;}
    options = CSLSetNameValue(options, "BLOCKYSIZE", ychunk);
  }
  break;
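As an untested sketch only (not code from the FORCE repository): to get the output asked about in the original question - DEFLATE, horizontal differencing, 256x256 blocks - the option values in this case block could be changed roughly as follows before re-compiling. The creation options themselves (COMPRESS, PREDICTOR, TILED, BLOCKXSIZE, BLOCKYSIZE) are standard GDAL GTiff options; note that TILED=YES is needed for square blocks, which replaces the chunk-derived strip layout set via brick->cx / brick->cy above.

// sketch: alternative GTiff creation options (DEFLATE, predictor 2, 256x256 tiles)
options = CSLSetNameValue(options, "COMPRESS", "DEFLATE");
options = CSLSetNameValue(options, "PREDICTOR", "2");      // horizontal differencing
options = CSLSetNameValue(options, "INTERLEAVE", "BAND");
options = CSLSetNameValue(options, "BIGTIFF", "YES");
options = CSLSetNameValue(options, "TILED", "YES");
options = CSLSetNameValue(options, "BLOCKXSIZE", "256");
options = CSLSetNameValue(options, "BLOCKYSIZE", "256");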
Yes, that helps, thank you! However, it might make sense to expose these options to the user, as such basic things dramatically impact the access speed.
Depends on what you are doing and how you access the data. For the access pattern that FORCE uses in the higher level processing, the default one is fairly optimal.
In addition, exposing every small parameter has the downside of overwhelming the user...
I will think about this one, though. I will probably implement a specific COG output option, which will kind of give you what you require. It will be fairly close to the default GTiff though. The only major difference is block size and pixel interleave (although I have some reservations about pixel interleaving, as all GDAL reads are band-wise at the lowest programmatic level).
I think band-wise interleaving is ok, unless you always read all bands. I have just had good experiences with deflate compression, zlevel 1, the horizontal predictor and smaller block sizes - resulting in smaller files (despite the low zlevel) and much faster reads. You might want to try it out :)
When using GDAL, it does not really matter if you read all bands or not because the data access is band-based. You first request a band, and then read a spatial subset like this:
dataset = GDALOpenEx(file, GDAL_OF_READONLY, NULL, NULL, NULL);
band = GDALGetRasterBand(dataset, 1);  // band indices are 1-based in GDAL
GDALRasterIO(band, GF_Read,
  xoff_disc, yoff_disc, nx_disc, ny_disc,
  read_buf, nx_read, ny_read, GDT_Int16, 0, 0);
Thus, even when reading all bands, each consecutive read (for as many as we have bands) skips bytes if we have pixel-interleave... Eventually, we have read a whole block of consecutive data, but the individual accesses were fragmented nonetheless.
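To put that pattern into a sketch (this is not code from the FORCE source; read_buf is assumed here to be an array with one buffer per band):

int b, nbands = GDALGetRasterCount(dataset);

for (b = 1; b <= nbands; b++){  // GDAL band indices start at 1
  band = GDALGetRasterBand(dataset, b);
  GDALRasterIO(band, GF_Read,
    xoff_disc, yoff_disc, nx_disc, ny_disc,
    read_buf[b-1], nx_read, ny_read, GDT_Int16, 0, 0);
}

With INTERLEAVE=BAND, each of these per-band reads maps to contiguous bytes in the file; with INTERLEAVE=PIXEL, each read has to pick its samples from in between those of the other bands.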
For the compression: much faster? Do you have some benchmark for this? From my experience so far, Deflate and LZW are not too different. See also here: https://kokoalberti.com/articles/geotiff-compression-optimization-guide/
If you want faster access, zstd compression might be worth looking into?
Thus, even when reading all bands, each consecutive read (for as many as we have bands) skips bytes if we have pixel-interleave... Eventually, we have read a whole block of consecutive data, but the individual accesses were fragmented nonetheless.
Oh, good to know! That sounds inefficient, and I wonder why there aren't two distinct read modes...
For the compression: much faster? Do you have some benchmark for this?
Unfortunately I didn't write a proper benchmark, but there is one along with code here. I think the majority of the speedup I observed came from the smaller block sizes I used, as I often access much less than a tile's width.
If you want faster access, zstd compression might be worth looking into?
I don't know why I didn't test that, but according to the link above it seems fast! Especially the combination ZSTD, zstd_level = 1, predictor = 2 looks like an interesting tradeoff between read speed and compression, while still getting almost double the LZW write speed. This, however, depends a lot on the data. Maybe I can re-run the above benchmark with some of the tiles stored in the CODE-DE datacube.
The benchmark you sent is the same one I posted above. LZW and DEFLATE are more or less in the same ballpark. But I believe I had some rationale for not using DEFLATE. As far as I remember, DEFLATE didn't work with all software packages.
Do you mean the CODE-DE datacube under /code-de/community/force ? Please note that access to this datacube is still preliminary and that reading speed is still of concern...
The benchmark you sent is the same one I posted above.
Oh sorry, it seems I overlooked it.
Do you mean the CODE-DE datacube under /code-de/community/force ? Please note that access to this datacube is still preliminary and that reading speed is still of concern...
Yes, exactly. I copied some files onto a local SSD for testing. The access speed there, especially for listing files, is quite bad indeed. I'll get back to you when the benchmark is done, but it seems like my GDAL comes without ZSTD support...
Hmm, another thing you probably need to consider in this case:
Using an SSD as data storage is not realistic in all cases. With an SSD, your performance will mostly depend on how fast your CPU can crack the compression - less so on reading throughput. I would also suggest additionally measuring the performance when reading multiple files in parallel (because decompression algorithms are usually parallelized; when you parallelize elsewhere, i.e. take cores away from the decompression algorithm, things might shift).
Here is the benchmark, carried out with three randomly picked Sentinel-2 image tiles from different locations. Numbers are averaged across the three images. The benchmark tests gdal_translate, so the numbers for "write" include a read from uncompressed data on disk, and vice versa. Data was read from and written to the same HDD (not an SSD); the CPU is a Xeon at max 3.7 GHz, and only one core was used. The repetition count is 3. If set, the tiling was 256x256 pixels.
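For reference, a test of this kind boils down to timing commands along these lines (file names are placeholders; the -co flags are standard GTiff creation options):

gdal_translate -co COMPRESS=ZSTD -co ZSTD_LEVEL=3 -co PREDICTOR=2 \
  -co TILED=YES -co BLOCKXSIZE=256 -co BLOCKYSIZE=256 \
  uncompressed_in.tif compressed_out.tif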
I think ZSTD with level 3, horizontal predictor and tiling is the sweet spot. A 20TB datacube with comparable imagery would shrink by around 3TB or 16%, while the read and write speed approximately doubles. The access to small subsets of the data will be sped up even more by the tiled compression.
Certainly there are many more considerations when choosing the compression settings, but if I as a user were given the choice, I'd go for the settings above, because my downstream software & code can handle it.
Dear @maxfreu,
I thought for some time about implementing custom GDAL options and decided it would indeed be beneficial for some expert users.
I put some time into making your request possible. You will find a new force branch here: https://github.com/davidfrantz/force/tree/gdal_options
To enable this, some substantial changes in the belly of the beast were necessary. These changes affect the entire FORCE codebase, e.g. force-level2 and force-higher-level. Thus, I would like to treat this branch as a release candidate and won't merge it immediately. It would be really nice if you could do some testing with this - both with the custom and the built-in formats.
There is now a new parameter: FILE_OUTPUT_OPTIONS. This expects a file, and will only be activated when OUTPUT_FORMAT = CUSTOM.
The text file should be written in tag-and-value notation like this:
DRIVER = GTiff
EXTENSION = tif
BIGTIFF = YES
COMPRESS = ZSTD
PREDICTOR = 2
ZLEVEL = 1
INTERLEAVE = BAND
TILED = YES
BLOCKXSIZE = 1024
BLOCKYSIZE = 1024
Important: the file needs at least the DRIVER (GDAL short driver name) and EXTENSION, and then a variable number of GDAL options (up to 32 - this should be enough, right?).
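In the parameter file, switching this on would then presumably look like this (the path is a placeholder):

OUTPUT_FORMAT = CUSTOM
FILE_OUTPUT_OPTIONS = /path/to/gdal-options.txt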
One word of caution: by opening this up to the user, it is now possible to give invalid or conflicting options that result in a failure to create files.
My first tests with ZSTD are indeed promising! Though I would like to test this more rigorously before making it the default, especially since many GDAL installations might not support it yet. ZSTD support was introduced in GDAL 2.3, which has only been available from the main repository since Ubuntu 20.04 LTS. I would also like to test whether this compression works with other image processing software that is commonly used in the field (ENVI, ArcGIS, ERDAS, etc. - at least QGIS seems to be fine!).
Cheers, David
Wow, sounds great! Thank you :) 32 options sounds enough. I will try it out as soon as possible and report back.
Question: Is force processing at all affected by the block size of images? E.g. are there any statistics calculated block-wise, which would become useless if blocks are too small?
Btw: I started converting our local datacube to zstd, resulting in 19% space savings. Accessing single pixels in the L2 cube for all points in time at once is now 13 times faster with a block size of 128.
Sounds promising. FORCE uses its standard blocks (strips) as processing units for the higher level processing. See here: https://force-eo.readthedocs.io/en/latest/components/higher-level/hl-compute.html
Reading compressed data on block boundaries should be more efficient in theory. But if ZSTD compensates for this, this might indeed be a good alternative. I am currently asking our lab members to open ZSTD-compressed files with their favorite software - I am curious what the outcome will be :)
I received some quick feedback from colleagues:
Actually I have to report that it doesn't work for me for some reason. When I run force-level2, the per-image logs always tell me "parsing metadata failed", no other errors. Btw: I had to patch the Makefile and manually include numpy/ndarrayobject.h, but I think that's unlikely to be the source of the error. So how can we track this down? With the master branch it works.
Do you by any chance try to process a Landsat 9 image? The branch with the GDAL options was forked before the Landsat 9 branch.
numpy: this can happen if your python installation is not in the default path. This is very hard for me to control..
Do you by any chance try to process a Landsat 9 image?
No, only Sentinel 2 A/B.
numpy: this can happen if your python installation is not in the default path. This is very hard for me to control.
Default CentOS installation path, but no worries, after adapting it, it compiles. However, I could imagine that the source of the error is some misconfiguration of the machine.
L1 download file:
#!/bin/bash
p="/data_hdd/force"
metadata_dir="$p/metadata"
l1_dir="$p/l1/ukraine"
queue_file="$l1_dir/queue.txt"
aoi="29.9/51.54,29.9/49.9,31.4/49.9,31.4/51.54,29.9/51.54"
t_start="20220101"
t_end="20220302"
echo $l1_dir
force-level1-csd -u -s S2A,S2B $metadata_dir
force-level1-csd -c 0,30 -d $t_start,$t_end -s S2A,S2B -k $metadata_dir $l1_dir $queue_file $aoi
Output options file:
DRIVER = GTiff
EXTENSION = tif
BIGTIFF = YES
COMPRESS = ZSTD
PREDICTOR = 2
ZLEVEL = 3
INTERLEAVE = BAND
TILED = YES
BLOCKXSIZE = 128
BLOCKYSIZE = 128
Would compiling in debug mode output more info?
"parsing metadata failed" means that there is some error in reading the input image's metadata. This is likely not related to the parameterization.
In this case, running FORCE in debug mode might be a good idea indeed. If you do, please do not use force-level2, but rather use force-l2ps on a single image that throws this error message.
When I run force-l2ps ./l1/ukraine/T35UPS ./param/param-ukraine.prm, I get the following error message in debug mode:
<param file content>
check that all meta is initialized, brick as well?
there are still some things to do int meta. checking etc
unknown Satellite Mission. Parsing metadata failed.
When I run force-l2ps ./l1/ukraine/T35UPS/S2A_MSIL1C_20220222T092031_N0400_R093_T35UPS_20220222T113539.SAFE/ ./param/param-ukraine.prm, I get:
Next week I'll maybe try it on another machine.
The second call is correct. I will need to look at this image and check whether I can reproduce this here...
yep, there was a tiny bug indeed. Can you try again? It should work now
Well, now it does something with the above output options, but there still seems to be an error with tiling:
force-l2ps ./l1/ukraine/T36UUA/S2A_MSIL1C_20220213T085051_N0400_R107_T36UUA_20220213T094348.SAFE/ ./param/param-ukraine.prm
dc: 32.14%. wc: 3.81%. sc: 6.63%. cc: 0.59%. AOD: 0.1211. # of targets: 92/540.
Error creating file /data_hdd/force/l2/ukraine/X0011_Y0036/20220213_LEVEL2_SEN2A_QAI.tif. Error creating file /data_hdd/force/l2/ukraine/X0011_Y0036/20220213_LEVEL2_SEN2A_BOA.tif.
...
error in cubing Level 2 products
Tiling images failed! Error in geometric module.
Only lock files are produced.
Custom GDAL options used?
With opening up the settings, there is no guarantee anymore that files can be created... I have observed this when playing around with the block sizes. Not all were accepted by GDAL.
I used these exact settings with gdal_translate. But wait: I think now the other issue with the custom compilation kicks in. I forgot that the system GDAL does not support ZSTD, so I have to link against my conda environment's GDAL during compilation... Sorry for the noise.
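(For reference, whether a given GDAL build supports ZSTD for GTiff can be checked from the driver's creation-option list, e.g. with gdalinfo --format GTiff | grep -i zstd.)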
Ok, now I got it working. The files turn out to have exactly the desired settings. A bit unexpectedly, the processing time went up for me compared to LZW (150 s vs 120 s on average). I expect this is due to the small tile size of 128 pixels.
So essentially I can confirm this is good to go.
I just merged the branch into develop. This feature is now officially available and will slip into the main branch soon, too.
Thanks for your input again! David
Hi! I would like to save my L2 processing output as DEFLATE compressed tifs with a block size of 256x256 (instead of one wide strip) and horizontal differencing. Is that possible, and if yes, how?