bopen / xarray-sentinel

Xarray backend to Copernicus Sentinel-1 satellite data products
Apache License 2.0
221 stars 22 forks source link

Expose burst level data via GDAL / rasterio / rioxarray #9

Closed alexamici closed 3 years ago

alexamici commented 3 years ago

The lowest level in exposing the data is to create bursts datasets with the data contained in the measurements TIFF files (real and imaginary part scaled to int16 aa "distribution" digital number, not the original DN).

The step needed are:

  1. parse burst information from the product annotation
  2. build the burst window into the original TIFF file (line_min, sample_min, line_max, sample_max)
  3. map the burst data to a xr.Dataset

Step 3. can be done quickly by creating a VRT on disk and opening it with rioxarray.

Longer term we wan to support products on read-only file-systems or zip files, but there is no way (that I found) to pass the VRT XML to GDAL without saving it first to a file.

alexamici commented 3 years ago

One issue with VRT and GDAL in general is that it looks like the GeoTransform doesn't accept the UTC time-like coordinates (or maybe I didn't look for it long enough) that we use for azimuth_time.

Also the name for the dimensions need to be encoded outside of the VRT.

Both issues can be worked around in a post-processing step:

def open_burst_dataset(...):
    burst_ds = rioxarray.open_rasterio(vrt_path)
    burst_ds.assign_coords(azimuth_time=("y", azimuth_time))
    burst_ds = burst_ds.rename_dims(x=slant_range_time)
    burst_ds.swap_dims(y=azimuth_time)
corrado9999 commented 3 years ago

With GDAL creating an in-memory VRT for a burst should be as simple as:

gdal.Translate("/vsimem/unique_name.vrt",
               input_tif,
               format="VRT",
               srcWin=(sample_min, line_min, sample_max, line_max),
)

In this way, even if you close the dataset, you will still have the virtual file in memory, but I do not know how well GDAL and rasterio play together.

alexamici commented 3 years ago

After reviewing the details with @aurghs I think the limitation of the VRT approach identified above are a killer.

Much simpler to just:

def open_burst_dataset(...):
    # compute burst properties
    swath_ds = rioxarray.open_rasterio(swath_path)
    burst_ds = swath_ds.isel(y=slice(linesPerBurst * burst_count, linesPerBurst * (burst_count + 1))
    # attach_coords
    # swap_dims
    return burst_ds

This is lazy and plays well with dask.

The low level access to the TIFF data is as efficient as with GDAL. That is, it may depend on the TIFF file (compression, tiles, etc).