Basic TIFFs are already handled by TIFFFile - so we would just need to analyse the coordinates correctly. Just!
When mosaicing, are the pieces guaranteed to form a contiguous block with no overlaps?
@martindurant: The data set is based on 1x1 degree tiles with geotiffs that are not overlapping, covering all land and ice masses, but not water areas, i.e. there are gaps from non-existing files; the VRTs do work globally though.
gaps from non-existing files
This is not a problem for a zarr-based interface; missing files just become missing values like NaN.
Great. So it would just (!) take a vrt2zarr tool that just (!) adds a relevant metadata description using the existing geotiffs, right? "Just", of course, emphasized ...
Caveat that I don't know how VRTs work; but inasmuch as they are pointers to file (pieces) in other places, yes. The trick is to figure out the zarr key for each data chunk.
Maybe going via VRTs is more of a hurdle than helping. The basic structure is a grouping of geotiffs that all have the same organization (datatype, compression, xy-coordinate system) and only differ in their spatial extent individually, yet merge seamlessly. Take a look at http://sentinel-1-global-coherence-earthbigdata.s3-website-us-west-2.amazonaws.com how tiles are organized in the data set on s3.
OK, so what we actually need, is to scan each file, presumably with TIFFFile, find the data portion as normal, and be sure to extract the correct coordinate information (position and time). Then we combine everything into one big aggregate dataset.
Seems to be straightforward. Caveat is the assignment of a time coordinate. There are four seasons (northern hemisphere winter, spring, summer, fall) that could be assigned a time coordinate. In each season we would have a set of dataset variables, some of which are only covered in certain parts of the globe or at certain times of the year. But I guess that could all be captured in the metadata.
We would end up with a time coordinate that is the union of all the available times, and accessing variable/time combinations that are not covered would result in NaNs.
There are only 4 available times, given by the filenames as winter, spring, summer, fall. I guess one would pick a mid-date, or just not make it a time coordinate and leave it as ordered, labeled bands? That's what is done via the VRTs now.
As the data curator who understands the original data better than me, the choice would be yours! I would tend towards changing the original labels and assumptions as little as possible.
I would agree. Would you have any examples of how to go about this, or might you be interested in tackling this together? I think we really don't need to go via VRTs but can work from the tiffs themselves. The total number of geotiffs in the data set is 1,034,038, in ~25,000 1x1 degree tiles, for a total volume of 2.1 TB. Sure would be super cool to just open one "zarr" file and use xarray and dask magic to analyze the data.
So the first thing we need is the byte ranges for a file using tifffile, and to see whether that reference output contains all the information we need (bandpass, time, location coordinates).
ok. I am installing it. Here is an example file I am using: https://sentinel-1-global-coherence-earthbigdata.s3.us-west-2.amazonaws.com/data/tiles/N42W090/N42W090_fall_vv_COH12.tif
Seems like a good output:
tifffile N42W090_fall_vv_COH12.tif
Reading TIFF header: 0.002332 s
Reading image data: 0.013988 s
Generating report: 0.069484 s
TiffFile 'N42W090_fall_vv_COH12.tif' 899.52 KiB geotiff
TiffPageSeries 0 1200x1200 uint8 YX Generic 1 Pages
TiffPage 0 @8 1200x1200 uint8 minisblack lzw geotiff
TiffTag 256 ImageWidth @10 SHORT @18 1200
TiffTag 257 ImageLength @22 SHORT @30 1200
TiffTag 258 BitsPerSample @34 SHORT @42 8
TiffTag 259 Compression @46 SHORT @54 LZW
TiffTag 262 PhotometricInterpretation @58 SHORT @66 MINISBLACK
TiffTag 273 StripOffsets @70 LONG[200] @1030 (1990, 6467, 10995, 15556, 20128,
TiffTag 277 SamplesPerPixel @82 SHORT @90 1
TiffTag 278 RowsPerStrip @94 SHORT @102 6
TiffTag 279 StripByteCounts @106 LONG[200] @230 (4477, 4528, 4561, 4572, 4473,
TiffTag 284 PlanarConfiguration @118 SHORT @126 CONTIG
TiffTag 317 Predictor @130 SHORT @138 NONE
TiffTag 339 SampleFormat @142 SHORT @150 UINT
TiffTag 33550 ModelPixelScaleTag @154 DOUBLE[3] @1830 (0.00083333333, 0.0008333
TiffTag 33922 ModelTiepointTag @166 DOUBLE[6] @1854 (0.0, 0.0, 0.0, -90.0, 42.0
TiffTag 34735 GeoKeyDirectoryTag @178 SHORT[32] @1902 (1, 1, 0, 7, 1024, 0, 1,
TiffTag 34736 GeoDoubleParamsTag @190 DOUBLE[2] @1966 (298.257223563, 6378137.0
TiffTag 34737 GeoAsciiParamsTag @202 ASCII[8] @1982 WGS 84|
TiffTag 42113 GDAL_NODATA @214 ASCII[2] @222 0
StripOffsets
(1990, 6467, 10995, 15556, 20128, 24601, 28997, 33372, 37927, 42281, 46529,
50958, 55479, 59944, 64505, 68906, 73437, 77971, 82459, 87104, 91871, 96363,
100870, 105256, 109544, 113898, 118374, 122791, 127370, 131910, 136443,
140998, 145696, 150448, 155111, 159657, 164259, 168891, 173718, 178453,
183307, 188071, 192799, 197584, 202241, 206883, 211590, 216310, 221048,
225633, 229962, 234406, 238930, 243581, 248090, 252536, 257180, 261840,
266341, 270829, 275333, 279866, 284370, 288756, 293301, 297760, 302201,
306732, 311280, 315925, 320654, 325313, 330012, 334624, 339191, 343863,
348405, 352962, 357547, 362147, 366711, 371311, 375928, 380635, 385238,
389747, 394263, 398962, 403793, 408743, 413549, 418221, 422862, 427495,
432058, 436550, 441183, 445641, 450069, 454557, 459184, 463675, 468041,
472380, 476880, 481366, 485863, 490363, 494807, 499332, 503928, 508408,
512905, 517493, 522036, 526593, 531154, 535624, 540178, 544762, 549404,
554042, 558815, 563600, 568347, 573184, 578128, 583201, 588167, 593070,
597966, 602817, 607612, 612506, 617519, 622713, 627673, 632096, 636533,
640934, 645497, 650114, 654936, 659542, 663928, 668422, 672848, 677331,
681927, 686604, 691246, 695939, 700526, 705032, 709512, 714010, 718579,
723044, 727524, 732030, 736530, 741006, 745644, 750321, 754825, 759433,
763937, 768528, 773160, 777819, 782481, 787117, 791830, 796496, 801272,
806015, 810704, 815397, 820134, 824916, 829401, 834007, 838816, 843476,
847961, 852609, 857182, 861620, 866219, 870723, 875230, 879697, 884230,
888899, 893595, 898158, 902629, 907180, 911743, 916379)
StripByteCounts
(4477, 4528, 4561, 4572, 4473, 4396, 4375, 4555, 4354, 4248, 4429, 4521, 4465,
4561, 4401, 4531, 4534, 4488, 4645, 4767, 4492, 4507, 4386, 4288, 4354, 4476,
4417, 4579, 4540, 4533, 4555, 4698, 4752, 4663, 4546, 4602, 4632, 4827, 4735,
4854, 4764, 4728, 4785, 4657, 4642, 4707, 4720, 4738, 4585, 4329, 4444, 4524,
4651, 4509, 4446, 4644, 4660, 4501, 4488, 4504, 4533, 4504, 4386, 4545, 4459,
4441, 4531, 4548, 4645, 4729, 4659, 4699, 4612, 4567, 4672, 4542, 4557, 4585,
4600, 4564, 4600, 4617, 4707, 4603, 4509, 4516, 4699, 4831, 4950, 4806, 4672,
4641, 4633, 4563, 4492, 4633, 4458, 4428, 4488, 4627, 4491, 4366, 4339, 4500,
4486, 4497, 4500, 4444, 4525, 4596, 4480, 4497, 4588, 4543, 4557, 4561, 4470,
4554, 4584, 4642, 4638, 4773, 4785, 4747, 4837, 4944, 5073, 4966, 4903, 4896,
4851, 4795, 4894, 5013, 5194, 4960, 4423, 4437, 4401, 4563, 4617, 4822, 4606,
4386, 4494, 4426, 4483, 4596, 4677, 4642, 4693, 4587, 4506, 4480, 4498, 4569,
4465, 4480, 4506, 4500, 4476, 4638, 4677, 4504, 4608, 4504, 4591, 4632, 4659,
4662, 4636, 4713, 4666, 4776, 4743, 4689, 4693, 4737, 4782, 4485, 4606, 4809,
4660, 4485, 4648, 4573, 4438, 4599, 4504, 4507, 4467, 4533, 4669, 4696, 4563,
4471, 4551, 4563, 4636, 4729)
ModelPixelScaleTag
(0.00083333333, 0.00083333333, 0.0)
ModelTiepointTag
(0.0, 0.0, 0.0, -90.0, 42.0, 0.0)
GeoKeyDirectoryTag
(1, 1, 0, 7, 1024, 0, 1, 2, 1025, 0, 1, 1, 2048, 0, 1, 4326, 2049, 34737, 7, 0,
2054, 0, 1, 9102, 2057, 34736, 1, 1, 2059, 34736, 1, 0)
GeoDoubleParamsTag
(298.257223563, 6378137.0)
GEOTIFF_METADATA
{'GTModelTypeGeoKey': <ModelType.Geographic: 2>,
'GTRasterTypeGeoKey': <RasterPixel.IsArea: 1>,
'GeogAngularUnitsGeoKey': <Angular.Degree: 9102>,
'GeogCitationGeoKey': 'WGS 84',
'GeogInvFlatteningGeoKey': 298.257223563,
'GeogSemiMajorAxisGeoKey': 6378137.0,
'GeographicTypeGeoKey': <GCS.WGS_84: 4326>,
'KeyDirectoryVersion': 1,
'KeyRevision': 1,
'KeyRevisionMinor': 0,
'ModelPixelScale': [0.00083333333, 0.00083333333, 0.0],
'ModelTiepoint': [0.0, 0.0, 0.0, -90.0, 42.0, 0.0]}
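For reference, the same tags can also be read programmatically rather than via the command-line report; a minimal sketch using the tifffile API (not part of the original output, assuming the example file is local):

import tifffile

with tifffile.TiffFile("N42W090_fall_vv_COH12.tif") as tif:
    page = tif.pages[0]
    print(page.shape, page.dtype)                         # (1200, 1200) uint8
    print(page.tags["ModelTiepointTag"].value)            # upper left corner
    print(page.tags["ModelPixelScaleTag"].value)          # pixel spacing
    print(page.dataoffsets[:3], page.databytecounts[:3])  # per-strip byte ranges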
I get a structure like
{'.zattrs': '{}',
'.zarray': '{\n "chunks": [\n 6,\n 1200\n ],\n "compressor": {\n "id": "imagecodecs_lzw"\n },\n "dtype": "|u1",\n "fill_value": 0,\n "filters": null,\n "order": "C",\n "shape": [\n 1200,\n 1200\n ],\n "zarr_format": 2\n}',
'0.0': ['N42W090_fall_vv_COH12.tif/N42W090_fall_vv_COH12.tif', 1990, 4477],
'1.0': ['N42W090_fall_vv_COH12.tif/N42W090_fall_vv_COH12.tif', 6467, 4528],
'2.0': ['N42W090_fall_vv_COH12.tif/N42W090_fall_vv_COH12.tif', 10995, 4561],
...
200 small chunks. I see that the filename likely encodes the time and variable name, but where are the location coordinates?
ModelPixelScaleTag (0.00083333333, 0.00083333333, 0.0)
ModelTiepointTag (0.0, 0.0, 0.0, -90.0, 42.0, 0.0)
The ModelPixelScaleTag has the pixel spacing in x and y: 0.00083333333. The ModelTiepointTag has the upper left corner x coordinate: -90 (= West 90 deg) and y coordinate: 42 (= North 42 deg).
With TiffTag 256 ImageWidth @10 SHORT @18 1200 and TiffTag 257 ImageLength @22 SHORT @30 1200 we know the image size and can calculate the bounding box.
e.g. the lower right corner is at -90 + 1200 x 0.00083333333 = -89 (longitude) and 42 - 1200 x 0.00083333333 = 41 (latitude).
But we can also easily derive the bounding box from the file name: N42W090 refers to the upper left coordinate of the tile, and each tile is exactly 1x1 degree.
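In code, the bounding-box arithmetic above amounts to something like this (a sketch using the tag values from the example file):

tiepoint = (0.0, 0.0, 0.0, -90.0, 42.0, 0.0)   # ModelTiepointTag: i, j, k, x, y, z
scale = (0.00083333333, 0.00083333333, 0.0)    # ModelPixelScaleTag: x, y, z spacing
width = height = 1200                          # ImageWidth, ImageLength

west, north = tiepoint[3], tiepoint[4]
east = west + width * scale[0]     # -90 + 1200 * 0.00083333333 ~= -89
south = north - height * scale[1]  # 42 - 1200 * 0.00083333333 ~= 41
print(west, south, east, north)    # bounding box (W, S, E, N)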
OK, so that's one thing. For this particular dataset, with the files being so small and made up of so many even smaller chunks, we may decide that addressing each of the sub-chunks is going to be prohibitively slow.
I have not yet written a numcodecs codec which does "use function x to load this whole file", but it would be trivial to do, and this case might require it.
Something like
import io, importlib
from numcodecs.abc import Codec

class WholeFileDecoder(Codec):
    codec_id = "whole_file_decoder"  # illustrative id
    def __init__(self, function_loc, kwargs):
        module, _, name = function_loc.rpartition(".")  # e.g. "tifffile.imread"
        self.func = getattr(importlib.import_module(module), name)
        self.kwargs = kwargs
    def decode(self, buf, out=None):
        return self.func(io.BytesIO(buf), **self.kwargs)
    def encode(self, buf):
        raise NotImplementedError("read-only codec")
where for TIFF you would want it to call TiffFile(...).asarray(). The grib2 code does something like this.
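A hypothetical usage of the sketch above, decoding a whole GeoTIFF held in memory by delegating to tifffile.imread (illustrative only):

with open("N42W090_fall_vv_COH12.tif", "rb") as f:
    raw = f.read()
codec = WholeFileDecoder("tifffile.imread", {})
arr = codec.decode(raw)  # numpy array, shape (1200, 1200), dtype uint8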
Interesting. Let me know if anything could be added to tifffile to make this easier.
IIUC, it might be more efficient not to index the strips/tiles in each TIFF file. Instead, index each file as a single chunk and use a TIFF codec (e.g. imagecodecs_tiff, based on libtiff).
Tifffile can parse a sequence of file names to higher dimensions using a regular expression pattern and export an fsspec reference file (see the test at https://github.com/cgohlke/tifffile/blob/1c8e75bf29d591058311aee7856fc2c73dea5a83/tests/test_tifffile.py#L12744-L12779). This works with any kind of format supported by imagecodecs.
It's currently not possible to parse categories (e.g. "north"/"south", "summer"/"winter"), only numbers/indices. Also, the current implementation reads one file/chunk to determine the chunk shape and dtype but that could be changed.
@martindurant I think you are on the right track that we treat the entire file as a chunk. When we did the processing I deliberately did not choose any blocksize tiling a la COG as the data were so small to begin with, but I guess the gdal defaults made these very small sub-chunks which I have to admit I was not aware of. @cgohlke thanks for chiming in with clarifications and offer to possibly adapt tifffile! Parsing categories would be nice indeed. There are a plethora of geospatial data sets in particular that use this tile naming scheme with North/South and East/West designation of location boundaries.
@cgohlke , I see we have been thinking along the same lines. Since you have a codec that already handles whole tiff files, it would certainly make sense to just use that.
As far as combining is concerned - finding the key for each input file - there are a number of obvious things that we should be doing. We need to know the complete set of coordinates along the concat dimension(s), possibly done in a first pass, and then, for each input chunk, determine its index along those dimensions.
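A rough sketch of that two-pass idea (the file names and regex here are just illustrative, not actual combining code):

import re

files = ["N42W090_fall_vv_COH12.tif", "N41W090_fall_vv_COH12.tif"]
pat = re.compile(r"(?P<lat>[NS]\d+)(?P<lon>[EW]\d+)")

# pass 1: the complete set of coordinates along the concat dimensions
lats = sorted({pat.match(f)["lat"] for f in files}, reverse=True)  # naive north-to-south sort
lons = sorted({pat.match(f)["lon"] for f in files})

# pass 2: the zarr key for each input file is its index along those dimensions
for f in files:
    m = pat.match(f)
    print(f, "->", f"{lats.index(m['lat'])}.{lons.index(m['lon'])}")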
Might the gdal generated VRT file be useful here after all? Here is an example of a VRT of one of the 88 data variables: https://sentinel-1-global-coherence-earthbigdata.s3.us-west-2.amazonaws.com/data/tiles/Global_fall_vv_COH12.vrt
If you take a look at the top of the VRT XML, it contains info on the georeferencing, the origin (upper left) and the pixel spacing in the GeoTransform:
<VRTDataset rasterXSize="432000" rasterYSize="166800">
<SRS dataAxisToSRSAxisMapping="2,1">GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]</SRS>
<GeoTransform> -1.8000000000000000e+02, 8.3333333000010982e-04, 0.0000000000000000e+00, 8.1000000000000000e+01, 0.0000000000000000e+00, -8.3333333000010982e-04</GeoTransform>
<VRTRasterBand dataType="Byte" band="1">
<NoDataValue>0</NoDataValue>
<ColorInterp>Gray</ColorInterp>
<ComplexSource>
<SourceFilename relativeToVRT="1">N00E005/N00E005_fall_vv_COH12.tif</SourceFilename>
<SourceBand>1</SourceBand>
<SourceProperties RasterXSize="1200" RasterYSize="1200" DataType="Byte" BlockXSize="1200" BlockYSize="6" />
<SrcRect xOff="0" yOff="0" xSize="1200" ySize="1200" />
<DstRect xOff="222000" yOff="97200" xSize="1200" ySize="1200" />
<NODATA>0</NODATA>
</ComplexSource>
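If one did go via the VRT, the <DstRect> of each ComplexSource already encodes the tile's position in the mosaic, so the zarr chunk key would follow directly; a sketch using the numbers above:

x_off, y_off, tile = 222000, 97200, 1200  # DstRect xOff, yOff and the 1200-pixel tile size
key = f"{y_off // tile}.{x_off // tile}"  # row.column chunk index
print(key)  # 81.185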
If we already have the coordinates information from some other source, that would, of course, be totally reasonable too. I didn't gather whether the VRT thing is always available, or if it also needs construction at some effort.
gdalbuildvrt would have had to be called to generate the layers.
I guess it's a question of whether we want the GDAL dependency for generating the zarr store here. gdalbuildvrt from the GDAL suite just might do the work. Maybe the code they use for gdalbuildvrt has the pieces ready that you could use directly:
https://github.com/TUW-GEO/OGRSpatialRef3D/blob/master/gdal-1.10.0/apps/gdalbuildvrt.cpp
https://gdal.org/programs/gdalbuildvrt.html
It would be OK to have the dependency for the sake of making the references - we don't need it to load the data later. However, if there's a simpler way... I intuit that getting the arguments to gdalbuildvrt right requires some expertise, compared to writing a little lambda function to extract numbers from the filename.
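For instance, the filename-parsing helper mentioned above could be as small as this (a sketch; the function name is made up):

import re

def tile_origin(name):
    # "N42W090" -> upper left corner of the tile in degrees: (42, -90)
    m = re.match(r"(?P<ns>[NS])(?P<lat>\d+)(?P<ew>[EW])(?P<lon>\d+)", name)
    lat = int(m["lat"]) * (1 if m["ns"] == "N" else -1)
    lon = int(m["lon"]) * (1 if m["ew"] == "E" else -1)
    return lat, lon

print(tile_origin("N42W090"))  # (42, -90)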
Yes, that makes sense to me when all the info can be gathered easily from the filenames. gdalbuildvrt scans each geotiff for the tifftags containing the georeferencing info and as such is more generic; tifffile seems to also offer that more generic solution. What gdalbuildvrt needs is actually just the list of geotiffs that should either be mosaicked into one layer or stacked (with the -separate switch). One piece of info we don't have from the filenames is the resolution needed for the coordinate dimensions, I guess. But that could also easily be retrieved from the corresponding tifftag in one of the GeoTIFFs, maybe via tifffile.
Is it possible to get a list of all file names under https://sentinel-1-global-coherence-earthbigdata.s3.us-west-2.amazonaws.com/data/tiles/ ? I think tifffile might be able to generate fsspec reference files from the list of names.
This being on s3, you can do
import fsspec

s3 = fsspec.filesystem("s3", anon=True)
allfiles = s3.ls("sentinel-1-global-coherence-earthbigdata/data/tiles")
(and you could get the equivalent HTTP URL for each of these 25528 files)
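For the HTTP equivalents, one could prepend the bucket's public endpoint to each listed key, e.g. something like:

urls = [
    "https://sentinel-1-global-coherence-earthbigdata.s3.us-west-2.amazonaws.com/"
    + path.split("/", 1)[1]  # drop the bucket name from the "bucket/key" paths
    for path in allfiles
]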
Thank you! Searching recursively finds 1,033,422 TIFF files. I'll give it a try.
That's the right number of geotiffs. There should be 88 metrics, organized as explained in http://sentinel-1-global-coherence-earthbigdata.s3-website-us-west-2.amazonaws.com/
Are metrics like xarray variables?
Yes. Except for the lsmap and inc metrics, the temporal component is four seasons, i.e. northern hemisphere winter, spring, summer and fall (in that order, as data were processed from 1-12-2019 to 30-11-2020).
So those would be one variable each, but with an extra dimension. Quite complex!
Oh yes, complex indeed!
Also, for extra complexity, is there a way we can encode the data conversions into the metadata? (As in the Data Types and Scaling Conventions section in the document link I sent.)
can encode the data conversions into the metadata?
To some extent: you could apply numcodecs Categorize for the enum-like fields, and FixedScaleOffset for linear transforms. Anything more complex, and you would have to write your own codec implementing the calculation, or expect the user to take the dataset and perform any manipulations needed after opening.
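For the linear case, a minimal sketch with numcodecs.FixedScaleOffset (the /100 scaling here is purely illustrative; the real factors are given in the dataset's Data Types and Scaling Conventions section):

import numpy as np
import numcodecs

# hypothetical convention: stored byte DN maps to physical value DN / 100
codec = numcodecs.FixedScaleOffset(offset=0, scale=100, dtype="f4", astype="u1")
dn = np.array([0, 25, 50, 100], dtype="u1")
print(codec.decode(dn))  # [0.   0.25 0.5  1.  ]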
I think the latter approach, giving the user a recipe to convert the data is probably simplest for now. That's how I have done it in the notebooks thus far.
Got something working with the imagecodecs tiff decoder, e.g. to view the 444,797 COHxx files in napari via fsspec and zarr:
import imagecodecs.numcodecs
import zarr
import fsspec
import napari
imagecodecs.numcodecs.register_codecs()
name = 'sentinel-1-global-coherence-earthbigdata.COH.json'
mapper = fsspec.get_mapper(
'reference://',
fo=f'https://www.lfd.uci.edu/~gohlke/{name}',
target_protocol='http',
)
za = zarr.open(mapper, mode='r')
viewer = napari.view_image(za, rgb=False, contrast_limits=[0, 255], colormap='gist_earth')
viewer.layers[0].name = name
viewer.dims.current_step = (112, 62, 1, 0, 0, 0, 0)
viewer.dims.axis_labels = (
'latitude',
'longitude',
'season',
'polarization',
'coherence',
'y',
'x',
)
napari.run()
Will try mosaicing/tiling next...
If you can figure out how to concatenate on multiple dimensions, then I'd love to see how! I have been struggling with how to get this right!
Here's a tiled view of SoCal. I don't think napari can handle the whole dataset but one can slice the zarr/dask array:
import imagecodecs.numcodecs
import dask.array
import zarr
import fsspec
import napari
imagecodecs.numcodecs.register_codecs()
name = 'Global_COH.json'
mapper = fsspec.get_mapper(
'reference://',
fo=f'https://www.lfd.uci.edu/~gohlke/{name}',
target_protocol='http',
)
data = zarr.open(mapper, mode='r')
data = dask.array.from_zarr(data)
roi = data[:, :, :, 46 * 1200 : 50 * 1200, 58 * 1200 : 66 * 1200]
viewer = napari.view_image(
roi,
rgb=False,
contrast_limits=[0, 255],
colormap='gist_earth',
interpolation='nearest',
name='SoCal',
)
viewer.dims.axis_labels = (
'season',
'polarization',
'coherence',
'latitude',
'longitude',
)
viewer.dims.current_step = (0, 0, 1, 0, 0)
napari.run()
Oh! It's a single 3.6TB 5-dimensional array! To make this readable by xarray, we'd have to add coordinate arrays too. I believe napari should be able to load data lazily, so if you start off with a field of view that's zoomed in to just a couple of tiles, it ought to work for the whole dataset. Or else, you can use the geo/holoviews stack (but they expect xarray, not raw zarr).
The zarr keys are derived purely from the filenames?
To make this readable by xarray, we'd have to add coordinate arrays too
I added _ARRAY_DIMENSIONS (not tested).
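(That attribute is the xarray dimension-naming convention for zarr; in the data array's .zattrs it would look something like the following, with the dimension names taken from the napari axis labels above.)

{"_ARRAY_DIMENSIONS": ["season", "polarization", "coherence", "latitude", "longitude"]}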
I believe napari should be able to load data lazily, so if you start off with a field of view that's zoomed in to just a couple of tiles
Setting viewer.camera.zoom to display one chunk, NAPARI_OCTREE, and multiscale=False did not work for me. Napari loaded chunks until the system ran out of memory.
The zarr keys are derived purely from the filenames?
File names and chunk shape.
I can add coordinates to the references, so that they look good using xarray - and you can select regions using the coordinate labels.
I gather that the longitude of the one segment could be described as
lon(x) -> -90 + x * 0.00083333333
and similar for lat - but what would be an expression covering the whole dataset for lon/lat?
These are the regex patterns and categories used to parse the filenames:
LATITUDE_LABEL = 'latitude'
LATITUDE_PATTERN = rf'(?P<{LATITUDE_LABEL}>[NS]\d+)'
LATITUDE_CATEGORY = {}
i = 0
for j in range(82, -1, -1):
LATITUDE_CATEGORY[f'N{j:-02}'] = i
i += 1
for j in range(1, 79):
LATITUDE_CATEGORY[f'S{j:-02}'] = i
i += 1
LONGITUDE_LABEL = 'longitude'
LONGITUDE_PATTERN = rf'(?P<{LONGITUDE_LABEL}>[EW]\d+)'
LONGITUDE_CATEGORY = {}
i = 0
for j in range(180, 0, -1):
LONGITUDE_CATEGORY[f'W{j:-03}'] = i
i += 1
for j in range(180):
LONGITUDE_CATEGORY[f'E{j:-03}'] = i
i += 1
SEASON_LABEL = 'season'
SEASON_CATEGORY = {'winter': 0, 'spring': 1, 'summer': 2, 'fall': 3}
SEASON_PATTERN = rf'(?P<{SEASON_LABEL}>{"|".join(SEASON_CATEGORY)})'
POLARIZATION_LABEL = 'polarization'
POLARIZATION_CATEGORY = {'vv': 0, 'hh': 1, 'vh': 2, 'hv': 3}
POLARIZATION_PATTERN = (
rf'(?P<{POLARIZATION_LABEL}>{"|".join(POLARIZATION_CATEGORY)})'
)
COHERENCE_LABEL = 'coherence'
COHERENCE_CATEGORY = {
'COH06': 0,
'COH12': 1,
'COH18': 2,
'COH24': 3,
'COH36': 4,
'COH48': 5,
}
COHERENCE_PATTERN = rf'(?P<{COHERENCE_LABEL}>{"|".join(COHERENCE_CATEGORY)})'
PATTERN = (
LATITUDE_PATTERN
+ LONGITUDE_PATTERN
+ '_'
+ SEASON_PATTERN
+ '_'
+ POLARIZATION_PATTERN
+ '_'
+ COHERENCE_PATTERN
)
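Applying the combined PATTERN to one of the file names, for illustration:

import re

m = re.match(PATTERN, "N42W090_fall_vv_COH12")
print(m.groupdict())
# {'latitude': 'N42', 'longitude': 'W090', 'season': 'fall',
#  'polarization': 'vv', 'coherence': 'COH12'}
print(LATITUDE_CATEGORY[m["latitude"]], SEASON_CATEGORY[m["season"]])  # 40 3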
On lat/lon, is that specifically -82..+79 :
lat = np.concatenate([np.arange(1200) * 0.00083333333 + i for i in range(-82, 79)])
lon = np.concatenate([np.arange(1200) * 0.00083333333 + i for i in range(-180, 180)])
(using concat here to ensure hitting the integer degree grid)
The following little script makes your zarr references into a full xarray dataset, which we could view via geoviews. Before writing this back into a references file, I notice that there are four polarizations listed, but the array is only of length 2 in that dimension. Not sure what happened there!
import numpy as np
import numcodecs
import ujson
import xarray as xr
import zarr
u = ujson.load(open("Global_COH.json"))
out = {f"data/{k}": v for k, v in u.items()}
g = zarr.open_group(out)
lat = np.concatenate([np.arange(1200) * 0.00083333333 + i for i in range(-82, 79)])
lon = np.concatenate([np.arange(1200) * 0.00083333333 + i for i in range(-180, 180)])
arr = g.array("longitude", data=lon, dtype="float32")
arr.attrs["_ARRAY_DIMENSIONS"] = ["longitude"]
arr = g.array("latitude", data=lat, dtype="float32")
arr.attrs["_ARRAY_DIMENSIONS"] = ["latitude"]
seasons = ['winter', 'spring', 'summer', 'fall']
arr = g.array("season", data=seasons, dtype=object, object_codec=numcodecs.VLenUTF8(), compression=None)
arr.attrs["_ARRAY_DIMENSIONS"] = ["season"]
polarization = ['vv', 'hh']
arr = g.array("polarization", data=polarization, dtype=object, object_codec=numcodecs.VLenUTF8(), compression=None)
arr.attrs["_ARRAY_DIMENSIONS"] = ["polarization"]
coherence = [6, 12, 18, 24, 36, 48]
arr = g.array("coherence", data=coherence, dtype='uint8', compression=None)
arr.attrs["_ARRAY_DIMENSIONS"] = ["coherence"]
ds = xr.open_dataset(out, engine="zarr", backend_kwargs={"consolidated": False})
ds
produces
Dimensions: (coherence: 6, season: 4, polarization: 2, latitude: 193200, longitude: 432000)
Coordinates:
* coherence (coherence) float32 6.0 12.0 18.0 24.0 36.0 48.0
* latitude (latitude) float32 -82.0 -82.0 -82.0 -82.0 ... 79.0 79.0 79.0
* longitude (longitude) float32 -180.0 -180.0 -180.0 ... 180.0 180.0 180.0
* polarization (polarization) object 'vv' 'hh'
* season (season) object 'winter' 'spring' 'summer' 'fall'
Data variables:
data (season, polarization, coherence, latitude, longitude) float32 ...
I notice that there are four polarizations listed, but the array is only of length 2 in that dimension. Not sure what happened there!
'vv' and 'hh' are the only polarizations found in the coherence COHxx files.
Was chatting with @jkellndorfer and currently a lot of folks who work with geotiff use VRT to create virtual datasets (mosaics, time stacks, etc).
Could we do the same with fsspec-reference-maker, thereby allowing folks to read from large virtual collections and avoiding GDAL?
Seems that would be very cool.