casangi / xradio

Xarray Radio Astronomy Data IO
https://xradio.readthedocs.io/en/latest/

enable parallel image writes to s3 #212

Closed amcnicho closed 1 month ago

amcnicho commented 1 month ago

Addressing https://github.com/casangi/astroviper/issues/18, this branch makes the changes necessary to allow the output of cube image writes to be specified as an S3 URI in addition to a local file system path.

amcnicho commented 1 month ago

I think some unit testing of the updated write_chunk function is probably in order. Also note that import xradio.image._util._zarr.zarr_low_level.write_chunk raises ModuleNotFoundError, so the only way to get at it is from xradio.image._util._zarr.zarr_low_level import write_chunk. I'd argue that forcing that syntax on utilities like this is not the most intuitive practice.
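For anyone wondering why the two import forms behave differently: the dotted import statement only accepts packages and modules, so any function or class at the end of the path fails this way. A minimal sketch using the standard library's json.decoder.JSONDecoder as a stand-in (it is not xradio code; try_import is a hypothetical helper written for this illustration):

```python
import importlib


def try_import(dotted_name):
    """Return the imported module, or the ImportError that was raised."""
    try:
        return importlib.import_module(dotted_name)
    except ImportError as exc:
        return exc


# json.decoder is a module, so importing it by dotted path works.
mod = try_import("json.decoder")

# JSONDecoder is a class defined inside that module, not a submodule,
# so the import machinery cannot resolve it as the last path component.
err = try_import("json.decoder.JSONDecoder")
print(type(err).__name__)

# The from-import form looks the name up as an attribute of the module.
from json.decoder import JSONDecoder
```

The same reasoning would apply to write_chunk, assuming it is defined as a plain function inside zarr_low_level.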

I set this as a draft PR because reading back the output currently returns a dataset with empty data variables, which seems like it should be understood and fixed before this merges. The write itself works: as of the state of changes on this branch (and the corresponding https://github.com/casangi/astroviper/pull/20), the output of astroviper.imaging.cube_imaging_niter0 is properly written to S3.

from xradio.image import load_image
load_image(image_name)
<xarray.Dataset> Size: 4MB
Dimensions:          (l: 500, m: 500, frequency: 8, polarization: 2, time: 1)
Coordinates:
    declination      (l, m) float64 2MB -0.3295 -0.3295 ... -0.3291 -0.3291
  * frequency        (frequency) float64 64B 3.439e+11 3.439e+11 ... 3.44e+11
  * l                (l) float64 4kB 0.0001576 0.0001569 ... -0.0001569
  * m                (m) float64 4kB -0.0001576 -0.0001569 ... 0.0001569
  * polarization     (polarization) <U2 16B 'XX' 'YY'
    right_ascension  (l, m) float64 2MB 3.15 3.15 3.15 3.15 ... 3.15 3.15 3.15
  * time             (time) float64 8B 0.0
    velocity         (frequency) float64 64B 3.916e+04 2.937e+04 ... -2.937e+04
Data variables:
    *empty*
<etc.>
amcnicho commented 1 month ago

I fixed an oversight in the writing of the .zmetadata files. It was the reason that subsequent read_image calls (and indeed xarray.open_zarr and zarr.open) failed to access the data variables of images written to S3.
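As general background on why .zmetadata matters here (this is not the actual fix, and the thread does not spell out what the oversight was): in Zarr v2, readers that use consolidated metadata discover arrays only through the entries in .zmetadata, so an array whose .zarray entry is missing from that file is invisible even if its chunks exist in the store. A minimal stdlib-only sketch, assuming a dict stands in for the S3 store; the array names, shapes, and the helpers consolidate and consolidated_arrays are all hypothetical:

```python
import json

# File names that Zarr v2 treats as metadata documents.
META_KEYS = (".zgroup", ".zarray", ".zattrs")


def consolidate(store):
    """Build the contents of a Zarr v2 .zmetadata document from a key/value store."""
    metadata = {
        key: json.loads(value)
        for key, value in store.items()
        if key.rsplit("/", 1)[-1] in META_KEYS
    }
    return {"zarr_consolidated_format": 1, "metadata": metadata}


def consolidated_arrays(zmetadata):
    """List the array names a consolidated-metadata reader would be able to see."""
    suffix = "/.zarray"
    return sorted(
        key[: -len(suffix)] for key in zmetadata["metadata"] if key.endswith(suffix)
    )


# Hypothetical store: a coordinate array written first, a data variable written later.
store = {
    ".zgroup": b'{"zarr_format": 2}',
    "frequency/.zarray": b'{"shape": [8], "chunks": [8], "dtype": "<f8"}',
}
stale = consolidate(store)  # .zmetadata built before the data variable exists

store["SKY/.zarray"] = b'{"shape": [500, 500], "chunks": [500, 500], "dtype": "<f8"}'
store["SKY/0.0"] = b"...chunk bytes..."
fresh = consolidate(store)  # .zmetadata rebuilt after the data variable is written

print(consolidated_arrays(stale))
print(consolidated_arrays(fresh))
```

With the stale document, a reader would see the coordinates but no data variables, which matches the symptom in the load_image output above.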