Zarr support in NetCDF-Fortran for "cloud-native" model simulations?

Unidata / netcdf-fortran

Official GitHub repository for netCDF-Fortran libraries, which depend on the netCDF C library. Install the netCDF C library first.

Zarr support in NetCDF-Fortran for "cloud-native" model simulations? #209

Open JiaweiZhuang opened 4 years ago

JiaweiZhuang commented 4 years ago

First thanks for all the great work on NetCDF!

I have a research project that would benefit significantly from NetCDF-Zarr. I recently saw a tweet from @jhamman saying that a "pre-alpha will be available early in 2020". I also noticed some Zarr-related updates such as Unidata/netcdf-c#1259. I am excited to test the new Zarr capability with real models and give feedback. Is it possible to get a preliminary version to play with around Feb-March? Or is it still too early to say?

More details about the use case: My workflow involves running Fortran-based models in a cloud-native container environment, for example AWS Batch or a Kubernetes cluster. The main benefit is the ability to scale out ensemble runs quickly via AWS Batch Array Jobs or Kubernetes Parallel Jobs. This is similar to what Pangeo does, but for Fortran models instead of Dask workers. However, I/O is a major pain in a container environment (one needs to deal with Persistent Volumes, for example). It is actually possible to mount a Lustre file system in Kubernetes, but the workflow would be much, much simpler if the model could read from and write to S3 directly.

WardF commented 4 years ago

We are hoping to have a version out in the next month or two, so the Feb-March timeframe is perfectly reasonable!

JiaweiZhuang commented 4 years ago

Just to check -- is it possible to get a testing version this month?

DennisHeimbigner commented 4 years ago

In Fortran, no. In C, maybe. But we still need an S3 driver. We are currently using local storage formats for testing.

DennisHeimbigner commented 4 years ago

I take that back. Once the C version is working, it should also work with any language that uses the C library, provided that the language does not interfere with the use of URLs as path names for nc_open and nc_create.

rsignell-usgs commented 3 years ago

@DennisHeimbigner and @WardF, do you think it would be possible to write Zarr from FORTRAN using the new 4.8.0 NetCDF C library with the approach @ocefpaf pointed me toward (https://riptutorial.com/fortran/example/7149/calling-c-from-fortran)?

DennisHeimbigner commented 3 years ago

It should be possible assuming that the nf_open path can take a URL string. I think one of our interns tested this over the summer and I believe it worked.
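
To make the nf_open case concrete, here is a minimal sketch of opening a local Zarr store through the Fortran 77 interface. It assumes a netCDF-Fortran build linked against a netCDF C library with NCZarr support, and it borrows the URL form "file://simple.zarr#mode=zarr,file" that is suggested later in this thread. The program name and the store name are made up for the example, and it has not been run against any particular release.

```fortran
program read_zarr
  implicit none
  include 'netcdf.inc'

  integer :: status, ncid, nvars

  ! The path is a URL with a "#mode=..." fragment rather than a file name.
  ! The Fortran layer is expected to pass it to the C library unchanged.
  status = nf_open('file://simple.zarr#mode=zarr,file', NF_NOWRITE, ncid)
  if (status /= NF_NOERR) then
     print *, 'nf_open failed: ', trim(nf_strerror(status))
     stop 1
  end if

  ! Ask how many variables the store contains.
  status = nf_inq_nvars(ncid, nvars)
  if (status /= NF_NOERR) then
     print *, 'nf_inq_nvars failed: ', trim(nf_strerror(status))
     stop 1
  end if
  print *, 'number of variables: ', nvars

  status = nf_close(ncid)
  if (status /= NF_NOERR) then
     print *, 'nf_close failed: ', trim(nf_strerror(status))
     stop 1
  end if
end program read_zarr
```

If nf_open returns an error here, the message from nf_strerror should help tell whether the URL was rejected or the store is simply missing.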

rsignell-usgs commented 3 years ago

Cool! Which intern was it? It would be nice to find out what they discovered.

rsignell-usgs commented 3 years ago

@DennisHeimbigner pingity ping ping

edwardhartnett commented 3 years ago

I just built netcdf-c-4.8.0 with netcdf-fortran-4.5.3, also using MPI for parallel I/O.

All tests passed.

I had to use: FCFLAGS='-fallow-argument-mismatch -g -Wall' FFLAGS='-fallow-argument-mismatch -g -Wall'

The Fortran library just hands the path over to the C library, so Zarr stuff should work transparently in Fortran, just as DAP does.

rsignell-usgs commented 3 years ago

@edhartnett, you had to use "-g"? So it is not ready for prime time (e.g. "-O3") yet?

What I'd like to do is write Zarr from our ocean modeling simulations that would look exactly like what xarray produces...

edhartnett commented 3 years ago

No, the -g and -Wall are not what I meant. I had to use -fallow-argument-mismatch.

Absolutely this is ready for prime-time. ;-)

rsignell-usgs commented 3 years ago

@edhartnett, do you have a sample fortran program that creates a zarr dataset you could share?

edwardhartnett commented 3 years ago

No, sorry. I haven't tried Zarr.

rsignell-usgs commented 3 years ago

@edhartnett, ah, bummer. But it should now be possible for me to do that, right?
Ooh, maybe I could use "ncgen -f" to get some sample code.

DennisHeimbigner commented 3 years ago

Take any simple Fortran program that creates a simple netCDF-4 dataset. Suppose it creates a file called "simple.nc". Replace the call nf_create("simple.nc",NF_NETCDF4,ncid) with nf_create("file://simple.zarr#mode=zarr,file",NF_NETCDF4,ncid). That should create a directory called simple.zarr that is in pure Zarr format. You can replace mode=zarr,file with mode=nczarr,file if you want to create the dataset in NCZarr format.
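
As a rough illustration of the substitution described above, a complete program might look like the sketch below. It assumes a netCDF-Fortran build linked against a netCDF C library with NCZarr support (4.8.0 or later). The program name, the dimension "x", the variable "v", and the check helper are invented for the example, and the code has not been run against any particular release.

```fortran
program simple_zarr
  implicit none
  include 'netcdf.inc'

  integer, parameter :: nx = 4
  integer :: ncid, dimid, varid, i
  integer :: dimids(1)
  real :: vals(nx)

  ! Some data to write.
  do i = 1, nx
     vals(i) = real(i)
  end do

  ! The only Zarr-specific change: a URL with a mode fragment instead of
  ! "simple.nc". Use mode=nczarr,file for NCZarr instead of pure Zarr.
  call check(nf_create('file://simple.zarr#mode=zarr,file', NF_NETCDF4, ncid))

  ! Define one dimension and one real variable on it (names are hypothetical).
  call check(nf_def_dim(ncid, 'x', nx, dimid))
  dimids(1) = dimid
  call check(nf_def_var(ncid, 'v', NF_REAL, 1, dimids, varid))
  call check(nf_enddef(ncid))

  ! Write the whole variable and close the dataset.
  call check(nf_put_var_real(ncid, varid, vals))
  call check(nf_close(ncid))

contains

  ! Stop with the library's error message if a netCDF call failed.
  subroutine check(ierr)
    integer, intent(in) :: ierr
    if (ierr /= NF_NOERR) then
       print *, trim(nf_strerror(ierr))
       stop 1
    end if
  end subroutine check

end program simple_zarr
```

With gfortran, something like `gfortran simple_zarr.f90 $(nf-config --fflags) $(nf-config --flibs)` should build it, assuming nf-config is on the path. If it runs cleanly, a simple.zarr directory should appear in the working directory.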

rsignell-usgs commented 3 years ago

@DennisHeimbigner, okay, I'll try that! And mode=nczarr,xarray,file if we want to create xarray-compatible zarr, right?

DennisHeimbigner commented 3 years ago

It depends. If you use the GitHub master, then yes, mode=xarray,file should produce pure Zarr with the xarray convention. If you use 4.8.0, it does not have xarray support. Please let me know if you have problems.