glencoesoftware / bioformats2raw

Bio-Formats image file format to raw format converter
GNU General Public License v2.0
Support for Google Cloud Storage #176

Open perlman opened 1 year ago

perlman commented 1 year ago

This is a small experiment for writing output directly to GCS.

This is done by including google-cloud-nio. outputOptions needs to be non-null for newFileSystem().
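For readers unfamiliar with the NIO call: `java.nio.file.FileSystems.newFileSystem(URI, Map)` takes an environment map of provider-specific properties. Below is a minimal sketch of the null guard described above. The class `OutputOptionsGuard` and method `envOrEmpty` are hypothetical names for illustration only, not the code in this PR.

```java
import java.util.Collections;
import java.util.Map;

final class OutputOptionsGuard {
  private OutputOptionsGuard() {}

  // Hypothetical helper: return the supplied options unchanged,
  // or an empty map when no options were given, so that the value
  // handed to FileSystems.newFileSystem(uri, env) is never null.
  static Map<String, String> envOrEmpty(Map<String, String> outputOptions) {
    return outputOptions == null
        ? Collections.<String, String>emptyMap()
        : outputOptions;
  }
}
```

A caller could then write something like `FileSystems.newFileSystem(uri, OutputOptionsGuard.envOrEmpty(outputOptions))`.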

While functional, this is still incomplete:

(As a side note, we tried using Google's S3 interface. It fails on a permissions check in JZarr before writing data.)

melissalinkert commented 1 year ago

@perlman: did you have more work planned here?

perlman commented 1 year ago

I've been using this unmodified for a while now. I'll bring it up-to-date with main and see where we're at.

melissalinkert commented 1 year ago

Thanks for the update, @perlman. I'm fine with taking this out of draft status, but adding a usage example to the README would be helpful for testing.

melissalinkert commented 1 year ago

@perlman, is there a simple example of how to use this feature?

perlman commented 1 year ago

Whoops, I let this slip. I'll get to this today or tomorrow! (or Monday, sorry about that.)

perlman commented 1 year ago

@melissalinkert I'm wondering where the right place to put an example is. I had started to modify the --help text, but it seems that it may be a bit too verbose?

The usage is very straightforward, e.g.:

bioformats2raw-0.7.1-SNAPSHOT/bin/bioformats2raw --tile_width 2048 A_2202_20_ApoB.ndpi gs://jax-zarr-playpen/data/A_2202_20_ApoB.zarr

That's it. The access credentials will come from the environment, e.g., gcloud auth login or inherited from a service account (Application Default Credentials).

The credentials must allow for read/write on the bucket. (Minimally, this can be Storage Object Creator, Storage Object Viewer and Storage Object Delete).
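To illustrate how a `gs://` output location such as the one in the command above splits into a bucket and an object path, here is a small sketch using only `java.net.URI`. The class name `GcsLocation` is hypothetical, and this is not a description of how bioformats2raw or google-cloud-nio parse the location internally.

```java
import java.net.URI;

final class GcsLocation {
  final String bucket;
  final String objectPath;

  private GcsLocation(String bucket, String objectPath) {
    this.bucket = bucket;
    this.objectPath = objectPath;
  }

  // Split "gs://bucket/some/object" into its bucket and object path.
  static GcsLocation parse(String location) {
    URI uri = URI.create(location);
    if (!"gs".equals(uri.getScheme())) {
      throw new IllegalArgumentException("Not a gs:// location: " + location);
    }
    // getAuthority() is used rather than getHost() because getHost()
    // returns null for names that are not valid hostnames.
    String bucket = uri.getAuthority();
    if (bucket == null || bucket.isEmpty()) {
      throw new IllegalArgumentException("Missing bucket in: " + location);
    }
    String path = uri.getPath();
    // Drop the leading slash so the object path is relative to the bucket.
    String objectPath = path.startsWith("/") ? path.substring(1) : path;
    return new GcsLocation(bucket, objectPath);
  }
}
```

For the example location, `GcsLocation.parse("gs://jax-zarr-playpen/data/A_2202_20_ApoB.zarr")` yields the bucket `jax-zarr-playpen` and the object path `data/A_2202_20_ApoB.zarr`; the credentials need read/write access to that bucket.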

--output-options does not currently work. Google NIO does not seem happy with the Map<String, String>, with an exception related to the type. I've punted temporarily on digging into this, as it would probably require some special-case type conversion of the values.
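As a rough idea of what such a type conversion might look like, the sketch below turns a `Map<String, String>` into a `Map<String, Object>` by guessing a type from each value. Everything here is an assumption: the class `OptionCoercion` is hypothetical, and which keys and value types google-cloud-nio actually expects is not established in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OptionCoercion {
  private OptionCoercion() {}

  // Guess a type from the string: boolean, then integer, else keep the string.
  static Object coerce(String value) {
    if ("true".equalsIgnoreCase(value) || "false".equalsIgnoreCase(value)) {
      return Boolean.valueOf(value);
    }
    try {
      return Integer.valueOf(value);
    } catch (NumberFormatException e) {
      return value; // not a number, pass through unchanged
    }
  }

  // Apply coerce() to every value, preserving key order.
  static Map<String, Object> coerceAll(Map<String, String> raw) {
    Map<String, Object> typed = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : raw.entrySet()) {
      typed.put(entry.getKey(), coerce(entry.getValue()));
    }
    return typed;
  }
}
```

Guessing from the value alone would misfire for an option that must stay a string but happens to look like a number, so a per-key table of expected types would likely be the safer design once the provider's requirements are known.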

melissalinkert commented 11 months ago

Sorry for dropping this - I really was just thinking that a few lines in the README.md with exactly what you've already noted in https://github.com/glencoesoftware/bioformats2raw/pull/176#issuecomment-1603427528 would be sufficient documentation.

melissalinkert commented 11 months ago

@perlman: that's great, thanks. Do you want to take this out of draft so we can consider for 0.8.0? Or did you have more work planned before this is ready for review?

perlman commented 11 months ago

I think this meets MVP! I've been using it to convert a bunch of NDPI files to Zarr.

At minimum, I think we should add an example of using S3 to the README (and the suggested flags used for Cloudian deployments).

"Nice to have" would be working flags for GCS (which require correct value types) and a test using com.google.cloud.storage.contrib.nio.testing, which would show functional NIO2 integration.