bigdataviewer / bigdataviewer-core

ImgLib2-based viewer for registered SPIM stacks and more
BSD 2-Clause "Simplified" License

n5-aws-s3 #80

Open tischi opened 4 years ago

tischi commented 4 years ago

@tpietzsch @axtimwalde @igorpisarev

@constantinpape and I are working on loading n5 from an AWS object store into bdv. We already have running code, and once it is more mature I could also stage a PR into bigdataviewer-core. However, we thought it would be good to discuss a few things upfront.

Maybe a good starting point is the XML specification.

What we currently have looks like this:

<ImageLoader format="bdv.n5.s3" version="1.0">
      <ServiceEndpoint>https://s3.embl.de</ServiceEndpoint>
      <SigningRegion>us-west-2</SigningRegion>
      <BucketName>a-bucket</BucketName>
      <Key>mri-stack.n5</Key>
</ImageLoader>

The corresponding java code looks like this:

@Override
public N5ImageLoader fromXml( final Element elem, final File basePath, final AbstractSequenceDescription< ?, ?, ? > sequenceDescription )
{
    // Read the attribute and the four child elements of <ImageLoader>.
    // Note: version is read but not used yet.
    final String version = elem.getAttributeValue( "version" );
    final String serviceEndpoint = XmlHelpers.getText( elem, "ServiceEndpoint" );
    final String signingRegion = XmlHelpers.getText( elem, "SigningRegion" );
    final String bucketName = XmlHelpers.getText( elem, "BucketName" );
    final String key = XmlHelpers.getText( elem, "Key" );

    // Custom (non-AWS) endpoint plus the region used for request signing.
    final AwsClientBuilder.EndpointConfiguration endpoint = new AwsClientBuilder.EndpointConfiguration( serviceEndpoint, signingRegion );

    // Anonymous credentials: this only works for publicly readable buckets.
    final AnonymousAWSCredentials anonymousAWSCredentials = new AnonymousAWSCredentials();

    // Path-style access addresses objects as <endpoint>/<bucket>/<key>
    // instead of the virtual-hosted style <bucket>.<endpoint>/<key>.
    final AmazonS3 s3 = AmazonS3ClientBuilder
            .standard()
            .withPathStyleAccessEnabled( true )
            .withEndpointConfiguration( endpoint )
            .withCredentials( new AWSStaticCredentialsProvider( anonymousAWSCredentials ) )
            .build();

    // getReader is a helper of the prototype that is not shown here.
    final N5AmazonS3Reader reader = getReader( s3, bucketName, key );
    return new N5ImageLoader( reader, sequenceDescription );
}

Please let us know if you have opinions or suggestions regarding this.
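
For illustration, the four fields of the XML above can also be read with the JDK's built-in DOM parser alone. This is only a sketch under stated assumptions: the prototype itself uses Element and XmlHelpers as shown above; S3ImageLoaderXml, parse and text are hypothetical names, and failing on a missing element is an assumption about how errors might be reported, not something the prototype is known to do.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of bigdataviewer-core.
final class S3ImageLoaderXml
{
    /** Parses an XML string and returns its root element. */
    static Element parse( final String xml )
    {
        try
        {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse( new ByteArrayInputStream( xml.getBytes( StandardCharsets.UTF_8 ) ) )
                    .getDocumentElement();
        }
        catch ( final Exception e )
        {
            throw new IllegalArgumentException( "Could not parse XML", e );
        }
    }

    /** Returns the trimmed text of the first descendant element named tag; fails if there is none. */
    static String text( final Element parent, final String tag )
    {
        final NodeList nodes = parent.getElementsByTagName( tag );
        if ( nodes.getLength() == 0 )
            throw new IllegalArgumentException( "Missing <" + tag + "> element" );
        return nodes.item( 0 ).getTextContent().trim();
    }

    public static void main( final String[] args )
    {
        final String xml =
                "<ImageLoader format=\"bdv.n5.s3\" version=\"1.0\">"
                + "<ServiceEndpoint>https://s3.embl.de</ServiceEndpoint>"
                + "<SigningRegion>us-west-2</SigningRegion>"
                + "<BucketName>a-bucket</BucketName>"
                + "<Key>mri-stack.n5</Key>"
                + "</ImageLoader>";

        final Element elem = parse( xml );
        System.out.println( "format   = " + elem.getAttribute( "format" ) );
        System.out.println( "endpoint = " + text( elem, "ServiceEndpoint" ) );
        System.out.println( "region   = " + text( elem, "SigningRegion" ) );
        System.out.println( "bucket   = " + text( elem, "BucketName" ) );
        System.out.println( "key      = " + text( elem, "Key" ) );
    }
}
```

Keeping one element per field means each value can be validated on its own, and a missing field can be reported by name.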

constantinpape commented 4 years ago

I think that we can combine the 3 fields ServiceEndpoint, BucketName and Key into a single field Uri, see @igorpisarev's comment. We did not do this in the prototype yet, because AmazonS3URI does not expose the service endpoint, but one could just use the same parsing logic (or some other URL parser) for this.
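
A minimal sketch of the "some other URL parser" route, assuming path-style URLs of the form https://<endpoint>/<bucket>/<key> (the layout implied by withPathStyleAccessEnabled( true ) in the prototype). S3Location and parse are hypothetical names, not part of the AWS SDK or bigdataviewer-core; only java.net.URI from the JDK is used.

```java
import java.net.URI;

// Hypothetical holder for the three fields that a single Uri field would replace.
final class S3Location
{
    final String serviceEndpoint;
    final String bucketName;
    final String key;

    S3Location( final String serviceEndpoint, final String bucketName, final String key )
    {
        this.serviceEndpoint = serviceEndpoint;
        this.bucketName = bucketName;
        this.key = key;
    }

    /** Splits a path-style URL (assumed layout: scheme://host[:port]/bucket/key...) into its parts. */
    static S3Location parse( final String url )
    {
        final URI uri = URI.create( url );
        if ( uri.getScheme() == null || uri.getHost() == null )
            throw new IllegalArgumentException( "Not an absolute URL: " + url );

        // Endpoint = scheme + authority; the authority keeps an explicit port, if any.
        final String endpoint = uri.getScheme() + "://" + uri.getRawAuthority();

        // Drop the leading slash, then split at the first remaining slash.
        final String path = uri.getPath() == null ? "" : uri.getPath();
        final String trimmed = path.startsWith( "/" ) ? path.substring( 1 ) : path;
        if ( trimmed.isEmpty() )
            throw new IllegalArgumentException( "URL has no bucket: " + url );

        final int slash = trimmed.indexOf( '/' );
        final String bucket = slash < 0 ? trimmed : trimmed.substring( 0, slash );
        final String key = slash < 0 ? "" : trimmed.substring( slash + 1 );
        return new S3Location( endpoint, bucket, key );
    }

    public static void main( final String[] args )
    {
        final S3Location loc = parse( "https://s3.embl.de/a-bucket/mri-stack.n5" );
        System.out.println( loc.serviceEndpoint + " | " + loc.bucketName + " | " + loc.key );
    }
}
```

Note that the signing region cannot be recovered from a URL like this one, so it would presumably still need its own field. Virtual-hosted-style URLs (bucket as part of the host name) are not handled by this sketch.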

axtimwalde commented 4 years ago

Isn't the n5-utils viewer already doing this?

https://github.com/saalfeldlab/n5-utils/blob/master/src/main/java/org/janelia/saalfeldlab/View.java

Here is the factory

https://github.com/saalfeldlab/n5-utils/blob/master/src/main/java/org/janelia/saalfeldlab/N5Factory.java

igorpisarev commented 4 years ago

There is also https://github.com/saalfeldlab/n5-viewer, which is a BDV-based Fiji plugin for N5 with support for AWS S3 and Google Cloud.

Instead of specifying the endpoint and the region, it asks the user to set up their credentials via the official command line tool aws configure, and then they are automatically picked up by the AWS S3 SDK. This allows reading private buckets (either the user's own buckets, or other people's buckets that the user has been granted access to), but has the downside that we can't access public buckets anonymously without an account, although it shouldn't be very difficult to implement this option as well.

n5-viewer's metadata format is also not very flexible and at the moment only allows reading static multiscale multichannel datasets (arranged as c0/s0..sN, c1/s0..sN, and so on). There is an ongoing discussion about a new metadata format to make it more flexible and include support for multiple timepoints.

@tischi @constantinpape I think it would be great to support the cloud platforms in BDV by default and ship it with Fiji. However, there is certainly some overlap with the existing tools n5-viewer and n5-utils, so I would suggest taking a look at them as they may have some useful bits.

tischi commented 4 years ago

I am probably missing something, but we just thought it could be useful to implement an aws-s3 equivalent of this code: https://github.com/bigdataviewer/bigdataviewer-core/blob/b9e308b70afb1addf3ffb8c5199d2db45f55cde1/src/main/java/bdv/img/n5/XmlIoN5ImageLoader.java#L42

axtimwalde commented 4 years ago

Certainly! It wasn't clear to me from the issue text that you are pondering an XML schema (and not loading n5 from an AWS object store into bdv). I prefer the separation of semantic fields over lumping them together in a single URL.

tischi commented 4 years ago

> I prefer the separation of semantic fields over lumping them together in a single URL.

Me, too.

constantinpape commented 4 years ago

> There is also https://github.com/saalfeldlab/n5-viewer, which is a BDV-based Fiji plugin for N5 with support for AWS S3 and Google Cloud.

Thanks for the pointer!

> Instead of specifying the endpoint and the region, it asks the user to set up their credentials via the official command line tool aws configure, and then they are automatically picked up by the AWS S3 SDK.

If we (also) have a bdv-xml schema, it would make it much easier to distribute files to users who are not comfortable with a CLI, and all relevant data would be bundled in the xml (at least for a public bucket).

> This allows reading private buckets (either the user's own buckets, or other people's buckets that the user has been granted access to)

I am not very familiar with the AWS authentication system yet, but as far as I understand it, we could also access private buckets using the information in the metadata, provided that the correct credentials are stored in ~/.aws/credentials (or somewhere else the S3 SDK looks for credentials).

> @tischi @constantinpape I think it would be great to support the cloud platforms in BDV by default and ship it with Fiji. However, there is certainly some overlap with the existing tools n5-viewer and n5-utils, so I would suggest taking a look at them as they may have some useful bits.

Ok, we'll try to share our example application and the code for it soon, so we can see how to improve it and whether we can pull in parts of the other n5 tools.

> I prefer the separation of semantic fields over lumping them together in a single URL.
>
> Me, too.

I don't have strong feelings about this, so separate fields would be fine with me as well.