Open droazen opened 7 years ago
Hi, glad to know that you have tested GATK4 with Amazon S3 using the NIO file system plugin. I have been stuck on this for a long time. I would really appreciate it if you could share the details of the workaround. Thanks in advance! Senthil
Any news on this @droazen? We have interested folks, e.g. https://gatkforums.broadinstitute.org/gatk/discussion/11600/gatk-4-support-on-aws.
@sooheelee We evaluated the S3 plugin, but found that it always localizes the entire file, which defeats the purpose of NIO. We are currently assessing how difficult it would be to patch the existing plugin. Issue is here: https://github.com/Upplication/Amazon-S3-FileSystem-NIO2/issues/103
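For readers unfamiliar with why whole-file localization defeats the purpose: the access pattern NIO is meant to support is a seek followed by a short read, so that only the requested bytes are touched. The sketch below illustrates that pattern with a local temporary file standing in for a remote object. The class name `RangeReadSketch` and the helper `readRange` are hypothetical and are not part of GATK or the S3 plugin. The assumption is that a non-localizing provider would serve the same calls on a remote `Path` without downloading the whole object.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RangeReadSketch {
    // Hypothetical helper: read up to `length` bytes starting at `offset`.
    // Returns fewer bytes if the file ends before the range is filled.
    public static byte[] readRange(Path path, long offset, int length) throws IOException {
        try (SeekableByteChannel channel = Files.newByteChannel(path, StandardOpenOption.READ)) {
            channel.position(offset); // seek; nothing before `offset` is read
            ByteBuffer buffer = ByteBuffer.allocate(length);
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading until the buffer is full or end of file
            }
            buffer.flip();
            byte[] out = new byte[buffer.remaining()];
            buffer.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        // A local temp file stands in for a remote object (assumption for illustration).
        Path tmp = Files.createTempFile("range-read", ".bin");
        try {
            Files.write(tmp, "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII));
            byte[] slice = readRange(tmp, 10, 4);
            System.out.println(new String(slice, StandardCharsets.US_ASCII)); // prints "ABCD"
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

A provider that copies the entire object to local disk before the first `position` call still returns the right bytes, but at the cost of a full download, which is the behavior described in the comment above.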
Thanks for the status @droazen.
@droazen - for patching the Upplication implementation, it might be worth having a look at the S3SeekableStream in epam/htsjdk-s3-plugin. I will suggest that on the original issue.
Is there any update on this issue? I'm asking because AWS moved to NIO v2 in late 2018 and htslib supports direct S3 access.
We haven't done any work on it ourselves, but there are a few different solutions mentioned here:
https://github.com/Upplication/Amazon-S3-FileSystem-NIO2/issues/103
The laserson lab fork might work.
We've also recently added beta support for reading directly from HTTPS paths, so if you get a signed S3 URL you should be able to read from it. (It's not totally robust yet, but improvements are coming soon.)
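As a rough illustration of what reading part of a file over HTTPS involves, here is a minimal sketch of an HTTP range request using the JDK's `HttpURLConnection`. This is not GATK's implementation; the class name `HttpRangeSketch` and both helpers are hypothetical, and the sketch assumes the server honors the `Range` header by answering with status 206.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class HttpRangeSketch {
    // Build the value of an HTTP Range header for `length` bytes starting at `offset`.
    // Byte ranges are inclusive on both ends.
    public static String rangeHeader(long offset, int length) {
        return "bytes=" + offset + "-" + (offset + length - 1);
    }

    // Hypothetical helper: fetch only the requested byte range from `url`.
    public static byte[] readRange(URL url, long offset, int length) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", rangeHeader(offset, length));
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_PARTIAL) {
                // A 200 here would mean the server ignored the range and is sending everything.
                throw new IOException("Range request not honored, HTTP status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    out.write(chunk, 0, n);
                }
                return out.toByteArray();
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling `HttpRangeSketch.readRange(url, 10, 4)` with a `URL` built from your own signed link would request bytes 10 through 13 only. Whether a given signed URL accepts range requests depends on how it was generated, so treat that as something to verify.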
There is an existing NIO filesystem provider for Amazon S3 that has been used successfully with GATK4 by at least one user (with some minor tweaks to the engine). We should add the S3 plugin as a dependency, add basic tests for read support, and make whatever changes are needed to get it working.
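A first check when wiring in any NIO plugin as a dependency is whether the JVM actually discovers a provider for the URI scheme. The sketch below uses the standard `FileSystemProvider.installedProviders()` call to do that. The class name `ProviderCheckSketch` and the helper `hasProvider` are hypothetical, and the scheme name `s3` is an assumption about what the plugin registers.

```java
import java.nio.file.spi.FileSystemProvider;

public class ProviderCheckSketch {
    // Hypothetical helper: true if an installed NIO provider handles the given URI scheme.
    public static boolean hasProvider(String scheme) {
        for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
            if (provider.getScheme().equalsIgnoreCase(scheme)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The default "file" provider is always installed.
        System.out.println("file provider installed: " + hasProvider("file"));
        // "s3" is an assumed scheme name; this is false unless a plugin is on the classpath.
        System.out.println("s3 provider installed: " + hasProvider("s3"));
    }
}
```

A basic read-support test could start from a check like this before attempting to open any remote path.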