broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Test GATK4 with the existing S3 NIO plugin and get basic S3 read support working #3708

Open droazen opened 7 years ago

droazen commented 7 years ago

There is an existing NIO filesystem provider for Amazon S3 that has been used successfully with GATK4 by at least one user (with some minor tweaks to the engine). We should add the S3 plugin as a dependency, add basic tests for read support, and make whatever changes are needed to get it working.
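A quick way to confirm that a plugin of this kind has been picked up is to ask the JVM which NIO filesystem providers are installed. The sketch below uses only the standard `java.nio.file.spi.FileSystemProvider` API. The class name `NioSchemeCheck` is made up for illustration, and the assumption that the plugin registers itself under the `s3` scheme is not confirmed in this thread.

```java
import java.nio.file.spi.FileSystemProvider;

public class NioSchemeCheck {
    /** Returns true if an installed NIO FileSystemProvider handles the given URI scheme. */
    static boolean hasProviderFor(String scheme) {
        for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
            if (provider.getScheme().equalsIgnoreCase(scheme)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The default "file" provider is always present.
        System.out.println("file: " + hasProviderFor("file"));
        // Assumption: an S3 plugin on the classpath would register the "s3" scheme.
        System.out.println("s3:   " + hasProviderFor("s3"));
    }
}
```

If the second line prints `false`, a path such as `s3://bucket/key` cannot be resolved through NIO in that JVM, regardless of any engine-side changes.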

psendil commented 6 years ago

Hi, glad to know that you have tested GATK4 with Amazon S3 using the NIO filesystem plugin. I have been stuck on this for a long time. I would really appreciate it if you could share the details of the workaround. Thanks in advance! Senthil

sooheelee commented 6 years ago

Any news on this @droazen? We have interested folks, e.g. https://gatkforums.broadinstitute.org/gatk/discussion/11600/gatk-4-support-on-aws.

droazen commented 6 years ago

@sooheelee We evaluated the S3 plugin, but found that it always localizes the entire file, which defeats the purpose of NIO. We are currently assessing how difficult it would be to patch the existing plugin. Issue is here: https://github.com/Upplication/Amazon-S3-FileSystem-NIO2/issues/103
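For context, the point of NIO here is random access: a tool should be able to seek to an offset and read only the bytes it needs. The sketch below shows that access pattern with the standard `SeekableByteChannel` API against a local temporary file. The class and method names (`RangedRead`, `readRange`) are made up for illustration, and this is not GATK or plugin code. A provider that first downloads the whole object still satisfies this API, but loses the benefit.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RangedRead {
    /** Reads up to `length` bytes starting at `offset`, without reading the rest of the file. */
    static byte[] readRange(Path path, long offset, int length) throws IOException {
        try (SeekableByteChannel channel = Files.newByteChannel(path, StandardOpenOption.READ)) {
            channel.position(offset); // seek instead of streaming from the start
            ByteBuffer buffer = ByteBuffer.allocate(length);
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading until the buffer is full or end of file is reached
            }
            byte[] result = new byte[buffer.position()];
            buffer.flip();
            buffer.get(result);
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("ranged-read", ".bin");
        try {
            Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
            byte[] slice = readRange(tmp, 3, 4);
            System.out.println(new String(slice, StandardCharsets.US_ASCII));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

With a well-behaved remote provider, the same `readRange` call on an `s3://` path would translate into a ranged fetch of only the requested bytes.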

sooheelee commented 6 years ago

Thanks for the status @droazen.

magicDGS commented 6 years ago

@droazen - for patching the Upplication implementation, it might be worth having a look at the S3SeekableStream in epam/htsjdk-s3-plugin. I will suggest that on the original issue.

JaeYJung commented 4 years ago

Is there any update on this issue? I'm asking because AWS moved to NIO v2 in late 2018 and htslib supports direct S3 access.

lbergelson commented 4 years ago

We haven't done any work on it ourselves, but there are a few different solutions mentioned here:

https://github.com/Upplication/Amazon-S3-FileSystem-NIO2/issues/103

The laserson lab fork might work.

We've also recently added beta support for reading directly from HTTPS paths, so if you get an S3 signed URL you should be able to read from it. (It's not totally robust yet, but improvements are coming soon.)
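To illustrate why a signed URL is enough for partial reads: a client can ask for a byte range over plain HTTPS with a `Range` header. The sketch below builds such a request with the JDK 11+ `java.net.http` API. It is not how GATK implements its HTTPS support, the class name `SignedUrlRange` is made up, and the URL is a placeholder rather than a real signed URL.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class SignedUrlRange {
    /** Builds a GET request for the inclusive byte range [start, endInclusive]. */
    static HttpRequest rangeRequest(String signedUrl, long start, long endInclusive) {
        return HttpRequest.newBuilder(URI.create(signedUrl))
                .header("Range", "bytes=" + start + "-" + endInclusive)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder URL (assumption): a real one would come from S3 presigning.
        String url = "https://example-bucket.s3.amazonaws.com/sample.bam?X-Amz-Signature=PLACEHOLDER";
        HttpRequest request = rangeRequest(url, 0, 65535);
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Range: " + request.headers().firstValue("Range").orElse("(none)"));
    }
}
```

Sending the request with `HttpClient` and checking for a `206 Partial Content` response would confirm that the server honored the range.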