Open jimmyhmiller opened 4 years ago
Docs for this snowflake API: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTSelectObjectAppendix.html
Yep, it's a weird one. I ended up just using the Java SDK (v1.12.132) via interop like so:
(let [client (-> (com.amazonaws.services.s3.AmazonS3ClientBuilder/standard)
                 (.withCredentials (new com.amazonaws.auth.profile.ProfileCredentialsProvider "my-profile"))
                 .build)
      bucket-name "bucket2"
      object-key "inventory/bucket1/DailyInventory/data/e56b826c-f557-445a-8389-645dcf95d2d2.csv.gz"
      query "SELECT s._1, s._2 FROM S3Object s limit 25"
      output-file-path "output.csv"
      input-serialization (-> (new com.amazonaws.services.s3.model.InputSerialization)
                              (.withCsv (new com.amazonaws.services.s3.model.CSVInput))
                              (.withCompressionType
                               (com.amazonaws.services.s3.model.CompressionType/GZIP)))
      output-serialization (-> (new com.amazonaws.services.s3.model.OutputSerialization)
                               (.withCsv (new com.amazonaws.services.s3.model.CSVOutput)))
      req (-> (new com.amazonaws.services.s3.model.SelectObjectContentRequest)
              (.withBucketName bucket-name)
              (.withKey object-key)
              (.withExpression query)
              (.withExpressionType com.amazonaws.services.s3.model.ExpressionType/SQL)
              (.withInputSerialization input-serialization)
              (.withOutputSerialization output-serialization))
      res (.selectObjectContent client req)]
  (with-open [out (clojure.java.io/output-stream output-file-path)]
    (-> res .getPayload .getRecordsInputStream (clojure.java.io/copy out))))
Full class names for easy copy and pasting.
Dependencies
Description with failing test case
When trying to run S3 SelectObjectContent, the response body is not plain XML, so parsing the body fails. You can see this by running:
As was discussed in Slack, this error occurs because AWS returns a custom format, and that format can vary depending on the options you pass. Ideally, I'd love to be able to call this and get a stream that contains just the records I am looking for. But things like progress messages could make this harder to deal with.
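For anyone who wants to see what "a stream that contains just the records" might involve, here is a rough sketch of decoding the framing described in the appendix linked above. It assumes each message is laid out as a 4-byte total length, a 4-byte headers length, a 4-byte prelude CRC, the headers, the payload, and a 4-byte message CRC, and that every header value is a string (type 7). The function names (read-headers, read-message, record-payloads) are made up for illustration and are not part of any library. CRCs are read but not verified, and error messages are not handled, so treat this as a starting point rather than a finished decoder.

```clojure
(import '(java.io ByteArrayInputStream DataInputStream EOFException InputStream))

(defn read-headers
  "Decodes a header block into a map of header name -> string value.
  Assumes every header value has type 7 (string)."
  [^bytes header-bytes]
  (let [in (DataInputStream. (ByteArrayInputStream. header-bytes))]
    (loop [headers {}]
      (if (zero? (.available in))
        headers
        (let [name-len   (.readUnsignedByte in)
              name-buf   (byte-array name-len)
              _          (.readFully in name-buf)
              value-type (.readUnsignedByte in)
              _          (when-not (= 7 value-type)
                           (throw (ex-info "Unsupported header value type"
                                           {:type value-type})))
              value-len  (.readUnsignedShort in)
              value-buf  (byte-array value-len)
              _          (.readFully in value-buf)]
          (recur (assoc headers
                        (String. name-buf "UTF-8")
                        (String. value-buf "UTF-8"))))))))

(defn read-message
  "Reads one framed message from `in`. Returns {:headers map :payload bytes},
  or nil when the stream is exhausted. CRC fields are skipped, not checked."
  [^DataInputStream in]
  (let [total-len (try (.readInt in) (catch EOFException _ nil))]
    (when total-len
      (let [headers-len  (.readInt in)
            _prelude-crc (.readInt in)
            header-buf   (byte-array headers-len)
            _            (.readFully in header-buf)
            ;; total length covers the 12-byte prelude and the 4-byte trailing CRC
            payload      (byte-array (- total-len headers-len 16))
            _            (.readFully in payload)
            _message-crc (.readInt in)]
        {:headers (read-headers header-buf)
         :payload payload}))))

(defn record-payloads
  "Collects the payloads of all Records events from a response body,
  ignoring other event types and stopping at an End event."
  [^InputStream body]
  (let [in (DataInputStream. body)]
    (loop [chunks []]
      (let [{:keys [headers payload] :as msg} (read-message in)
            event-type (get headers ":event-type")]
        (cond
          (nil? msg)               chunks
          (= "End" event-type)     chunks
          (= "Records" event-type) (recur (conj chunks payload))
          :else                    (recur chunks))))))
```

Calling (record-payloads body) on the raw response input stream would give a vector of byte arrays to concatenate or write out in order. A single record can be split across two Records messages, so the chunks should be joined before parsing rows. A lazy version that also surfaces Progress and Stats events would probably be a better fit for a real API.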
Stack traces