Closed: phongthienvu closed this issue 4 years ago
Yeah, it will query all rows and store them in S3. What you can do is, instead of having athena-express retrieve all the results, set skipResults: true
in the configuration. This will return the S3 location, which you can access directly using the SDK/CLI and paginate the results as you wish.
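With skipResults: true you get back an S3 path rather than the rows themselves. A minimal sketch of a hypothetical helper (not part of athena-express) that splits such a path into the Bucket/Key pair that s3.getObject() expects; the "s3://bucket/prefix/QueryExecutionId.csv" layout is an assumption based on Athena's standard output format:

```javascript
// Hypothetical helper: split an "s3://bucket/key" URI into the
// { Bucket, Key } shape the AWS SDK's s3.getObject() expects.
function parseS3Uri(uri) {
  const match = /^s3:\/\/([^/]+)\/(.+)$/.exec(uri);
  if (!match) throw new Error(`Not an S3 URI: ${uri}`);
  return { Bucket: match[1], Key: match[2] };
}

// Example usage (field names on the athena-express response are an
// assumption; check what your version actually returns):
// const { S3 } = require("aws-sdk");
// const s3 = new S3();
// const { Bucket, Key } = parseS3Uri(result.S3Location);
// s3.getObject({ Bucket, Key }).createReadStream().pipe(process.stdout);
```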
Hi ghdna,
I think my question was not clear enough. Anyway, I am still learning Athena, so I may be wrong; please correct me if so.
As I read through the code, I saw that you get the query result from the saved S3 files (.csv). But the AWS example (https://docs.aws.amazon.com/code-samples/latest/catalog/javascript-athena-index.js.html) uses getQueryResults
to fetch the result. According to the AWS SDK docs (https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Athena.html#getQueryResults-property), that method returns a NextToken field when the result exceeds a limit (I still don't know exactly how many rows). So if the code relies only on the S3 files to get query results, I think it may miss data.
The AWS documentation you shared uses the getQueryResults
method, which returns at most 1000 records per call along with a NextToken for the next page. Fetching a large result that way can be time consuming and requires multiple API calls.
Therefore, athena-express does not use this method. Instead, it reads a stream of data from S3, so you can fetch all the records (even more than 1000) if you'd like. However, if you don't want all the records, you can skip this step by setting skipResults: true
in the configuration and simply have athena-express store the results in S3 for you, to retrieve later with your own logic in whatever way you see fit.
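For comparison, the NextToken pagination that the raw SDK requires can be sketched as a loop like the one below. The mock stands in for athena.getQueryResults().promise() so the sketch is self-contained; the real call shape and page size are from the AWS docs linked above:

```javascript
// Mock of Athena's getQueryResults: returns pages of at most `pageSize`
// rows plus a NextToken until all `totalRows` have been served.
function makeMockAthena(totalRows, pageSize = 1000) {
  return async function mockGetQueryResults({ NextToken } = {}) {
    const start = NextToken ? Number(NextToken) : 0;
    const end = Math.min(start + pageSize, totalRows);
    return {
      ResultSet: {
        Rows: Array.from({ length: end - start }, (_, i) => ({
          Data: [{ VarCharValue: String(start + i) }],
        })),
      },
      // Only present while more pages remain, matching the SDK's behavior.
      NextToken: end < totalRows ? String(end) : undefined,
    };
  };
}

// The pagination loop: keep calling until no NextToken is returned.
async function fetchAllRows(getQueryResults) {
  const rows = [];
  let NextToken;
  do {
    const page = await getQueryResults({ NextToken });
    rows.push(...page.ResultSet.Rows);
    NextToken = page.NextToken;
  } while (NextToken);
  return rows;
}
```

With 100k rows this loop would make 100 separate API calls, which is the overhead the streaming approach avoids.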
In the case of a large result set (say 100k rows), as I understand it, the data may be truncated on the first call (1000 rows) plus a NextToken field. I just wonder whether athena-express handles this case? If not, is there any workaround?