awslabs / aws-athena-query-federation

The Amazon Athena Query Federation SDK allows you to customize Amazon Athena with your own data sources and code.

[FEATURE] Workaround for spilling to S3 #87

Open buremba opened 4 years ago

buremba commented 4 years ago

We're investigating ways to integrate the Raptor connector (https://github.com/prestodb/presto/tree/master/presto-raptor) with Athena. We store the shards as ORC files in S3 and use MySQL as the metadata backend, which stores the BRIN indexes separately.

We would like to write a connector for Athena that applies predicate & aggregate pushdown to the query and returns the ORC files (shards) from S3, but the 6MB limit makes this connector useless.

I see that you're considering an alternative to the spilling feature, so I wanted to create an issue to keep track of its status.

avirtuos commented 4 years ago

Which 6MB limit? The only API that isn't paginated or capable of spilling is getTableLayouts(). For connectors that need to return more than 6MB of metadata, we recommend structuring the connector to move the large metadata into splits. This allows the Athena engine to avoid blocking on long-running metadata operations. Let me know if this clarifies things, or provide a bit more information about the 6MB limit you are referring to.
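
A minimal sketch of that splits pattern against the SDK's MetadataHandler, for illustration. The shard lookup and the "shard_id" / "orc_s3_path" split properties are hypothetical placeholders, not SDK names:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import com.amazonaws.athena.connector.lambda.data.BlockAllocator;
import com.amazonaws.athena.connector.lambda.domain.Split;
import com.amazonaws.athena.connector.lambda.domain.TableName;
import com.amazonaws.athena.connector.lambda.handlers.MetadataHandler;
import com.amazonaws.athena.connector.lambda.metadata.GetSplitsRequest;
import com.amazonaws.athena.connector.lambda.metadata.GetSplitsResponse;

public class RaptorMetadataHandler extends MetadataHandler
{
    public RaptorMetadataHandler()
    {
        super("raptor");  // sourceType tag used by the SDK
    }

    // Other required MetadataHandler overrides elided for brevity.

    @Override
    public GetSplitsResponse doGetSplits(BlockAllocator allocator, GetSplitsRequest request)
    {
        Set<Split> splits = new HashSet<>();
        // One split per shard: the per-shard metadata rides on the split
        // instead of inflating the getTableLayouts() response.
        for (String[] shard : lookupShards(request.getTableName())) {
            splits.add(Split.newBuilder(makeSpillLocation(request), makeEncryptionKey())
                    .add("shard_id", shard[0])      // hypothetical property names
                    .add("orc_s3_path", shard[1])
                    .build());
        }
        return new GetSplitsResponse(request.getCatalogName(), splits);
    }

    // Hypothetical: would query the MySQL metadata backend / BRIN indexes.
    private List<String[]> lookupShards(TableName table)
    {
        return Collections.emptyList();
    }
}
```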

buremba commented 4 years ago

@avirtuos I'm referring to the RecordHandler interface, which is responsible for returning the data for the TableScan operator, not the metadata. (See the example: https://github.com/awslabs/aws-athena-query-federation/blob/master/athena-example/src/main/java/com/amazonaws/connectors/athena/example/ExampleRecordHandler.java#L110)

The 6MB limit is mentioned here: https://github.com/awslabs/aws-athena-query-federation/wiki/FAQ
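
For context, this is roughly where that limit and the spill path surface in a connector (signatures as in recent SDK versions; the ORC decoding, the column name, and the "orc_s3_path" split property are hypothetical placeholders):

```java
import com.amazonaws.athena.connector.lambda.QueryStatusChecker;
import com.amazonaws.athena.connector.lambda.data.Block;
import com.amazonaws.athena.connector.lambda.data.BlockSpiller;
import com.amazonaws.athena.connector.lambda.handlers.RecordHandler;
import com.amazonaws.athena.connector.lambda.records.ReadRecordsRequest;

public class RaptorRecordHandler extends RecordHandler
{
    public RaptorRecordHandler()
    {
        super("raptor");  // sourceType tag used by the SDK
    }

    @Override
    protected void readWithConstraint(BlockSpiller spiller, ReadRecordsRequest request,
            QueryStatusChecker queryStatusChecker)
    {
        // Hypothetical split property set by the metadata handler.
        String orcPath = request.getSplit().getProperty("orc_s3_path");

        // For each row decoded from the ORC shard (reader elided), hand the
        // values to the spiller. The spiller chunks the response and, once it
        // grows past the inline limit (~6MB), writes the overflow to S3; that
        // extra S3 round trip is the concern raised in this issue.
        spiller.writeRows((Block block, int rowNum) -> {
            boolean matched = block.setValue("example_col", rowNum, 0L);
            return matched ? 1 : 0;  // number of rows actually written
        });
    }
}
```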

avirtuos commented 4 years ago

Ah ok, I understand now. Is there a reason why the spill capability won't work for you? Or are you submitting this as an optimization rather than a functional blocker?

buremba commented 4 years ago

The Raptor connector is an alternative to the Hive connector, and we store terabytes of ORC files in S3. Raptor is optimized for workloads that combine hot storage (SSD) for fast access with cold storage (S3) for redundancy.

We will be dropping support for hot storage in favor of Athena's caching mechanism, but writing the files back to S3 as a middleware layer would make this kind of solution useless for performance reasons, especially for connectors that deal with TBs of data. Basically, I/O would be the huge bottleneck here.

avirtuos commented 4 years ago

Makes sense. We are indeed investigating alternatives to the current S3 spill as well as prioritization of that work based on demand for it.