dacort / athena-federation-python-sdk

Unofficial Python SDK for Athena Federation
Apache License 2.0

SDK Design #2

Closed: dacort closed this 3 years ago

dacort commented 3 years ago

Currently the Python SDK requires the implementor to have deep knowledge of the request/response data types in the Athena Federation SDK. While this mirrors how the Java SDK is implemented, it doesn't feel very Pythonic.

I've got a few thoughts about a different way to implement this that I'll document here. These are some initial raw notes that I'll expand on in the coming days.


If our example is fairly static, we can initialize everything up front:

```python
g = AthenaExample(
    schema=[
        {
            "database": "schema1",
            "tables": ["table1"]
        },
        {
            "database": "schema2",
            "tables": ["table1", "table2"]
        }
    ]
)
```
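A minimal sketch of how a static configuration like this could answer the metadata lookups without the implementor touching any request/response types. `StaticCatalog`, `list_databases`, and `list_tables` are hypothetical names for illustration, not part of any existing SDK; the point is that both lookups reduce to reading the list passed at construction.

```python
from typing import Dict, List


class StaticCatalog:
    """Hypothetical helper: answers metadata lookups from a fixed schema list."""

    def __init__(self, schema: List[Dict]):
        # Index the list of {"database": ..., "tables": [...]} entries by name.
        self._tables: Dict[str, List[str]] = {
            entry["database"]: list(entry["tables"]) for entry in schema
        }

    def list_databases(self) -> List[str]:
        # Would presumably back Athena's "list schemas" metadata request.
        return list(self._tables)

    def list_tables(self, database: str) -> List[str]:
        # Would presumably back the "list tables" request; unknown names yield [].
        return self._tables.get(database, [])


catalog = StaticCatalog(
    schema=[
        {"database": "schema1", "tables": ["table1"]},
        {"database": "schema2", "tables": ["table1", "table2"]},
    ]
)
print(catalog.list_databases())        # ['schema1', 'schema2']
print(catalog.list_tables("schema2"))  # ['table1', 'table2']
```

Because the whole catalog is known at construction time, the SDK could build every metadata response itself and the implementor would only ever write the dict.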

So this takes care of:

Other data sources may be more dynamic. For example, if this is pulling from another database, we'll have to call getTables() on that DB.
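To make the dynamic case concrete, here is a sketch that assumes the upstream source is a SQLite database (an illustrative choice; the issue doesn't name one). `get_tables` is a hypothetical function that asks SQLite's `sqlite_master` catalog for table names each time it is called, rather than relying on a list fixed up front. SQLite has a single schema here, so the database-name argument is omitted.

```python
import sqlite3
from typing import List


def get_tables(conn: sqlite3.Connection) -> List[str]:
    """Hypothetical getTables(): list tables live from the upstream database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# An in-memory database stands in for the "other DB" being federated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (account_id TEXT, subject TEXT)")
conn.execute("CREATE TABLE accounts (account_id TEXT)")
print(get_tables(conn))  # ['accounts', 'messages']
```

Any table created later shows up on the next call, which is exactly what the static constructor above can't express.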

So maybe we can also make the interface less... explicit.

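```python
import pyarrow as pa

g = AthenaExample()

# Hooks the implementor would provide; the SDK would call them on demand.
def getDatabases():
    return ["db1"]

def getTables(dbname):
    return ["table1"]

def getSchema(dbname, tablename):
    return pa.schema([
        ("account_id", pa.string()),
        ("subject", pa.string()),
        ("from", pa.string()),
    ])
```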
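One way the "less explicit" interface might be wired up, sketched under assumptions: a hypothetical `AthenaExample` that lets implementors register plain functions with a decorator, so the SDK (not the implementor) maps Athena's metadata requests onto them. The `hook` decorator and `call` dispatcher are invented for illustration. `getSchema` would be registered the same way and return the `pa.schema(...)` from the snippet; it's left out here so the sketch runs without pyarrow.

```python
from typing import Any, Callable, Dict, List


class AthenaExample:
    """Hypothetical SDK entry point that dispatches to user-supplied hooks.

    The SDK would own the Athena request/response plumbing; implementors
    only register plain Python functions.
    """

    HOOKS = ("getDatabases", "getTables", "getSchema")

    def __init__(self) -> None:
        self._hooks: Dict[str, Callable[..., Any]] = {}

    def hook(self, func: Callable[..., Any]) -> Callable[..., Any]:
        # Register a function under its own name, e.g. @g.hook def getTables(...).
        if func.__name__ not in self.HOOKS:
            raise ValueError(f"unknown hook: {func.__name__}")
        self._hooks[func.__name__] = func
        return func

    def call(self, name: str, *args: Any) -> Any:
        # Where the SDK would translate an incoming Athena request into a hook call.
        if name not in self._hooks:
            raise NotImplementedError(f"{name} was not registered")
        return self._hooks[name](*args)


g = AthenaExample()


@g.hook
def getDatabases() -> List[str]:
    return ["db1"]


@g.hook
def getTables(dbname: str) -> List[str]:
    return ["table1"]


print(g.call("getDatabases"))      # ['db1']
print(g.call("getTables", "db1"))  # ['table1']
```

Registering by function name keeps the implementor's code close to the bare functions in the proposal, while the validation in `hook` catches typos in hook names early.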

I need to validate that Athena will only request

The challenge is that the schemas have to be (do they?) pyarrow schemas. Another, smaller concern is just the Athena lingo: