dazza-codes / aio-aws

Asyncio utils for AWS Services
Apache License 2.0

lambda binary responses #19

Open dazza-codes opened 3 years ago

dazza-codes commented 3 years ago

Notes or references on lambda binary responses

https://docs.aws.amazon.com/apigateway/latest/developerguide/lambda-proxy-binary-media.html

use the Content-Type header in the response to decide how to parse the body

does reading binary stream data via async aiohttp already handle the data differently?
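One way to act on the Content-Type note is a small dispatcher that takes the header value and the raw bytes and returns JSON, text, or untouched bytes. This is only a sketch using the standard library; the function name `parse_body` and the UTF-8 default charset are assumptions for illustration, not part of aio-aws.

```python
import json
from typing import Any


def parse_body(content_type: str, data: bytes) -> Any:
    """Decode raw response bytes according to a Content-Type header value.

    Hypothetical helper: JSON types are parsed, text types are decoded to str,
    and anything else (e.g. image/png) is returned as bytes.
    """
    # Split "text/html; charset=utf-8" into the media type and its parameters.
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()

    # Assumption: fall back to UTF-8 when no charset parameter is given.
    charset = "utf-8"
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value:
            charset = value.strip().strip('"')

    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(data.decode(charset))
    if media_type.startswith("text/"):
        return data.decode(charset)
    return data  # treat everything else as opaque binary
```

On the aiohttp question: `await resp.read()` returns the raw bytes regardless of the content type, while `resp.text()` and `resp.json()` decode them, so a helper like this would sit on top of `read()`.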

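import base64
import random

import boto3

# Create the client once, outside the handler, so warm invocations reuse it.
s3 = boto3.client('s3')


def lambda_handler(event, context):
    # Randomly return either a binary (PNG) or a text (HTML) response.
    number = random.randint(0, 1)
    if number == 1:
        response = s3.get_object(
            Bucket='bucket-name',
            Key='image.png',
        )
        image = response['Body'].read()
        return {
            'headers': {'Content-Type': 'image/png'},
            'statusCode': 200,
            # Binary bodies must be sent as base64 text and flagged as such.
            'body': base64.b64encode(image).decode('utf-8'),
            'isBase64Encoded': True,
        }
    else:
        return {
            'headers': {'Content-Type': 'text/html'},
            'statusCode': 200,
            'body': '<h1>This is text</h1>',
        }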
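When the function is invoked directly (rather than through API Gateway), the caller receives the proxy-style dict itself and has to undo the base64 step. A minimal sketch of that, assuming the response shape returned by the handler above; `decode_proxy_response` is a hypothetical name, and the `application/octet-stream` fallback is an assumption for responses without a Content-Type header.

```python
import base64
from typing import Tuple, Union


def decode_proxy_response(response: dict) -> Tuple[str, Union[bytes, str]]:
    """Return (content_type, body) from a Lambda proxy-style response dict."""
    # Header names are case-insensitive ("Content-Type" vs "Content-type").
    headers = {k.lower(): v for k, v in response.get("headers", {}).items()}
    # Assumption: treat a missing Content-Type as generic binary.
    content_type = headers.get("content-type", "application/octet-stream")

    body = response.get("body", "")
    if response.get("isBase64Encoded", False):
        # Binary payloads travel as base64 text; recover the original bytes.
        return content_type, base64.b64decode(body)
    return content_type, body
```

For the two branches of the handler this yields `("image/png", <bytes>)` or `("text/html", "<h1>This is text</h1>")`.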

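the request might need to ask for a binary media type, e.g. Accept: application/octet-stream (API Gateway looks at the Accept header, together with the API's configured binary media types, to decide whether to return binary)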
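A rough sketch of such a request with aiohttp, assuming the third-party `aiohttp` package is installed. The function names and any URL passed in are hypothetical; whether the endpoint actually returns binary depends on how the API's binary media types are configured.

```python
from typing import Dict, Tuple

BINARY_ACCEPT = "application/octet-stream"


def binary_request_headers(accept: str = BINARY_ACCEPT) -> Dict[str, str]:
    """Headers asking the endpoint for a binary media type."""
    return {"Accept": accept}


async def fetch_binary(url: str) -> Tuple[str, bytes]:
    """GET `url` and return (Content-Type header, raw body bytes)."""
    # Imported lazily so the header helper works without aiohttp installed.
    import aiohttp

    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=binary_request_headers()) as resp:
            resp.raise_for_status()
            # read() returns the bytes as sent; no text or JSON decoding.
            data = await resp.read()
            return resp.headers.get("Content-Type", ""), data
```

It could be driven with `asyncio.run(fetch_binary(url))` for some API endpoint `url`; the returned Content-Type then tells the caller how to interpret the bytes.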

https://pypi.org/project/pbjson/

https://github.com/mapbox/geobuf

https://www.compose.com/articles/faster-operations-with-the-jsonb-data-type-in-postgresql/

And this has some immediate benefits:

- more efficiency,
- significantly faster to process,
- supports indexing (which can be a significant advantage, as we'll see later),
- simpler schema designs (replacing entity-attribute-value (EAV) tables with jsonb columns, which can be queried, indexed and joined, allowing for performance improvements of up to 1000X!)

And some drawbacks:

- slightly slower input (due to added conversion overhead),
- it may take more disk space than plain json due to a larger table footprint, though not always,
- certain queries (especially aggregate ones) may be slower due to the lack of statistics.

The reason behind this last issue is that, for any given column, PostgreSQL saves descriptive statistics such as the number of distinct and most common values, the fraction of NULL entries, and --for ordered types-- a histogram of the data distribution. All of this will be unavailable when the info is entered as JSON fields, and you will suffer a heavy performance penalty especially when aggregating data (COUNT, AVG, SUM, etc) among tons of JSON fields.

To avoid this, you may consider storing data that you may later aggregate in regular fields.

https://www.bizety.com/2018/11/12/protocol-buffers-vs-json/