diranged opened this issue 10 years ago
Option 1 seems much better than option 2, because option 2 adds another dependency and another place where things can break.
Not sure how option 1 would work, as there is no safe place to split the message. If it has to divide an object, then to recover the object at the other end all of the pieces must be received by the same process; what happens if one or more of the pieces are lost, or if the process dies while processing them? There may be more places where things can break in option 1. Option 2 would have to be optional, and a user can always choose to store big objects somewhere else and pass URLs manually.
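To make option 2 concrete, here is a minimal sketch of the "store the big object elsewhere and pass a pointer" idea using plain boto3, outside of kombu. The bucket name, pointer format and 256 KB threshold are assumptions for illustration only:

```python
import json
import uuid

import boto3

S3_BUCKET = "my-large-payload-bucket"   # assumption: any bucket the workers can read
SQS_LIMIT_BYTES = 256 * 1024            # SQS's hard cap on message size

s3 = boto3.client("s3")
sqs = boto3.client("sqs")


def publish(queue_url: str, body: str) -> None:
    """Send the body directly if it fits, otherwise store it in S3 and send a pointer."""
    if len(body.encode("utf-8")) < SQS_LIMIT_BYTES:
        sqs.send_message(QueueUrl=queue_url, MessageBody=body)
        return
    key = f"payloads/{uuid.uuid4()}"
    s3.put_object(Bucket=S3_BUCKET, Key=key, Body=body.encode("utf-8"))
    pointer = json.dumps({"s3_bucket": S3_BUCKET, "s3_key": key})
    sqs.send_message(QueueUrl=queue_url, MessageBody=pointer)


def resolve(message_body: str) -> str:
    """Return the real payload, fetching it from S3 if the message is a pointer."""
    try:
        pointer = json.loads(message_body)
        bucket, key = pointer["s3_bucket"], pointer["s3_key"]
    except (ValueError, TypeError, KeyError):
        return message_body  # ordinary small message
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
```

Something still has to delete the S3 object once the consumer is done with it; a bucket lifecycle rule is the simplest answer.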
Will you be accepting pull requests for option 2?
We will review the PR, of course. Please come with proper unit and integration tests :) and mention me.
I don't know if this helps anyone who is thinking of adding this, but there is a Python implementation of the option 2 pattern, using S3 for storage, that is influenced by the AWS Java extended client for SQS: https://github.com/archetype-digital/aws-sqs-extended. It extends boto3 with extra calls that could be used as a basis for a transport (I think). I've not reviewed the code in detail, but it is tested under Python 3.7, 3.8 and 3.9, has 99% test coverage and is MIT licensed.
I'm a long-time user of Celery+SQS but don't know my way around the internals. I'd really be interested in a solution to this issue and would be happy to help out where I can.
I found this article very useful https://walid.dev/blog/saving-costs-asking-for-forgiveness-in-python/
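In the spirit of that article (ask forgiveness rather than check first), a publisher could attempt the plain SQS send and only fall back to offloading the body to S3 when the broker rejects it. A self-contained sketch; the bucket name is a placeholder and the exact error code SQS returns for an oversized body is an assumption:

```python
import json
import uuid

import boto3
from botocore.exceptions import ClientError

S3_BUCKET = "my-large-payload-bucket"  # placeholder

s3 = boto3.client("s3")
sqs = boto3.client("sqs")


def publish_forgiving(queue_url: str, body: str) -> None:
    """EAFP-style publish: try the cheap path, offload to S3 only when SQS says no."""
    try:
        sqs.send_message(QueueUrl=queue_url, MessageBody=body)
        return
    except ClientError as exc:
        # SQS rejects bodies over 256 KB; "InvalidParameterValue" is my assumption
        # for the error code it uses, so re-raise anything else.
        if exc.response.get("Error", {}).get("Code") != "InvalidParameterValue":
            raise
    key = f"payloads/{uuid.uuid4()}"
    s3.put_object(Bucket=S3_BUCKET, Key=key, Body=body.encode("utf-8"))
    pointer = json.dumps({"s3_bucket": S3_BUCKET, "s3_key": key})
    sqs.send_message(QueueUrl=queue_url, MessageBody=pointer)
```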
This could be useful for some use cases.
~AWS have now released the extended client library for Python, allowing up to 2GB messages on SNS via S3:~
Based on the new library, can we close this, or do we need to integrate it and ensure it is supported in kombu?
Ah, wait, sorry, the new lib is for addressing the same problem for SNS and not SQS, so not helpful here. Apologies.
AWS have now released the extended client library for Python, allowing up to 2GB messages on SQS via S3:
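For reference, standalone usage of that library looks roughly like this as I understand its README; the import patches the boto3 SQS client, and the attribute names below should be double-checked against the released package (bucket and queue values are placeholders):

```python
import boto3
import sqs_extended_client  # noqa: F401  # importing patches the boto3 SQS client

sqs = boto3.client("sqs")
sqs.large_payload_support = "my-large-payload-bucket"  # S3 bucket for big bodies
sqs.use_legacy_attribute = False

# Bodies over 256 KB are written to S3 and replaced with a small pointer message;
# the same patched client resolves the pointer on receive (S3 cleanup behaviour
# on delete is worth verifying).
sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",
    MessageBody="x" * (300 * 1024),
)
```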
I've made a start on attempting to integrate the sqs_extended_client. You can view that here: https://github.com/celery/kombu/compare/main...Amwam:kombu:add-sqs-large-payload-support?expand=1
While this appears to work, I'm not sure if there are issues in the implementation. There are also features missing, such as automatically deleting the payload after the task has completed.
The core issue I've run into is that the way kombu fetches messages from SQS is via the HTTP API rather than via boto3, so the extended client isn't used when retrieving messages, only for publishing. Another PR references a desire to convert the calls to boto3, but it seems a bigger refactoring is required to make that happen in a performant way. As a result, decoding the SQS message requires some manual handling to mimic how sqs_extended_client behaves.
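For anyone hitting the same receive-side gap, the manual handling boils down to recognising the S3 pointer in the raw body and fetching the real payload yourself. A rough sketch; I'm assuming the Python client writes the same software.amazon.payloadoffloading.PayloadS3Pointer JSON shape as the Java extended client, so the exact format should be verified against sqs_extended_client:

```python
import json

import boto3

s3 = boto3.client("s3")


def resolve_sqs_body(raw_body: str) -> str:
    """Return the real payload for a message fetched over the plain HTTP API."""
    try:
        parsed = json.loads(raw_body)
    except ValueError:
        return raw_body  # not JSON, so not a pointer
    # Assumed pointer shape:
    # ["software.amazon.payloadoffloading.PayloadS3Pointer",
    #  {"s3BucketName": "...", "s3Key": "..."}]
    if isinstance(parsed, list) and len(parsed) == 2 and "PayloadS3Pointer" in str(parsed[0]):
        pointer = parsed[1]
        obj = s3.get_object(Bucket=pointer["s3BucketName"], Key=pointer["s3Key"])
        return obj["Body"].read().decode("utf-8")
    return raw_body
```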
good job on starting work on it.
SQS only supports messages up to 256KB. Given that limitation, it's very easy to hit the limit and fail your task submission. Here's a simple example:
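(The task name, broker URL and payload size below are illustrative stand-ins.)

```python
from celery import Celery

# Illustrative only: any SQS-backed Celery app will do.
app = Celery("tasks", broker="sqs://")


@app.task
def process(blob):
    return len(blob)


# Roughly 1 MB of argument data serializes to well over SQS's 256 KB cap,
# so the publish is rejected and the .delay() call fails at submission time.
process.delay("x" * (1024 * 1024))
```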
There are two ways to fix this that I see.