This problem was originally discovered while attempting near-real-time (NRT) streaming from Amazon SQS with a custom .py script that uses PyAirbyte to call read() in a loop.
Amazon SQS has an option to delete messages on read, configurable in the connector settings, and it is explicitly enabled in my source-amazon-sqs configuration.
However, calling read() repeatedly keeps returning the last read data, even though that data should already have been deleted.
Description
After reading the relevant source code, it was decided to use:
result = source.read(cache=None, write_strategy="replace", force_full_refresh=True)
This call uses the default cache (DuckDB), writes to the cache with the "replace" strategy, and forces a full refresh to drain any previously read records.
Contrary to the expected behavior, this DID NOT rewrite the cached dataset; it instead returned the previously stored cache contents.
ADDITIONAL CONTEXT: Zero records were processed at the start of the script; the issue was only identified after running the script manually again and again, especially from a fresh start.
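For reference, a minimal loop of the kind described above might look like the sketch below. The config keys, queue URL, and polling interval are placeholders, not taken from the actual script:

```python
# Hypothetical reproduction sketch -- connector config values are placeholders.
import time

import airbyte as ab

source = ab.get_source(
    "source-amazon-sqs",
    config={
        "queue_url": "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",  # placeholder
        "region": "us-east-1",
        "delete_messages": True,  # messages should be gone from the queue after each read
    },
)
source.select_all_streams()

while True:
    # Expectation: "replace" + full refresh should drain previously read records,
    # but in practice the same cached records keep coming back.
    result = source.read(cache=None, write_strategy="replace", force_full_refresh=True)
    print(f"processed {result.processed_records} records")
    time.sleep(10)  # placeholder polling interval
```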
Workarounds
Deleting the cache every time the custom script runs, before read(). (The most viable option for preserving the functionality of read().)
Using get_records() to bypass caching entirely. (This introduces complications when multiple streams need to be selected.)
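The first workaround can be sketched as a small helper that wipes the cached DuckDB files before each read(). The cache directory path is a placeholder, since the default cache location may vary by PyAirbyte version:

```python
# Sketch of workaround 1: clear cached DuckDB artifacts before calling read().
from pathlib import Path


def clear_cache(cache_dir: Path) -> None:
    """Delete any DuckDB cache artifacts so the next read() starts clean."""
    if not cache_dir.exists():
        return
    for artifact in cache_dir.glob("*.duckdb*"):  # matches .duckdb and .duckdb.wal
        artifact.unlink()


# Usage (placeholder path -- adjust to your actual default cache location):
# clear_cache(Path(".cache/default_cache"))
# result = source.read(cache=None, write_strategy="replace", force_full_refresh=True)
```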
I'm willing to contribute further on this issue; feel free to assign me to make any changes.
Context
https://airbytehq.slack.com/archives/C06FZ238P8W/p1710320283398979