Around the beginning of March, the Y8M dataset we used for the cloud-storage extension notebook was closed. A ticket was created for the dataset provider (https://github.com/aws-samples/data-lake-as-code/issues/28) and another one for replacing the dataset (https://github.com/exasol/ai-lab/issues/247).
But we need a long-term solution for such situations. It is risky to host data in a public dataset ourselves, because it could be abused for a "payment attack": the data is requested over and over and our AWS bills grow without limit.
Possible solutions we discussed with @redcatbear and @tkilias:
- Since AWS S3 doesn't support rate limiting out of the box, we could consider switching to another cloud provider supported by the cloud-storage extension.
- An API gateway with a rate limiter could potentially be used (if it can mimic the S3 protocol).
- The S3 "Requester Pays" option (might require support in the cloud-storage provider).
- Monitor AWS bills and disable access in case of suspicious activity.
Those options require deeper investigation.
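To make the rate-limiting idea a bit more concrete for that investigation, here is a minimal sketch of a per-client token-bucket limiter. This is the kind of check an API gateway placed in front of the bucket could apply before forwarding a request. It is not the API of any particular gateway product: `TokenBucket`, `RateLimiter`, `gateway_status`, the client ID and the limits are all hypothetical. A real gateway would additionally have to speak the S3 protocol towards the cloud-storage extension, which is the open question noted above.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens/s."""

    def __init__(self, capacity: float, rate: float, now: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # a new client starts with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed since the previous check.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # spend one token on this request
            return True
        return False


class RateLimiter:
    """Hypothetical limiter keeping one token bucket per client (e.g. source IP)."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.rate, now)
            self.buckets[client_id] = bucket
        return bucket.allow(now)


def gateway_status(limiter: RateLimiter, client_id: str) -> int:
    """Hypothetical gateway check returning an HTTP status code.

    A real gateway would forward allowed requests to the bucket using the
    S3 protocol; here 200 only stands in for "forwarded".
    """
    return 200 if limiter.allow(client_id) else 429  # 429 Too Many Requests


if __name__ == "__main__":
    fake_now = [0.0]  # controllable clock for the demo
    limiter = RateLimiter(capacity=3, rate=0.5, clock=lambda: fake_now[0])
    print([gateway_status(limiter, "203.0.113.7") for _ in range(4)])
    # prints [200, 200, 200, 429]
    fake_now[0] = 2.0  # two seconds later, one token has been refilled
    print(gateway_status(limiter, "203.0.113.7"))  # prints 200
```

The token bucket caps the sustained request rate per client while still allowing short bursts, which bounds how fast any single client can drive up the bill. A production setup would also need to evict idle buckets and share state between gateway instances. It would also not stop an attacker spreading requests over many clients, so it complements rather than replaces bill monitoring.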