data-skeptic / home-data-gallery

A place for people to send pull requests for interesting examples they'd like to share

Research S3 auto-expire #35

Open kylepolich opened 8 years ago

kylepolich commented 8 years ago

Our crawling system uses S3 for storage.

S3 offers some functionality to have objects expire. Our code sets expiration times for all objects, but it enforces these manually.

We are using the Python boto library to interact with S3.

Please research how to use S3's native expiration functionality. Specifically: how can we use boto to set an expiration and have AWS do the deletions for us?

This may help get you started:

https://aws.amazon.com/blogs/aws/amazon-s3-object-expiration/

tomhoag commented 7 years ago

I just went through the AWS docs on this recently for another project of mine.

Can you tell me more about how the objects are being created and which buckets they are being stored in?

With AWS S3, you can set up expiration and transition rules on a bucket. The rules use object prefixes or tags to determine which objects should be acted upon, and they are time-based, keyed off the object creation time. (It gets a bit more involved if the objects are versioned.)

If the boto code is creating buckets and putting objects into them, it would probably be best to take a closer look at the AWS S3 Python library to see whether it supports S3 rule creation. If there is a small number of static buckets that are used day in and day out for storing web scrapings, it might be easier to use the AWS console to create the rules.

In the latter case, the only change needed would be standardizing the object prefixes/tags that boto is using so that the rules don't have to be overly complicated.
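On the first option: if I remember right, the legacy boto library does expose lifecycle configuration through `boto.s3.lifecycle`. A rough sketch (the bucket name and prefix below are placeholders, not anything from this repo) would look something like this:

```python
import boto
from boto.s3.lifecycle import Lifecycle, Expiration

# Placeholder names -- adjust to however the crawler actually
# organizes its buckets and key prefixes.
conn = boto.connect_s3()
bucket = conn.get_bucket('my-crawl-bucket')

lifecycle = Lifecycle()
# Delete anything under crawls/ 30 days after it was created.
lifecycle.add_rule(id='expire-crawls', prefix='crawls/',
                   status='Enabled', expiration=Expiration(days=30))
bucket.configure_lifecycle(lifecycle)

# Read the rules back as a sanity check.
for rule in bucket.get_lifecycle_config():
    print(rule.id, rule.prefix, rule.expiration)
```

Once a rule like that is in place, AWS handles the deletions itself; no cron job or manual sweep needed.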

One other thought: rules can also be used to transition S3 objects into low-cost, slow-access AWS Glacier storage. If there's any chance the scraped data might be useful someday, it may be worthwhile to transition it to Glacier before deleting it from S3.
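If that route looks attractive, the same lifecycle API should cover it; a rough sketch along the lines of the one above (again with placeholder names) might be:

```python
import boto
from boto.s3.lifecycle import Lifecycle, Expiration, Transition

conn = boto.connect_s3()
bucket = conn.get_bucket('my-crawl-bucket')  # placeholder name

lifecycle = Lifecycle()
# Move objects under crawls/ to Glacier after 30 days,
# then delete them outright after a year.
lifecycle.add_rule(id='archive-then-expire', prefix='crawls/',
                   status='Enabled',
                   expiration=Expiration(days=365),
                   transition=Transition(days=30, storage_class='GLACIER'))
bucket.configure_lifecycle(lifecycle)
```

One caveat worth noting: as far as I know, setting the lifecycle replaces the bucket's entire existing configuration, so all the rules for a bucket need to go in a single call.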