Closed aleclerc-cio closed 3 months ago
Thanks for checking out our repo! :heart:
Are there any self-cleaning methods, similar to GitHub's, to clean up artifacts after a certain amount of time?
Scheduled cleanups are definitely on our todo list. If you're using MinIO, you can configure retention policies to automatically delete data after a certain amount of time: https://min.io/docs/minio/linux/administration/object-management/object-retention.html#configure-bucket-default-object-retention
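As a sketch of what that could look like with the MinIO client, a lifecycle rule can expire objects after a fixed number of days. The alias `myminio` and bucket `gha-cache` below are placeholders, not names from this project:

```shell
# Assumes `mc` is configured with an alias `myminio` pointing at the
# MinIO deployment, and that the cache server stores its blobs in a
# bucket named `gha-cache` (both names are examples).
# Expire cache objects 30 days after they were created:
mc ilm rule add --expire-days 30 myminio/gha-cache

# Inspect the active lifecycle rules for the bucket:
mc ilm rule ls myminio/gha-cache
```

Note this deletes the blobs on the storage side only; the cache server's database would still hold the keys (see the "missing data" discussion below).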
On a somewhat related note, I'm also wondering how the SQLite database works for key caching. It looks like if the container restarts (and we lose the DB), our cache will essentially be reset. It might be nice to have a default config that is persistent.
The SQLite database is also part of the /data directory inside the Docker container and should be persisted across restarts (if a volume is set up).
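For illustration, a minimal Compose snippet that keeps /data on a named volume so the SQLite database survives container restarts (the service and image names are placeholders, not the project's actual image):

```yaml
# docker-compose.yml (sketch; image name is illustrative)
services:
  cache-server:
    image: your-cache-server-image:latest
    volumes:
      - cache-data:/data   # SQLite DB lives under /data
volumes:
  cache-data:
```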
We might add other database options in the future.
Thanks @LouisHaftmann, I think the other DBs would be great and would allow running in HA.
I'm a little concerned about it not being HA right now, and will need to do some testing to figure out how the pipelines behave when the service is unavailable. If it just fails the cache, that's fine, but if it fails the pipeline, that might be a bigger concern.
On the key with the clean-up policy: I did think of that, but wasn't sure how the application would handle it if the 'key' is in the DB but the data itself doesn't exist on the storage layer.
> I'm a little concerned about it not being HA right now, and will need to do some testing to figure out how the pipelines behave when the service is unavailable. If it just fails the cache, that's fine, but if it fails the pipeline, that might be a bigger concern.
The cache action doesn't fail the whole workflow. If caching fails, it just continues without the cache.
> On the key with the clean-up policy: I did think of that, but wasn't sure how the application would handle it if the 'key' is in the DB but the data itself doesn't exist on the storage layer.
You are right, missing data is currently not being handled correctly. We will look into that!
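One way to handle this is to treat a key whose blob has disappeared as a cache miss and purge the stale row. A minimal Python sketch (the `caches` table, its columns, and the dict standing in for S3/MinIO are all made up for illustration, not this project's actual schema):

```python
import sqlite3

def get_cache_entry(db, storage, key):
    """Look up a cache key; if the blob is gone from the storage layer
    (e.g. deleted by a retention policy), treat it as a miss and
    delete the stale DB row instead of returning a broken entry."""
    row = db.execute("SELECT blob_path FROM caches WHERE key = ?", (key,)).fetchone()
    if row is None:
        return None                       # key never existed
    blob_path = row[0]
    if blob_path not in storage:          # blob missing on the storage layer
        db.execute("DELETE FROM caches WHERE key = ?", (key,))  # purge stale key
        return None                       # behave like a normal cache miss
    return storage[blob_path]

# Demo with an in-memory DB and a dict standing in for the object store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE caches (key TEXT PRIMARY KEY, blob_path TEXT)")
db.execute("INSERT INTO caches VALUES ('build-abc', 'blobs/1'), ('build-def', 'blobs/2')")
storage = {"blobs/1": b"tarball bytes"}   # blobs/2 was expired by a retention policy

print(get_cache_entry(db, storage, "build-abc"))  # → b'tarball bytes'
print(get_cache_entry(db, storage, "build-def"))  # → None (stale row purged)
```

With this behavior, a storage-side retention policy stays safe: expired blobs simply surface as cache misses and their keys age out of the DB.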
Any database preferences? What DB should we focus on first?
MySQL and/or Postgres would be good! Anything with a decent k8s operator is pretty easy to set up and maintain. That only really buys HA, though, and if the cache fails gracefully I'm less concerned about it. I would definitely focus on the 'missing data' side first, as that will allow for retention policies and the like virtually for free.
Also, while I'm brainstorming, a GCS storage driver would be awesome too ;)
For SQLite, Litestream can also help. You can run it in the container as a supervisor process that backs up/restores your data from S3 (or a compatible storage).
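For reference, a Litestream configuration for that setup is quite small. A sketch, assuming the DB lives at /data/sqlite.db and the bucket/path are placeholders:

```yaml
# litestream.yml (sketch; DB path and bucket are assumptions)
dbs:
  - path: /data/sqlite.db
    replicas:
      - url: s3://my-backup-bucket/cache-server
```

On startup, `litestream restore` can pull the latest replica back before the cache server boots, so a lost container doesn't mean a reset cache.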
We added automatic cleanup and support for different databases in v2.0.0.
First off, I wanted to say that this project is great and something we are heavily considering using along with ARC to speed up our caches.