Open Rub21 opened 9 months ago
Wow, yeah, that's a lot of big files we have lying around! So we're saving a full Planet every day?
A few questions:
That approach would be more challenging to implement, but it would be a bit more balanced between economy, backup safety, and keeping a history of the project.
Is the practice at OSM also to delete all backups older than a month?
OSM keeps some seemingly random dates for 2023: https://planet.openstreetmap.org/planet/2023/?C=S;O=D . For 2024, it seems they keep weekly files: https://planet.openstreetmap.org/planet/2024/ .
I like the idea of keeping the files as you mention 👇. We have data going back to 2020-05-14, so we can rename some files and keep them.
Daily Planets for the last month
Monthly Planets for the past year (say from the first day of each month)
Annual Planets for the full history of the project (say from the first day of each year)
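To make the tiered idea above concrete, here is a rough sketch of how the "keep" set could be computed from a list of dump dates. This is only an illustration: `planets_to_keep`, `daily_days`, and `monthly_days` are made-up names, and it assumes we have already parsed a date out of each file name. Instead of requiring a file from exactly the first day of a month or year, it keeps the earliest dump available in that period, so 2020 would still be represented by the 2020-05-14 file.

```python
from datetime import date, timedelta


def planets_to_keep(dates, today, daily_days=30, monthly_days=365):
    """Return the set of dump dates to keep under a daily/monthly/annual policy.

    Hypothetical helper: `dates` is an iterable of datetime.date values, one per
    planet dump. The thresholds are assumptions, not agreed project settings.
    """
    dates = list(dates)

    # Earliest available dump in each month and in each year.
    first_of_month = {}
    first_of_year = {}
    for d in sorted(dates):
        first_of_month.setdefault((d.year, d.month), d)
        first_of_year.setdefault(d.year, d)

    keep = set()
    for d in dates:
        age = (today - d).days
        if age <= daily_days:
            # Daily tier: everything from the last month.
            keep.add(d)
        elif age <= monthly_days and first_of_month[(d.year, d.month)] == d:
            # Monthly tier: first dump of each month in the past year.
            keep.add(d)
        elif first_of_year[d.year] == d:
            # Annual tier: first dump of each year, for the full history.
            keep.add(d)
    return keep


if __name__ == "__main__":
    start, today = date(2020, 5, 14), date(2024, 6, 30)
    all_dates = [start + timedelta(days=i) for i in range((today - start).days + 1)]
    kept = planets_to_keep(all_dates, today)
    print(f"{len(kept)} of {len(all_dates)} daily dumps would be kept")
```

Anything not in the returned set would be a candidate for deletion. The same function could be run separately over the planet files and the full history files.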
Are the backups truly redundant? Or would any historical metadata be lost if we delete these files? If there’s anything that could potentially have value in the future, I think it would be more prudent to apply for the AWS Open Data Sponsorship Program. Both OSM and OSMCha get free storage through this program; OHM would be a natural fit. Let me know if you need me to help with kicking off that process.
Notes on our backup:
This might be a both/and situation, where we apply for the free AWS storage and also accept that we can't keep all backups forever.
Do we know what OpenStreetMap does here? Are there archives for planet and history dumps going back to the beginning of time?
After examining the S3 costs, I discovered that we are spending about 10 bucks per day, with the highest cost attributed to storage. It appears that we are retaining some files that are no longer necessary. For example:
Imposm Expired Files: These files are no longer needed. Once Imposm pushes data to S3, the tiler cache retrieves those files. To clean the cache, it only removes the files that were uploaded a minute ago.
Planet Files: It is necessary to clean and remove files older than a month. Access the files here.
Planet Full History Files: Similar to Planet files, we need to clean and remove files older than a month. Access the files here.
DB (Web) Backups: We took care of cleaning up these files a month ago and implemented a script that removes files older than 1 month.
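For reference, the "older than a month" rule used in the items above boils down to a simple age filter. The sketch below is not the actual cleanup script. `keys_older_than` is a hypothetical helper, and it assumes we already have a listing of `(key, last_modified)` pairs for a bucket prefix, with timezone-aware timestamps. An S3 lifecycle expiration rule on the relevant prefixes might achieve the same thing without a script.

```python
from datetime import datetime, timedelta, timezone


def keys_older_than(objects, now, max_age_days=30):
    """Return the keys whose last-modified time is more than max_age_days before `now`.

    Hypothetical helper: `objects` is an iterable of (key, last_modified) pairs
    taken from a bucket listing. Nothing is deleted here; the caller decides
    what to do with the returned keys.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [key for key, last_modified in objects if last_modified < cutoff]


if __name__ == "__main__":
    now = datetime(2024, 6, 30, tzinfo=timezone.utc)
    # Made-up keys, for illustration only.
    listing = [
        ("planet/example-old.osm.pbf", datetime(2024, 4, 1, tzinfo=timezone.utc)),
        ("planet/example-new.osm.pbf", datetime(2024, 6, 20, tzinfo=timezone.utc)),
    ]
    for key in keys_older_than(listing, now):
        print("would delete:", key)
```

Running this as a dry run first (printing instead of deleting) would let us check the list before removing anything.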
@batpad @danrademacher Please confirm the decision to remove at least the planet replica and full planet files older than a month, as well as the expired files, since they are no longer required. Later, we can replicate what we did for DB backups to remove older files from S3.