MobilityData / mobility-feed-api

Apache License 2.0

Investigate data storage size, cost and storage alternatives from TransitFeeds #195

Open emmambd opened 7 months ago

emmambd commented 7 months ago

Describe the problem

As part of adding the historical data on TransitFeeds to the Mobility Database, we need to understand

We have just under 10 TB of files from TransitFeeds.

Proposed solution

We need to evaluate several different storage options (AWS, GCP, etc.) and assess
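
A rough way to compare options is to multiply the archive size by each option's per-GB monthly storage price. The snippet below is only a minimal sketch: the option names and the prices are hypothetical placeholders, not real AWS or GCP rates, and it ignores egress, request, and retrieval fees. The 10 TB figure is the approximate size mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    name: str
    usd_per_gb_month: float  # placeholder price, NOT a real provider rate


def monthly_cost_usd(size_tb: float, option: StorageOption) -> float:
    """Estimate the monthly storage cost for size_tb terabytes (decimal TB)."""
    size_gb = size_tb * 1000
    return size_gb * option.usd_per_gb_month


# Hypothetical options; replace with current figures from each provider's pricing page.
options = [
    StorageOption("hypothetical-standard", 0.02),
    StorageOption("hypothetical-archive", 0.002),
]

ARCHIVE_SIZE_TB = 10.0  # "just under 10 TB", rounded up

# List the options from cheapest to most expensive.
for option in sorted(options, key=lambda o: monthly_cost_usd(ARCHIVE_SIZE_TB, o)):
    cost = monthly_cost_usd(ARCHIVE_SIZE_TB, option)
    print(f"{option.name}: ${cost:,.2f}/month, ${cost * 12:,.2f}/year")
```

Storage price alone is not the whole picture: if the historical files are downloaded often, egress and retrieval charges could dominate, so those would need to be added before drawing conclusions.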

Alternatives you've considered

No response

Additional context

No response

jcpitre commented 7 months ago

Here is a summary (and raw data) of what is on TransitFeeds as of 2013-12-08: https://docs.google.com/spreadsheets/d/15UmhrhS-E0sxpzU9w3bQo45PkyBzw7zAJjz7Ug4RZY4

jcpitre commented 6 months ago

See this for the report.

emmambd commented 6 months ago

@jcpitre Thanks so much for your super detailed analysis! This is great work and very helpful. Based on your Conclusion and a review of the options, it sounds like

I propose we revisit this analysis a month after the Mobility Database API v1 is launched (so I'll book a holding time during quarterly planning in March) and we can talk about how to manage costs of our infrastructure overall.

This assumes there are no urgent reasons to revisit our technical budget. Based on what I've been told, there aren't any at present. cc @isabelle-dr @davidgamez

Decision:

If there are any concerns with this timing or insight I'm missing, please let me know!

isabelle-dr commented 6 months ago

This sounds great, thank you for such a transparent & data-driven approach to making this decision.