Closed SiRumCz closed 4 years ago
To give payment-trend-timeline a better visual presentation, Soroush and I discussed it and agreed that the provided dataset sample is not sufficient to present (data since January 2019 is completely missing). We decided to move on and work with the original 112M-row dataset to extract the full January to December taxi trips.
Working from the original dataset, I extracted 112,234,626 rows and applied two filters: one to narrow the period down to 2018 (Jan to Dec) and one to remove duplicate rows. After filtering, the data is 102,801,293 rows (a .db file of around 11.8 GB).
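For reference, the two filters could look roughly like the sketch below. It assumes the .db file is SQLite and uses a hypothetical `trips` table with hypothetical columns `trip_id`, `trip_start` (ISO-8601 text), `payment_type` and `fare`; the real schema and the actual queries used may differ.

```python
import sqlite3


def filter_trips(conn: sqlite3.Connection, year: int = 2018) -> int:
    """Copy rows from the (hypothetical) `trips` table into `trips_filtered`,
    keeping only trips that start in `year` and dropping exact duplicates.
    Returns the number of rows kept."""
    # Half-open range [Jan 1 of year, Jan 1 of year + 1) covers Jan to Dec.
    # ISO-8601 text timestamps compare correctly as plain strings.
    start = f"{year}-01-01 00:00:00"
    end = f"{year + 1}-01-01 00:00:00"

    conn.execute("DROP TABLE IF EXISTS trips_filtered")
    conn.execute(
        "CREATE TABLE trips_filtered ("
        "trip_id TEXT, trip_start TEXT, payment_type TEXT, fare REAL)"
    )
    # SELECT DISTINCT removes rows that are identical in every column.
    conn.execute(
        """
        INSERT INTO trips_filtered
        SELECT DISTINCT trip_id, trip_start, payment_type, fare
        FROM trips
        WHERE trip_start >= ? AND trip_start < ?
        """,
        (start, end),
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM trips_filtered").fetchone()[0]
```

On a table of this size, an index on `trip_start` would probably help the range filter, though that has not been measured here.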
@SiRumCz Wow cool sounds great!
Soroush raised a problem with our current data schema. He suggests we should enrich our data model by dividing data into different time-related tables and adding more temporal information to them.
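One possible reading of this suggestion, sketched below, is a separate table that holds temporal fields derived from each trip's start time. The `trips` table, its `trip_id` and `trip_start` columns, and the `trip_time` table name are all hypothetical; this is not necessarily the schema Soroush has in mind.

```python
import sqlite3


def build_trip_time(conn: sqlite3.Connection) -> None:
    """Derive a (hypothetical) `trip_time` table from `trips`, with one row
    per trip holding month, day, hour and weekday of the trip start."""
    conn.execute("DROP TABLE IF EXISTS trip_time")
    # SQLite's strftime returns text, so each field is cast to INTEGER.
    # '%w' yields the day of week as 0-6 with Sunday = 0.
    conn.execute(
        """
        CREATE TABLE trip_time AS
        SELECT trip_id,
               CAST(strftime('%m', trip_start) AS INTEGER) AS month,
               CAST(strftime('%d', trip_start) AS INTEGER) AS day,
               CAST(strftime('%H', trip_start) AS INTEGER) AS hour,
               CAST(strftime('%w', trip_start) AS INTEGER) AS weekday
        FROM trips
        """
    )
    conn.commit()
```

With a table like this, a monthly or hourly payment trend becomes a join on `trip_id` plus a `GROUP BY month` or `GROUP BY hour`, instead of re-parsing timestamps in every query.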