FreshJonas / Nat_Disasters_vs_GDP


02 2.3 Combining Datasets Columns GDP-1, GDP, GDP+1, GDP+2, GDP+3 - redundant? Group Discussion #5

Open variableVG opened 2 years ago

variableVG commented 2 years ago

I think this information is redundant, since in principle we could retrieve each country's GDP for any given year (or range of years) through queries.

We should probably use two tables/documents in the database, one for GDPs and one for natural disasters. We can also discuss this tomorrow with the professor.

For now we can leave the 5 columns, because in principle we are not going to modify the data (we only run read queries), and they will help us understand and verify the data faster. For the final project, though, I think we should work with two tables/documents and joins. I also need to read more about that and how to implement it.
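
To make the two-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module, purely for illustration. The table names (`gdp`, `disasters`), the column names, the country "Examplestan" and all GDP values are hypothetical placeholders, not our real schema or data. It shows how the five columns GDP-1 to GDP+3 could be derived at query time with a join instead of being stored.

```python
import sqlite3

# In-memory database; schema and names are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gdp (
        country TEXT, year INTEGER, gdp REAL,
        PRIMARY KEY (country, year)
    );
    CREATE TABLE disasters (
        id INTEGER PRIMARY KEY, country TEXT, year INTEGER, kind TEXT
    );
""")

# Placeholder values, not real GDP figures.
conn.executemany(
    "INSERT INTO gdp VALUES (?, ?, ?)",
    [("Examplestan", 2000 + i, 100.0 + 10.0 * i) for i in range(5)],
)
conn.executemany(
    "INSERT INTO disasters (country, year, kind) VALUES (?, ?, ?)",
    [("Examplestan", 2001, "flood"), ("Examplestan", 2003, "storm")],
)

def gdp_window(conn):
    """Return one row per disaster with GDP for years -1 .. +3 around it."""
    query = """
        SELECT d.country, d.year, d.kind,
               MAX(CASE WHEN g.year = d.year - 1 THEN g.gdp END) AS gdp_minus_1,
               MAX(CASE WHEN g.year = d.year     THEN g.gdp END) AS gdp_0,
               MAX(CASE WHEN g.year = d.year + 1 THEN g.gdp END) AS gdp_plus_1,
               MAX(CASE WHEN g.year = d.year + 2 THEN g.gdp END) AS gdp_plus_2,
               MAX(CASE WHEN g.year = d.year + 3 THEN g.gdp END) AS gdp_plus_3
        FROM disasters AS d
        LEFT JOIN gdp AS g
               ON g.country = d.country
              AND g.year BETWEEN d.year - 1 AND d.year + 3
        GROUP BY d.id
        ORDER BY d.id
    """
    return conn.execute(query).fetchall()

rows = gdp_window(conn)
for row in rows:
    print(row)
```

Years with no GDP entry come back as `None` (SQL NULL) rather than being dropped, because of the LEFT JOIN. The same join could be expressed in whatever database we end up using; this sketch only illustrates the idea.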

FreshJonas commented 2 years ago

Yes, that is definitely something we can consider. I took the task of 'merge multiple data sources appropriately' very literally in this first try, but if we do it your way we could also learn more about handling big data with Spark / MapReduce. We can also try to finish our project in a more basic form first and then implement big changes like that later. This way we will have a project that is ready for delivery at all times.