acstat231-f23 / blog-eea

Ebony Wamwitha, Ephrata Getachew, Aika Shorayeva
https://acstat231-f23.github.io/blog-eea/

Blog Plan #1

Open egetachew1 opened 10 months ago

egetachew1 commented 10 months ago

1. Do you plan for your final project to be an extension of the mid-semester project?

The final project will be an extension of the mid-semester project: it will still explore macroeconomic trends across different countries, but this time it will center on a different set of data. Namely, we will work with the indicators that make up the Human Development Index (HDI): life expectancy at birth, expected and mean years of schooling, and GNI per capita. We will use unsupervised learning to cluster countries on these variables and assign each country to a category with respect to its HDI. Since an HDI rank list already exists, we will be able to compare the predicted clusters with the actual rankings and evaluate how accurate our model is.

In addition to clustering, we would like to build a model that predicts a country’s GDP from HDI indicators (individually and/or combined) and compare the accuracy of the different predictions. To do that, we will implement a supervised learning algorithm and partition our data into training and test sets.
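As a rough illustration of the train/test idea described above, here is a minimal sketch in Python using only the standard library. It is not the group's code (the project itself is in R); the data are made up, and the helper names `train_test_split`, `fit_simple_ols`, and `rmse` are hypothetical. It fits a one-predictor least-squares line on the training rows and measures error only on the held-out rows.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]

def fit_simple_ols(pairs):
    """Least-squares fit of y = intercept + slope * x on (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rmse(pairs, intercept, slope):
    """Root mean squared error of the fitted line on (x, y) pairs."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in pairs]
    return (sum(squared) / len(squared)) ** 0.5

# Made-up (life expectancy, log GDP per capita) pairs, for illustration only.
toy_data = [(55 + i, 6.0 + 0.12 * i + (0.1 if i % 2 else -0.1)) for i in range(20)]

train, test = train_test_split(toy_data)
intercept, slope = fit_simple_ols(train)   # fit on training rows only
test_rmse = rmse(test, intercept, slope)   # evaluate on rows the model never saw
```

Comparing `test_rmse` across candidate predictors (or combinations of them) is one way to carry out the accuracy comparison mentioned in the plan.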


2. Describe what you hope to deliver as a final product. Will your blog include a published Shiny application? Will it incorporate an interactive map? Will it involve a predictive model that forecasts future values of some quantity using data that you’ve integrated?

The final product will be a comprehensive blog post that explores macroeconomic trends based on Human Development Index (HDI) indicators. The blog will include the results of unsupervised learning for clustering countries based on HDI indicators, a predictive model to forecast a country's GDP from HDI indicators, and reproducible code with explanations. The blog will not include a Shiny application or interactive map; instead, it will focus on in-depth analysis and insights from the data for readers.

3. Outline a schedule for your group’s progress that will take you from now (ideas phase) to final blog post and presentation at the end of the semester. During the last project, we had specific checkpoints for different phases of the project. Based on what you envision for your final blog post, identify checkpoints for your group and dates by which you plan to reach those checkpoints. Hold each other accountable, so you’re not waiting until the last minute to do things! In particular, you should have at least one checkpoint each week (ideally two) identifying what work you expect to complete by then.


  • Complete data wrangling and post GitHub status update 1 by Thursday 11/16
  • Build the first model (unsupervised learning) and write up its results and interpretation for the blog; post GitHub status update 2 by Thursday 11/30
  • Build the second model (GDP prediction) and write up its results and interpretation for the blog by Tuesday 12/5
  • Presentation/feedback session by Thursday 12/7
  • Finishing touches on the blog post, incorporating feedback from the presentation, by Monday 12/11
  • Final blog post and reflection 2 by Wednesday 12/13


katcorr commented 10 months ago

Solid plan! For the prediction modeling part -- we don't do training/testing sets in Stat135 . . . have you all encountered this before in other stat courses or contexts?

Blog plan: 10/10

egetachew1 commented 10 months ago

I have some experience with machine learning from a class I took over the summer, where we used R and some Python. Our textbook was An Introduction to Statistical Learning with Applications in R.

We are open to exploring and furthering our knowledge on this topic. Please let us know if you have any recommended resources.

Best, Ephrata Getachew


aika-sh commented 10 months ago


I also have experience building prediction models, both in a class and outside of one. In addition, I still have access to some DataCamp tutorials; although they are taught in Python, I believe they could be translated into R and implemented in RStudio.

egetachew1 commented 10 months ago

Status Update 1

In this week's checkpoint, we did data wrangling, focusing on merging and reshaping datasets of macroeconomic indicators. We combined data from gni_gdp_lifeexp.csv and expected_years_of_schooling.csv, renamed columns for clarity, and filtered the data to the years 2000 to 2022. The main challenge was missing values: we were unsure how best to handle and impute them without compromising the integrity of our analysis.
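The merge-and-filter step described above might look roughly like the following Python sketch, which uses only the standard library. The group's actual wrangling was done in R and is not shown in this thread; the column names, the inline stand-in data, and the helper names `read_keyed` and `merge_indicators` are all assumptions made for illustration. The sketch joins the two tables on (country, year), keeps 2000 to 2022, and records which rows are missing a schooling value so they can be reviewed before any imputation.

```python
import csv
import io

# Tiny inline stand-ins for the two CSV files; columns and values are made up.
GNI_CSV = """country,year,gni_per_capita,life_expectancy
Country A,1999,1500,52.0
Country A,2005,2000,56.5
Country B,2005,700,55.2
Country C,2005,9000,65.9
"""
SCHOOLING_CSV = """country,year,expected_years_of_schooling
Country A,2005,9.5
Country B,2005,
Country C,2005,14.6
"""

def read_keyed(text):
    """Read CSV text into a dict keyed by (country, year)."""
    return {(r["country"], int(r["year"])): r for r in csv.DictReader(io.StringIO(text))}

def merge_indicators(left_text, right_text, first_year=2000, last_year=2022):
    """Left-join right onto left by (country, year), keeping only the year window."""
    left, right = read_keyed(left_text), read_keyed(right_text)
    merged = []
    for key, row in sorted(left.items()):
        country, year = key
        if not first_year <= year <= last_year:
            continue
        schooling = right.get(key, {}).get("expected_years_of_schooling", "")
        merged.append({
            "country": country,
            "year": year,
            "gni_per_capita": float(row["gni_per_capita"]),
            "life_expectancy": float(row["life_expectancy"]),
            # Blank or absent values become None so they can be counted later.
            "expected_schooling": float(schooling) if schooling else None,
        })
    return merged

merged_rows = merge_indicators(GNI_CSV, SCHOOLING_CSV)
missing_keys = [(r["country"], r["year"]) for r in merged_rows
                if r["expected_schooling"] is None]
```

Keeping missing values as explicit `None` entries, rather than dropping the rows, leaves the choice of imputation method open.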

katcorr commented 10 months ago

OK, so you are on track! We discussed the missingness, I imputed values, and your dataset is all set (I think).

Status Update 1: 5/5

egetachew1 commented 9 months ago

Status Update 2

We used k-means clustering to categorize countries based on their economic data. We chose four clusters and used the most recent data for each country, considering life expectancy, GNI per capita, and expected years of schooling. To make comparisons fair, we standardized these variables. We added the cluster assignments back into our dataset, saved as macro_trends_with_clusters. Additionally, we created an interactive 3D scatter plot using plotly to show how countries are grouped based on these economic indicators. The main issue we had was with scaling, but we resolved it. Next, we will work on the aesthetics/interface part.
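To make the standardize-then-cluster step concrete, here is a minimal Python sketch of z-score scaling followed by Lloyd's algorithm for k-means, using only the standard library. It is an illustration under stated assumptions, not the group's R code, which is not shown in this thread: the twelve rows are made-up values in the order (life expectancy, GNI per capita, expected years of schooling), and the function names are hypothetical.

```python
import random

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid updates until labels settle."""
    centroids = random.Random(seed).sample(points, k)  # k distinct starting points
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up rows: (life expectancy, GNI per capita, expected years of schooling).
toy_rows = [
    [52.0, 900.0, 7.5], [54.0, 1200.0, 8.0], [53.0, 1000.0, 7.8],
    [63.0, 4000.0, 10.5], [64.5, 4500.0, 11.0], [62.0, 3800.0, 10.2],
    [72.0, 12000.0, 13.0], [73.5, 13500.0, 13.4], [71.0, 11000.0, 12.8],
    [81.0, 45000.0, 16.5], [82.5, 50000.0, 17.0], [80.0, 42000.0, 16.0],
]
scaled = standardize(toy_rows)
labels, centroids = kmeans(scaled, k=4)
```

Without the scaling step, GNI per capita (in the thousands) would dominate the distance calculation and the other two indicators would barely affect the grouping. Because k-means depends on its starting centroids, a single run like this one can settle on a poor grouping; R's `kmeans()` offers an `nstart` argument to try several random starts and keep the best.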

katcorr commented 9 months ago

Great!

Status Update 2: 5/5