Dr-Eberle-Zentrum / Data-projects-with-R-and-GitHub


Feedback for DrMohamedElsherif #255

Closed MiguelDLM closed 3 days ago

MiguelDLM commented 3 months ago

Hi, here are some comments that may help to improve your proposal:

  1. Describe the characteristics of your data (variable names and types).
  2. Add a summary of the packages you plan to use. For imputation, I suggest the Amelia package. Before imputing, however, I suggest you check whether imputation is necessary by running Little's test for missing values, for example with the misty package. Imputation is only recommended when a Not-At-Random pattern is found.
  3. Include an example of the plots you want to generate
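
The check suggested in point 2 could look roughly like the sketch below. It is only an illustration under stated assumptions: the data frame `bikes` and its columns are simulated stand-ins, not the real dataset, and the `misty` and `Amelia` calls run only if those packages happen to be installed.

```r
# Simulated stand-in for the real data (all names and values are hypothetical).
set.seed(1)
n <- 200
bikes <- data.frame(
  count       = rpois(n, lambda = 20),
  temperature = rnorm(n, mean = 10, sd = 5),
  humidity    = rnorm(n, mean = 70, sd = 10)
)
bikes$count[sample(n, 20)] <- NA        # knock out 10% of the counts
bikes$temperature[sample(n, 10)] <- NA  # knock out 5% of the temperatures

# Share of missing values per variable, using base R only.
missing_share <- colMeans(is.na(bikes))
print(missing_share)

# Little's MCAR test via misty::na.test(), if misty is available.
if (requireNamespace("misty", quietly = TRUE)) {
  misty::na.test(bikes)
}

# Multiple imputation via Amelia::amelia(), if Amelia is available.
# m = 5 imputed datasets; p2s = 0 suppresses progress output.
if (requireNamespace("Amelia", quietly = TRUE)) {
  imputed <- Amelia::amelia(bikes, m = 5, p2s = 0)
}
```

A small p-value from Little's test speaks against the data being missing completely at random, which is the situation in which imputation is usually considered.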

DrMohamedElsherif commented 3 months ago

Rebuttal

First of all, thank you for your encouraging words and insightful suggestions for my project.

Addressing your concerns step by step:

Response to Point 1: The dataset is not available on the website or anywhere else online. It was part of a data science project management project in a past course and has no codebook. It consists of four Excel sheets that together hold all the information needed to analyze and correlate the data:

  a) Bike count sheet: hourly bike counts for each day from January 2018 to January 2024, recorded per channel (stations along each of the three paths where counter devices are installed, identified by channel_id) for each of the three counter sites (identified by counter_id).
  b) Counter Site sheet: the names and ids of the three counter sites.
  c) Weather sheet: weather information, including temperature, wind, humidity, rain, etc., for each day of the years 2011 and 2012.
  d) Holidays sheet: the federal and state holidays for the years 2018 and 2024. This sheet is important for correlating bike traffic across sites with regular workdays versus holidays.

Nonetheless, I will include the head of each table in my markdown.
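
To make the relationship between the four sheets concrete, here is a minimal base R sketch of how they might be combined. Only `counter_id` and `channel_id` come from the description above; every other column name and all values are hypothetical placeholders, and in practice each table would be read from its Excel sheet (for example with `readxl::read_excel()`) rather than typed in.

```r
# Hypothetical miniature versions of the four sheets.
counts <- data.frame(
  date       = as.Date(c("2018-01-01", "2018-01-01", "2018-01-02")),
  hour       = c(8L, 9L, 8L),
  counter_id = c(1L, 1L, 2L),
  channel_id = c(11L, 11L, 21L),
  count      = c(5L, 9L, 14L)
)
sites <- data.frame(
  counter_id   = c(1L, 2L, 3L),
  counter_name = c("Site A", "Site B", "Site C")
)
weather <- data.frame(
  date        = as.Date(c("2018-01-01", "2018-01-02")),
  temperature = c(2.1, 3.4)
)
holidays <- data.frame(
  date    = as.Date("2018-01-01"),
  holiday = "New Year's Day"
)

# Left joins: keep every hourly count and attach site, weather, and holiday info.
bike_data <- merge(counts, sites, by = "counter_id", all.x = TRUE)
bike_data <- merge(bike_data, weather, by = "date", all.x = TRUE)
bike_data <- merge(bike_data, holidays, by = "date", all.x = TRUE)

# Flag holiday rows so traffic can be compared with regular days.
bike_data$is_holiday <- !is.na(bike_data$holiday)
print(bike_data)
```

Using left joins (`all.x = TRUE`) keeps count rows even when no weather or holiday record matches, so gaps show up as `NA` instead of silently dropping observations.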

Response to Points 2-3: One issue with this dataset is the missing data. It is left to the researcher to decide which method is best for dealing with this problem, which is a skill to learn by doing this project. Your suggestion provides an excellent starting point. I will also include an example visualization for the project.

I hope the project is now clearer. If you have further concerns, please do not hesitate to share more of your valuable comments.