Guiding questions
● Where is your data stored? GitHub
● How is the data organized? Is it in long or wide format?
DailyActivityMerged - Long data, Structured data, Quantitative, Primary
DailyCaloriesMerged - Long data, Structured data, Quantitative, Primary
DailyIntensitiesMerged - Long data, Structured data, Quantitative, Primary
DailyStepsMerged - Long data, Structured data, Quantitative, Primary
HourlyCaloriesMerged - Long data, Unstructured data, Primary
HourlyIntensityMerged - Long data, Unstructured data, Primary
HourlyStepsMerged - Long data, Unstructured data, Primary
MinuteCaloriesWideMerged - too big to view in GitHub
MinuteIntensitiesWideMerged - too big to view in GitHub
MinuteSleepMerged - too big to view in GitHub
MinuteStepsWideMerged - too big to view in GitHub
SleepDayMerged - Long data, Structured data, Quantitative, Primary
WeightLogInfo - Long data, Structured data, Quantitative, Primary
● Are there issues with bias or credibility in this data? Does your data ROCCC?
The data is provided directly by the company, so we treat it as reliable, original, comprehensive, current and cited.
● How are you addressing licensing, privacy, security, and accessibility?
The data for this case study is provided for study purposes, so there should be no problems with licensing, privacy or security. Accessibility is a problem: the large data sets cannot be opened in GitHub.
● How did you verify the data’s integrity?
Since the data is provided directly by the Google Data Analytics Certificate for practice purposes, its integrity is not considered a concern.
● How does it help you answer your question?
The data supports our case study by showing customer information from different products, which we can use to generate insights for the business task requested by the co-founders (the main stakeholders).
● Are there any problems with the data?
The only problems with the data are that some files are unstructured and others are too big to view in GitHub.
Key tasks
Download data and store it appropriately.
Identify how it’s organized.
Sort and filter the data.
Determine the credibility of the data.
Deliverable
A description of all data sources used
Step 1:
Upload all data to BigQuery. We chose to work with SQL because, for this case study, the data might need to be stored in a database so that other team members have access to it.
After uploading some files, we got an error because the date and time values were not in the expected format. After some investigation, we found a Stack Overflow post explaining that dates and times have to be in the format YYYY-MM-DD HH:MM:SS (midnight is written as 00:00:00). We downloaded the data into Google Drive, adjusted the date and time columns, then downloaded the corrected files and uploaded them successfully into BigQuery.
Spreadsheets that required the same solution: MinuteCaloriesWideMerged, MinuteIntensitiesWideMerged, MinuteSleepMerged, MinuteStepsWideMerged.
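The date fix described above was done by hand in Google Drive. As an alternative, the same adjustment could be scripted. The following is only a minimal sketch in Python, not the method used in this case study. The input layouts (month/day/year with a 12-hour clock, or a date alone) and the sample column names Id, ActivityHour and Calories are assumptions for illustration; check them against the real files before use.

```python
import csv
import io
from datetime import datetime

# Assumed input layouts (not stated in the notes); adjust to match the real files.
ASSUMED_INPUT_FORMATS = ("%m/%d/%Y %I:%M:%S %p", "%m/%d/%Y")
# Target layout named in the notes: YYYY-MM-DD HH:MM:SS
TARGET_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_bigquery_timestamp(value: str) -> str:
    """Rewrite one date-time string as YYYY-MM-DD HH:MM:SS."""
    for fmt in ASSUMED_INPUT_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime(TARGET_FORMAT)
        except ValueError:
            continue  # try the next assumed layout
    raise ValueError(f"Unrecognized date-time value: {value!r}")


def convert_csv(text: str, column: str) -> str:
    """Rewrite one date-time column of a CSV passed as text; other columns are untouched."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = to_bigquery_timestamp(row[column])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical one-row sample; the column names are illustrative only.
sample = "Id,ActivityHour,Calories\n1,4/12/2016 1:00:00 PM,81\n"
print(convert_csv(sample, "ActivityHour"))
```

A date with no time part is written with 00:00:00, which matches the midnight form mentioned in the notes.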
We repeated the process to bring the date columns in all the other spreadsheets into the same format. We then uploaded them to BigQuery and divided them into four categories: DailyData, HourlyData, ExtraData and MinuteData.
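The four categories can be written down as a simple lookup. The sketch below, in Python, groups the table names by their prefix. The notes do not say which tables went into ExtraData, so treating every table that does not start with Daily, Hourly or Minute as ExtraData is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict

# Table names as listed in the guiding questions above.
TABLES = [
    "DailyActivityMerged", "DailyCaloriesMerged", "DailyIntensitiesMerged",
    "DailyStepsMerged", "HourlyCaloriesMerged", "HourlyIntensityMerged",
    "HourlyStepsMerged", "MinuteCaloriesWideMerged", "MinuteIntensitiesWideMerged",
    "MinuteSleepMerged", "MinuteStepsWideMerged", "SleepDayMerged", "WeightLogInfo",
]


def category_for(table_name: str) -> str:
    """Pick a category from the table-name prefix (assumed rule, see lead-in)."""
    for prefix in ("Daily", "Hourly", "Minute"):
        if table_name.startswith(prefix):
            return f"{prefix}Data"
    return "ExtraData"  # assumption: everything else is ExtraData


def group_tables(names):
    """Return a dict mapping each category to its tables, keeping input order."""
    groups = defaultdict(list)
    for name in names:
        groups[category_for(name)].append(name)
    return dict(groups)


for category, members in group_tables(TABLES).items():
    print(category, members)
```

Under this rule, SleepDayMerged and WeightLogInfo fall into ExtraData, while MinuteSleepMerged stays with the other minute-level tables.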