UBC-MDS / Bee_Colony_Dashboard

A dashboard providing apiarists with valuable insights into bee colony health.
https://bee-colony-dashboard.herokuapp.com
MIT License

Proposal Section 2: Description of the data #4

Closed danfke closed 2 years ago

danfke commented 2 years ago

In your proposal, briefly describe the dataset and the variables that you will visualize. If you are planning to visualize a lot of columns, provide a high-level description of the variable types rather than listing every single column. For example, indicate that the dataset contains a variety of categorical variables for demographics and provide a brief list rather than describing every single variable. You may also want to consider visualizing a smaller set of variables given the short duration of this project. This might include brief exploratory data analysis for you to grasp what could be interesting aspects to look at in your data. We will not be grading the EDA aspect, but feel free to include your EDA notebooks in the public GitHub repo, so that you have everything in one place.

danfke commented 2 years ago

The clean dataset was obtained from TidyTuesday, which in turn obtained the raw data from the USDA. The following description is provided with the data:

This report provides information on honey bee colonies in terms of number of colonies, maximum, lost, percent lost, added, renovated, and percent renovated, as well as colonies lost with Colony Collapse Disorder symptoms with both over and less than five colonies. The report also identifies colony health stressors with five or more colonies. The data for operations with honey bee colonies are collected from a stratified sample of operations that responded as having honey bees on the Bee and Honey Inquiry and from the NASS list frame. For operations with five or more colonies, data was collected on a quarterly basis; operations with less than five colonies were collected with an annual survey.

The colony dataset contains:

The stressors dataset contains the year, months, and states variables as well as:

Davidwang11 commented 2 years ago

Description of the data

The honey bee colonies and stressors dataset was obtained from TidyTuesday, which in turn obtained the raw data from the USDA.

We will be visualizing the information on honey bee colonies and stressors in terms of number of colonies, maximum, lost, percent lost, added, renovated, and percent renovated, as well as colonies lost with Colony Collapse Disorder symptoms with both over and less than five colonies.

The time series data are collected from a stratified sample of operations in the USA that responded as having honey bees on the Bee and Honey Inquiry and from the NASS list frame. For operations with five or more colonies, data were collected on a quarterly basis; for operations with fewer than five colonies, data were collected with an annual survey.

Davidwang11 commented 2 years ago

Do we need to include the Data Dictionary in the description of the data?

danfke commented 2 years ago

No, I don't think so, but we need to specify which variables we are visualizing.

We could add something like this to the end:

In our dashboard we visualize colony_n, the number of colonies in a state, through a time series plot. Using a geographic map, we plot each state's colony loss percentage (colony_lost_pct). Additionally, we visualize the percentage of colonies (stress_pct) affected by different colony health stressors (stressor) for a state over time through stacked bar plots.
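
As a rough illustration of how the three views map onto the variables named above, here is a minimal sketch of the data selection behind each plot. It is not the dashboard's actual code. The column names colony_n, colony_lost_pct, stressor, and stress_pct come from this thread; the `state` key, the helper function names, and every value in the rows are made-up placeholders, not real USDA figures.

```python
from collections import defaultdict

# Placeholder rows for illustration only; these are NOT real USDA values.
colony = [
    {"year": 2015, "months": "January-March", "state": "Alabama",
     "colony_n": 100, "colony_lost_pct": 10},
    {"year": 2015, "months": "April-June", "state": "Alabama",
     "colony_n": 120, "colony_lost_pct": 5},
    {"year": 2015, "months": "January-March", "state": "Alaska",
     "colony_n": 50, "colony_lost_pct": 20},
]
stressors = [
    {"year": 2015, "months": "January-March", "state": "Alabama",
     "stressor": "Varroa mites", "stress_pct": 10.0},
    {"year": 2015, "months": "January-March", "state": "Alabama",
     "stressor": "Pesticides", "stress_pct": 5.0},
]


def colony_time_series(rows, state):
    """Time series plot: (year, months, colony_n) for one state, in input order."""
    return [(r["year"], r["months"], r["colony_n"])
            for r in rows if r["state"] == state]


def loss_pct_by_state(rows, year, months):
    """Map: colony_lost_pct per state for a single reporting period."""
    return {r["state"]: r["colony_lost_pct"]
            for r in rows if r["year"] == year and r["months"] == months}


def stressor_stacks(rows, state):
    """Stacked bars: for each period, stress_pct keyed by stressor for one state."""
    stacks = defaultdict(dict)
    for r in rows:
        if r["state"] == state:
            stacks[(r["year"], r["months"])][r["stressor"]] = r["stress_pct"]
    return dict(stacks)
```

Each helper returns the data one plot would consume: a sequence of points for the time series, one value per state for the map, and one dictionary of stressor shares per period for the stacked bars.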