Closed danfke closed 2 years ago
The clean dataset was obtained from TidyTuesday, which in turn obtained the raw data from the USDA. The following description is provided with the data:
This report provides information on honey bee colonies in terms of number of colonies, maximum, lost, percent lost, added, renovated, and percent renovated, as well as colonies lost with Colony Collapse Disorder symptoms with both over and less than five colonies. The report also identifies colony health stressors with five or more colonies. The data for operations with honey bee colonies are collected from a stratified sample of operations that responded as having honey bees on the Bee and Honey Inquiry and from the NASS list frame. For operations with five or more colonies, data was collected on a quarterly basis; operations with less than five colonies were collected with an annual survey.
The colony dataset contains:
state: U.S. state. Also contains the values Other States, which is data for states not included in the dataset, and United States, which aggregates data for the whole country.
year, months: time period, broken up into quarters.
colony_n: number of colonies.
colony_max: maximum number of colonies.
colony_lost: colonies lost since the start of the quarter.
colony_lost_pct: number of lost colonies divided by maximum colonies. For United States, percent lost is the number of lost colonies divided by January 1 colonies.
colony_reno: surviving colonies that were requeened or received new honey bees.
colony_reno_pct: number of renovated colonies divided by maximum colonies, except for United States, where percent renovated is the number of renovated colonies divided by January 1 colonies.
The stressors dataset contains the year, months, and state variables as well as:
stressor: colony health stressor.
stress_pct: percent of colonies affected by the stressor at any time during the quarter. A colony can be affected by multiple stressors during the same quarter.
The honey bee colonies and stressors dataset was obtained from TidyTuesday, which in turn obtained the raw data from the USDA.
We will be visualizing information on honey bee colonies and stressors: number of colonies, maximum, lost, percent lost, added, renovated, and percent renovated, as well as colonies lost with Colony Collapse Disorder symptoms, for operations with both more and fewer than five colonies.
The time series data are collected in the USA from a stratified sample of operations that responded as having honey bees on the Bee and Honey Inquiry and from the NASS list frame. For operations with five or more colonies, data were collected on a quarterly basis; operations with fewer than five colonies were surveyed annually.
Do we need to include the Data Dictionary in the description of the data?
No, I don't think so, but we need to specify which variables we are visualizing.
We could add something like this to the end:
In our dashboard we visualize colony_n, which is the number of colonies in a state, through a time series plot. Using a geographic map, we plot different states' colony loss percentages (colony_lost_pct). Additionally, we visualize the percentage of colonies (stress_pct) affected by different colony health stressors (stressor) for a state over time through stacked bar plots.
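To make the three views concrete, here is a rough Python sketch of how the rows might be shaped before plotting. It is only an illustration: the helper names (colony_time_series, loss_pct_by_state, stressor_stacks), the quarter labels "Q1"/"Q2", the stressor labels, and every number in the toy rows are invented for this example and are not taken from the USDA data or from any existing dashboard code.

```python
from collections import defaultdict

# Toy rows with INVENTED values, shaped like the colony and stressors tables.
# Quarter labels and stressor names are placeholders, not the real categories.
colony_rows = [
    {"year": 2015, "months": "Q1", "state": "Alabama", "colony_n": 7000, "colony_lost_pct": 26},
    {"year": 2015, "months": "Q2", "state": "Alabama", "colony_n": 7500, "colony_lost_pct": 12},
    {"year": 2015, "months": "Q1", "state": "Arizona", "colony_n": 35000, "colony_lost_pct": 13},
]
stressor_rows = [
    {"year": 2015, "months": "Q1", "state": "Alabama", "stressor": "stressor_a", "stress_pct": 10.0},
    {"year": 2015, "months": "Q1", "state": "Alabama", "stressor": "stressor_b", "stress_pct": 5.0},
    {"year": 2015, "months": "Q1", "state": "Arizona", "stressor": "stressor_a", "stress_pct": 8.0},
]


def colony_time_series(rows, state):
    """Return (year, months, colony_n) points for one state, in input order."""
    return [(r["year"], r["months"], r["colony_n"]) for r in rows if r["state"] == state]


def loss_pct_by_state(rows, year, months):
    """Return {state: colony_lost_pct} for one quarter, e.g. as map input."""
    return {
        r["state"]: r["colony_lost_pct"]
        for r in rows
        if r["year"] == year and r["months"] == months
    }


def stressor_stacks(rows, state):
    """Return {(year, months): {stressor: stress_pct}} for one state.

    Each inner dict is one stacked bar. Because a colony can be affected by
    several stressors in the same quarter, a bar's total may exceed 100.
    """
    stacks = defaultdict(dict)
    for r in rows:
        if r["state"] == state:
            stacks[(r["year"], r["months"])][r["stressor"]] = r["stress_pct"]
    return dict(stacks)
```

Each helper returns plain Python structures, so the output could be handed to whichever plotting library the dashboard ends up using.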
In your proposal, briefly describe the dataset and the variables that you will visualize. If you are planning to visualize a lot of columns, provide a high level descriptor of the variable types rather than listing every single column. For example, indicate that the dataset contains a variety of categorical variables for demographics and provide a brief list rather than describing every single variable. You may also want to consider visualizing a smaller set of variables given the short duration of this project. This might include brief exploratory data analysis for you to grasp what could be interesting aspects to look at in your data. We will not be grading the EDA aspect, but feel free to include your EDA notebooks in the public GitHub repo, so that you have everything in one place.