Dance-Data-Project / smith-capstone-23

Dance Data Project - Form 990 Analysis

About

The project is part of the Smith College Statistical and Data Science Capstone in Spring 2023, kindly sponsored by Dance Data Project® (“Dance Data Project”), a non-profit organization advocating for girls and women in dance. The project examines the longitudinal record of dance company endowments before and after the pandemic and analyzes their performance. In particular, we looked at whether any noticeable patterns or discrepancies exist in their usage of endowments over time. The repository contains open-access data bytes in HTML and PDF format that present our analyses.

Contributors

Contributions Name (alpha order)
🤔 🔢 💻 Ruth Button
💻 🚇 🔢 🤔 👀 Rose Evard
🔣 🤔 📆 Andrew Hoekstra
🔢 💻 🤔 👀 Zhen Nie
🔣 🔢 💻 🤔 👀 Quinn White
💼 🤔 📆 Elizabeth Yntema

(For a key to the contribution emoji or more info on this format, check out “All Contributors.”)

Dependencies

This code is written for the R programming language (version 4.2.1) and RStudio. Make sure you are running the most recent versions of both R and RStudio. Any operating system compatible with R and RStudio will work. The required packages are broom, tidyverse, xml2, kableExtra, here, plotly, scales, readxl, purrr, and shiny. Running INSTALL_ALL.R installs all of them.
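For readers who want to check their setup by hand, the following is a minimal sketch of how missing packages could be detected and installed. It is not the contents of INSTALL_ALL.R; the helper names `missing_packages` and `install_missing` are hypothetical and exist only for illustration.

```r
# Packages named in this README.
required_packages <- c(
  "broom", "tidyverse", "xml2", "kableExtra", "here",
  "plotly", "scales", "readxl", "purrr", "shiny"
)

# Hypothetical helper: return the entries of `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Hypothetical helper: install only what is missing (assumes CRAN access).
install_missing <- function(pkgs = required_packages) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Usage: install_missing()
```

Checking before installing avoids reinstalling packages that are already present.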

Getting Started

Prerequisites

Before running these analyses, we obtained a set of XML files for the companies of interest. These files contain Form 990 data in the format reported by the IRS. All required R packages are installed using INSTALL_ALL.R.

Running Analyses

The script RUN_ALL.R runs all files in the infrastructure_rmds directory as well as the exploration_rmds directory. HTML outputs are placed in the output_html subdirectories of infrastructure_rmds and exploration_rmds.
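As a rough illustration of the behavior described above, a script could knit every R Markdown file in a directory into that directory's output_html subfolder as sketched below. This is not the actual RUN_ALL.R. The helpers `list_rmds` and `render_directory` are hypothetical, and the sketch assumes the rmarkdown package is installed and that the infrastructure files should run before the exploration files.

```r
# Hypothetical helper: list the .Rmd files directly inside `dir`.
list_rmds <- function(dir) {
  list.files(dir, pattern = "\\.Rmd$", full.names = TRUE, ignore.case = TRUE)
}

# Hypothetical helper: knit each .Rmd in `dir` into `dir`/output_html,
# using whatever output format each file declares in its YAML header.
render_directory <- function(dir) {
  out_dir <- file.path(dir, "output_html")
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  for (rmd in list_rmds(dir)) {
    rmarkdown::render(rmd, output_dir = out_dir)
  }
  invisible(out_dir)
}

# Usage (assumed order: infrastructure first, then exploration):
# for (d in c("infrastructure_rmds", "exploration_rmds")) render_directory(d)
```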

Definitions

Data Sources

Data Collection and Update Process

Are any data processes automated? If so, how often is the data updated? If the data needs to be updated manually, how would someone go about doing that?

Repo Architecture

This repo contains all code created by Smith SDS Capstone '23 students for Dance Data Project. Two main folders contain the R Markdown files used for analyses: infrastructure_rmds and exploration_rmds.

All knitted HTML files from rmarkdowns are within a nested folder called output_html in the respective parent folder.

R scripts with universal functions (GET_VARS.R, INSTALL_ALL.R, RUN_ALL.R) are within the main directory.

Original data used for this project are not contained within this repo. However, all data produced by the infrastructure rmarkdowns are saved in .RDS format in a folder called data. All analyses assume your data are stored in XML format, in a folder called ballet_990_released_20230208.
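The data layout described above can be sketched as follows: XML filings are read from ballet_990_released_20230208 and results are written to the data folder as .RDS files. Only the two folder names come from this README. The helpers `list_filings`, `summarise_filing`, and `save_result` are hypothetical, and the summary shown (root element name and number of top-level children) merely stands in for the variable extraction the project performs.

```r
# Folder name taken from this README.
xml_dir <- "ballet_990_released_20230208"

# Hypothetical helper: list the XML filings in a folder.
list_filings <- function(dir = xml_dir) {
  list.files(dir, pattern = "\\.xml$", full.names = TRUE, ignore.case = TRUE)
}

# Hypothetical helper: parse one filing with xml2 and describe its top level.
summarise_filing <- function(path) {
  doc <- xml2::read_xml(path)
  data.frame(
    file = basename(path),
    root = xml2::xml_name(doc),
    n_children = length(xml2::xml_children(doc))
  )
}

# Hypothetical helper: save an object as <name>.RDS inside `data_dir`.
save_result <- function(x, name, data_dir = "data") {
  dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
  out <- file.path(data_dir, paste0(name, ".RDS"))
  saveRDS(x, out)
  invisible(out)
}

# Usage:
# filings <- do.call(rbind, lapply(list_filings(), summarise_filing))
# save_result(filings, "filing_summary")
```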

The folder css contains css code to produce standardized knitted HTMLs.

License

This work is licensed under an MIT license.

How to Provide Feedback

Questions, bug reports, and feature requests can be submitted to this repo's issue queue.

Questions?

Contact Andrew Hoekstra here.