This README provides an overview of our project's directory structure and guidelines for what to place in each folder.
This directory contains all our data files.
P_ALB_CR.XPT, P_ALQ.XPT, etc.
merged_data_clean.csv: The final merged dataset combining all raw tables.
data_quality_summary.csv: Summary of data quality checks for our datasets.

This directory is for storing operational logs.
deduplication_log.csv: Log of the deduplication process, including any records removed.

Store all scripts used for data processing, cleaning, and analysis here.
data_quality_check.py: Script for checking the raw data.
convert_XPT.R: Script for merging individual tables.

This folder is for project documentation.
data_desc.txt: Definitions and classes of all variables in our datasets.

Store your analysis files here, such as Jupyter notebooks or R Markdown files.
background-and-data.ipynb: Missing-data analysis and some exploratory data analysis (EDA).

This directory is for outputs of your analysis.
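
To illustrate the kind of record that deduplication_log.csv is described as holding, here is a minimal Python sketch that splits rows into kept and removed records and writes the removed ones out as CSV. The function names, the `id` and `value` columns, and the choice to keep the first occurrence are all assumptions made for this example; the project's own scripts may work differently.

```python
import csv
import io


def deduplicate(rows, key_fields):
    """Split rows into first occurrences and later duplicates of key_fields."""
    seen = set()
    kept, removed = [], []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key in seen:
            removed.append(row)  # a later duplicate: goes to the log
        else:
            seen.add(key)
            kept.append(row)  # first occurrence: stays in the dataset
    return kept, removed


def write_log(removed, fieldnames, handle):
    """Write the removed records to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(removed)


# Hypothetical records; the column names are placeholders.
rows = [
    {"id": "1", "value": "a"},
    {"id": "2", "value": "b"},
    {"id": "1", "value": "a"},
]

kept, removed = deduplicate(rows, key_fields=["id"])

# An in-memory buffer stands in for the log file here.
log = io.StringIO()
write_log(removed, ["id", "value"], log)
```

In practice the handle could be a file opened with `open(path, "w", newline="")`, so that the removed records end up next to the other operational logs.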