busi732 / Team-2-2024Fall

This is a project repository for Team 2 members in BUSI 732 Quantitative Research, 2024Fall

ADR for EDA Tools #16

Closed by ajhaller 6 days ago

ajhaller commented 2 weeks ago

Objective

I want to create an ADR about at least three different Exploratory Data Analysis (EDA) tools and the reasoning behind our choice of tool.

Acceptance Criteria

JavierACM commented 1 week ago

Table of Contents

Intro

Goal: Select the best-fit Exploratory Data Analysis (EDA) tool for our project's needs.

Date: 2024-11-16

Status: Accepted

Context

For our project, we have three different datasets with a large number of columns between them. To build a strong predictive model, we need reliable, digestible data. Data profiling, a concept closely linked to Exploratory Data Analysis, is a good way to understand our data and the problems that may arise.

Decision

After carefully reviewing five possible EDA tools, we chose DataPrep for our EDA needs. Of the options we read about, DataPrep's features appealed to us most, especially since it appears to be an improved version of ydata-profiling, the EDA tool we initially considered. The following are the tools we considered and short descriptions of them:

Consequences

By choosing DataPrep, we give up the beginner-friendliness of packages like AutoViz and Lux. In exchange, we get a more comprehensive feature analysis. We also lose the stronger focus on feature variables and subgroups that SweetViz provides, but intuitive data quality alerts are more valuable to us. Finally, we gain a better API and faster analysis than ydata-profiling.

Implementation

We will first use DataPrep for initial EDA on our fault, SCADA, and merged datasets, to review data quality alerts and to understand the relationships between the features relevant to our goal.
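As a rough illustration of that first step, here is a minimal sketch that writes one DataPrep report per dataset. The file paths, the `DATASETS` mapping, and the helper names `report_path` and `profile_datasets` are assumptions made for this example, not part of the project; the DataPrep calls (`create_report` and the report's `save` method) reflect my understanding of the `dataprep.eda` API and should be checked against the installed version.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical locations: the ADR names the fault, SCADA, and merged datasets
# but does not say where they are stored.
DATASETS: Dict[str, Path] = {
    "fault": Path("data/fault.csv"),
    "scada": Path("data/scada.csv"),
    "merged": Path("data/merged.csv"),
}


def report_path(name: str, out_dir: Path) -> Path:
    """Return the HTML file that a dataset's report is written to."""
    return out_dir / f"{name}_eda.html"


def profile_datasets(datasets: Dict[str, Path], out_dir: Path) -> List[Path]:
    """Write one DataPrep report per dataset and return the report paths."""
    # Imported inside the function so this module loads even where
    # pandas and dataprep are not installed.
    import pandas as pd
    from dataprep.eda import create_report

    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for name, csv_path in datasets.items():
        df = pd.read_csv(csv_path)
        # The report covers per-column statistics, correlations,
        # and missing-value views for the whole DataFrame.
        report = create_report(df, title=f"{name} dataset")
        target = report_path(name, out_dir)
        report.save(str(target))
        written.append(target)
    return written
```

Calling `profile_datasets(DATASETS, Path("reports"))` would then leave three HTML files in `reports/` that the team can open side by side to compare data quality across the datasets.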

References

Modern - Awesome Data Science Tools to Master in 2023: Data Profiling Edition

Modern - Comparing the Five Most Popular EDA Tools

Revision History

2024-11-16 - Draft 1 complete

JavierACM commented 1 week ago

Please let me know if there are any changes I can or should make. @ajhaller @rr-85 @gibby-ci

rr-85 commented 1 week ago

@JavierACM I think the ADR is very detailed and justifies the use of DataPrep compared to the other tools. We can add it to the Wiki.