Goal: Select the best-fit Exploratory Data Analysis (EDA) tool for our project's needs.
Date: 2024-11-16
Status: Accepted
For our project, we have three different datasets with many columns altogether. To create an excellent predictive model, we need reliable, digestible data. Data profiling, a concept closely linked to Exploratory Data Analysis, is a good way to understand our data and the problems that may arise.
After carefully reviewing five possible EDA tools, we chose DataPrep for our EDA needs. We preferred the features DataPrep provides, especially since it appears to improve on ydata-profiling, the EDA tool we initially considered. The tools we considered were DataPrep, ydata-profiling, AutoViz, Lux, and SweetViz.
By choosing DataPrep, we give up the beginner-friendliness of packages like AutoViz and Lux, but we gain a more comprehensive feature analysis. We also lose the stronger focus on feature variables and subgroups that SweetViz provides, but intuitive data quality alerts are more valuable to us. Finally, we expect a better API and faster analysis than ydata-profiling offers.
We will first use DataPrep for initial EDA on our fault, SCADA, and merged datasets to review data quality alerts and to understand the relationships between features relevant to our goal.
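As a rough sketch of what that first pass could look like, the snippet below builds one DataPrep report per dataset. The CSV file names, the `reports` output folder, and the helper names `report_path` and `build_reports` are assumptions made for illustration, since the ADR does not say where the data lives. It relies on `create_report` from `dataprep.eda` and the report's `save` method; the `save` signature has changed between DataPrep releases, so check it against the installed version.

```python
from pathlib import Path

# Hypothetical file names: the ADR does not specify where the datasets are stored.
DATASETS = {
    "fault": "fault.csv",
    "scada": "scada.csv",
    "merged": "merged.csv",
}


def report_path(name: str, out_dir: str = "reports") -> Path:
    """Return the HTML output path for one dataset's report."""
    return Path(out_dir) / f"{name}_eda.html"


def build_reports(datasets=None, out_dir: str = "reports") -> None:
    """Create one DataPrep HTML report per dataset (illustrative helper)."""
    # Imported lazily so this module loads even where pandas/dataprep are missing.
    import pandas as pd
    from dataprep.eda import create_report

    datasets = DATASETS if datasets is None else datasets
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    for name, csv_file in datasets.items():
        df = pd.read_csv(csv_file)
        # The report bundles per-column statistics, correlations, missing-value
        # views, and DataPrep's automatically detected insights.
        report = create_report(df, title=f"{name} dataset EDA")
        report.save(str(report_path(name, out_dir)))
```

Calling `build_reports()` from a notebook or script would write `reports/fault_eda.html`, `reports/scada_eda.html`, and `reports/merged_eda.html`, which can then be opened in a browser to review the alerts and feature relationships.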
Modern - Awesome Data Science Tools to Master in 2023: Data Profiling Edition
Modern - Comparing the Five Most Popular EDA Tools
2024-11-16 - Draft 1 complete
Please let me know if there are any changes I can or should make. @ajhaller @rr-85 @gibby-ci
@JavierACM I think the ADR is very detailed and justifies the use of DataPrep compared to the others. We can add it to the Wiki.
Objective
I want to create an ADR about at least three different Exploratory Data Analysis (EDA) tools and the reasoning behind our choice of tool.
Acceptance Criteria