UBC-MDS / software-review-2022


Group 5: EDAhelper (Python) #22

Open stevenleung2018 opened 2 years ago

stevenleung2018 commented 2 years ago

name: Submit Software for Review
about: Tools to make EDA easier.
title: EDAhelper
labels: 1/editor-checks, New Submission!
assignees: Irene Yan, Macy Chan, Yike Shi, Chaoran Wang


Submitting Author: Steven Leung (@stevenleung2018)
Package Name: EDAhelper
One-Line Description of Package: This package aims to make the EDA process more efficient by reducing many lines of code to 4 function calls.
Repository Link: https://github.com/UBC-MDS/EDAhelper
Version submitted: 1.2.1
Editor: Vera Cui, Rowan Sivanandam, Jennifer Hoang
Reviewer 1: Irene Yan
Reviewer 2: Macy Chan
Reviewer 3: Yike Shi
Reviewer 4: Chaoran Wang
Archive: TBD
Version accepted: TBD


Description

* Clean the data and replace missing values using the preferred method.
* Describe the data, such as the distribution of each column.
* Automatically produce a correlation plot between the numeric columns.
* Combine the plots and make them suitable for a report.
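The first step, replacing missing values with a chosen method, can be illustrated as follows. This is only a dependency-free sketch on plain Python lists, not EDAhelper's implementation; the function name `fill_missing` and the method names `"mean"`, `"median"` and `"drop"` are hypothetical.

```python
from statistics import mean, median


def fill_missing(values, method="mean"):
    """Return a copy of `values` with None entries handled by `method`.

    Hypothetical helper for illustration only; `method` is one of
    "mean", "median" or "drop".
    """
    observed = [v for v in values if v is not None]
    if method == "drop":
        # Dropping simply keeps the observed values.
        return observed
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if method == "mean":
        fill = mean(observed)
    elif method == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown method: {method!r}")
    # Replace each missing entry with the computed fill value.
    return [fill if v is None else v for v in values]


ages = [22, None, 30, 26, None]
print(fill_missing(ages, method="mean"))
print(fill_missing(ages, method="drop"))
```

Raising on an unknown method, rather than silently falling back to a default, keeps a typo in the method name from producing quietly wrong statistics.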

Scope

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

This is a package for EDA (Exploratory Data Analysis). It therefore has elements of data retrieval, extraction, munging and some data visualization.

It is targeted at any data scientist who does Exploratory Data Analysis.

EDA is certainly not a new topic for data scientists, and there are quite a few packages on PyPI doing similar work. However, most of them offer only limited functionality, such as descriptive statistics alone. Our proposal is more of an all-in-one toolkit for EDA. Below is a list of sister projects.

* auto-eda: An automatic script that generates information about the dataset.
* easy-eda: Exploratory Data Analysis.
* quick-eda: Important dataframe statistics with a single command.
* eda-report: A simple program to automate exploratory data analysis and reporting.

The above-mentioned packages position themselves differently from EDAhelper. Some provide only a single function, while others generate an almost comprehensive report. We want to provide a package with individual function calls that can be used in different parts of a report.

NIL

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication options

JOSS Checks

- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser text-based review. It will also allow you to demonstrate addressing the issue via PR links.

Code of conduct

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

MacyChan commented 2 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing:


Review Comments

Overall the package works well, good job! Just a couple of suggestions for improvement.

  1. The import in the example is incorrect:

    from EDAhelper.preprocess import preprocess #This doesn't work for me

    I changed the module to the name of your script file and it worked:

    from EDAhelper.EDAhelper import preprocess

  2. The README gives EDAhelper.preprocess('file_path') as an example. It took me a moment to work out what to pass as file_path. An example like the one on Read the Docs (local data) would be easier to follow; from that one I saw immediately that the full path, including .csv, is needed.

  3. I tested column_stats(df, columns) and got an error saying the values should be of type integer. The message would be more accurate if it mentioned float as well: I tested and the function seems to work on float columns too!

    file_path = 'https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv'
    df = preprocess(file_path)
    columns = ['PassengerId', 'Name'] 
    column_stats(df, columns)
  4. A road map suggestion: it would be nice for plot_history and numeric_plots to take an extra argument, a list of the particular columns that I want to look at.

  5. Another road map suggestion: hopefully the package can also handle non-numeric plots in the future!
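Suggestions 3 and 4 could both be served by one small helper. Below is a minimal sketch, assuming a hypothetical function `resolve_numeric_columns` (this is not EDAhelper's actual code) that accepts an optional list of columns, treats integer and float columns alike, and names the offending columns in its error message. It relies only on pandas' `DataFrame.select_dtypes`; the sample data is made up for illustration.

```python
import pandas as pd


def resolve_numeric_columns(df, columns=None):
    """Return the numeric columns to analyse.

    Hypothetical helper for illustration only. With `columns=None`
    every numeric column (integer or float) is returned; otherwise the
    requested columns are validated and returned in the given order.
    """
    numeric = df.select_dtypes(include="number").columns.tolist()
    if columns is None:
        return numeric
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found in the data frame: {missing}")
    non_numeric = [c for c in columns if c not in numeric]
    if non_numeric:
        raise TypeError(
            "columns must be numeric (integer or float); "
            f"non-numeric columns: {non_numeric}"
        )
    return list(columns)


# Made-up sample data with one integer, one float and one text column.
sample = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3],
        "Fare": [7.25, 71.28, 7.92],
        "Name": ["A", "B", "C"],
    }
)
print(resolve_numeric_columns(sample))
print(resolve_numeric_columns(sample, ["Fare"]))
```

A plotting or statistics function could call such a helper first and then loop over the returned names, so the column-selection argument and the clearer type error come from the same place.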

Love your work! Well done, and we made it through another block!

MaeveShi commented 2 years ago

Package Review

Estimated hours spent reviewing: 0.4 hour


Review Comments

Great job! The general goal is clear, and the package installs and runs successfully on my local machine. Just a small suggestion: you could @-mention the authors in the README, or add a URL to your GitHub homepage.

showcy commented 2 years ago

Package Review

Estimated hours spent reviewing:


Review Comments

shyan0903 commented 2 years ago

Package Review

Estimated hours spent reviewing: 1 hour

Review Comments