UBC-MDS / software-review-2021


Submission: eazieda (Python) #16

Open arashshams opened 3 years ago

arashshams commented 3 years ago

Submitting Author:

Package Name: eazieda
One-Line Description of Package: eazieda makes data wrangling and exploratory data analysis (EDA) quite simple and fast
Repository Link: eazieda
Version submitted: 0.1.11
Editor: TBD
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD


Description

Almost every data analysis project involves some exploratory data analysis (EDA), which is a crucial and unavoidable step in the workflow. It is typically followed by preprocessing such as imputation and outlier handling. Together these steps can take a lot of code that ends up being repeated across projects. To address this, the Python package eazieda wraps that code into four convenient functions that let you carry out EDA and some simple preprocessing quickly, in just a few lines of code!

Scope

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

eazieda produces interactive plots (e.g. histograms and correlation plots) that show the distribution and correlation of features in a given dataset. It also supports data wrangling, since at its core it is designed to deal with missing data and outliers.

The target audience is anyone who wants an interactive visualization of the dataset at hand, as well as people who want to do some quick data munging, especially if their dataset contains missing values and outliers.

There are similar Python packages such as "pandas-profiling" or "sweetviz", but eazieda focuses on handling the most commonly needed EDA and data wrangling tasks quickly and conveniently. Another difference is that eazieda is quite lightweight.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication options

JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser text-based review. It will also allow you to demonstrate addressing the issue via PR links.

Code of conduct

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

jachang0628 commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 1 hour


Review Comments

Hello Eazieda Team,

Good job on completing the package! I am very impressed that you guys wrote 6 functions even though there are only 4 of you! Good job on writing the documentation in the vignette, as it is very easy to follow and understand. Now on to areas where I think this package could improve:

arashshams commented 3 years ago

Many thanks @jachang0628 for your review. We will be taking your comments into consideration to improve our package.

vigneshRajakumar commented 3 years ago

Thanks @jachang0628 for the useful comments and taking the time for the thorough review!

You raise an interesting point about the outlier function. The default method uses a z-score threshold of 3 to detect outliers. In the example you used, the 'outlier' has a z-score of less than 3, so the function is working as intended: that's just how z-scores work, and they aren't ideal for finding outliers in small samples. Having said that, you have a point that this could be misleading. It might be better if we used IQR as the default method instead. I'll check this with the other contributors and implement it.
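To illustrate the small-sample point, here is a minimal sketch in plain Python. The function names `zscore_outliers` and `iqr_outliers` and the sample data are hypothetical and are not eazieda's actual API; the 1.5 × IQR fence is the common Tukey convention, assumed here rather than taken from the package.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`.

    Hypothetical helper for illustration only; uses the population
    standard deviation.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration only; k=1.5 is an assumed default.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up sample: 100 looks like an obvious outlier among five points.
sample = [1, 2, 3, 4, 100]

z_flagged = zscore_outliers(sample)   # → []    (z-score of 100 is about 2.0)
iqr_flagged = iqr_outliers(sample)    # → [100] (fences are -1 and 7)

print("z-score method:", z_flagged)
print("IQR method:", iqr_flagged)
```

With the population standard deviation, the largest possible absolute z-score among n points is sqrt(n - 1), so a threshold of 3 cannot flag anything until the sample has at least 11 points. The IQR fences do not have this limitation, which is why the same value is caught in the second call.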

These comments are super helpful and we'll put them in our backlog! (Especially the exception handling one, that's a good catch!)

zhijingjing1 commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 2


Review Comments

Dear Vignesh, Dustin, Arash, and Yuyan,

Great job on completing your project! I enjoyed reviewing your work. These are some of my suggestions and I hope that they are useful:

Thanks for providing us with such an awesome package.

Jingjing