UBC-MDS / software-review-2021

Submission: instaeda (Python) #19

Open roycezhou opened 3 years ago

roycezhou commented 3 years ago

Submitting Author:

Package Name: instaeda
One-Line Description of Package: Quick and easy way to clean data and build exploratory data analysis plots
Repository Link: instaeda_py
Version submitted: 0.1.6
Editor: TBD
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD


Description

Instaeda provides a quick and easy way to clean data and build exploratory data analysis plots. The idea came from our experience building data projects in the UBC MDS program, where we noticed that the same repetitive activities occur whenever we first explore a data set. This package helps you take a raw data set and carry out data cleaning and plotting with a minimal amount of code. Its four main components cover data checking, data cleaning, and exploratory visualization, which includes numerical correlation plotting and basic distribution plotting by data type.

Scope

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

The package falls under data visualization and exploratory data analysis: it explores raw data through data checking, data cleaning, and exploratory visualization (numerical correlation plotting and basic distribution plotting by data type) with a minimal amount of code.

The target audience is anyone who needs to clean data and build exploratory data analysis plots, for instance students with a computer science or data science background. Data scientists, data engineers, and statisticians are possible target users as well.

There are some Python packages for exploratory data analysis (EDA), such as "Pandas Profiling" and "Autoviz". However, our package provides a different workflow and functionality. Each of its four main components has its own points of innovation. For example, our data checking function gives users an overall summary of the raw data. Although built-in functions exist to summarize missing data, mean, standard deviation, etc., our function produces straightforward plots, including several bar charts, that summarize the data with metrics covering numeric columns, factor columns, complete rows, and missing observations.
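As a rough illustration of the metrics named above (numeric columns, factor columns, complete rows, missing observations), here is a minimal sketch. It is not instaeda's implementation or API: the function name `summarize_columns`, the column-oriented input format, and the sample data are all hypothetical, the code is kept dependency-free, and the plotting step is left out.

```python
from numbers import Number


def summarize_columns(data):
    """Return basic structure metrics for column-oriented data.

    `data` maps column names to equal-length lists; None marks a missing value.
    (Hypothetical helper for illustration, not part of instaeda.)
    """
    n_rows = len(next(iter(data.values()), []))

    def is_numeric(values):
        present = [v for v in values if v is not None]
        # bool is a Number subclass, so exclude it explicitly.
        return bool(present) and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in present
        )

    numeric = [name for name, values in data.items() if is_numeric(values)]
    # Count every missing cell across all columns.
    missing = sum(v is None for values in data.values() for v in values)
    # A row is complete when no column has a missing value at that index.
    complete = sum(
        all(values[i] is not None for values in data.values())
        for i in range(n_rows)
    )
    return {
        "rows": n_rows,
        "numeric_columns": len(numeric),
        "factor_columns": len(data) - len(numeric),
        "complete_rows": complete,
        "missing_observations": missing,
    }


# Made-up sample data, for illustration only.
raw = {
    "age": [31, None, 45, 28],
    "city": ["Vancouver", "Kelowna", None, "Victoria"],
    "income": [52000.0, 61000.0, None, 48000.0],
}
summary = summarize_columns(raw)
print(summary)
```

Each value in the returned dictionary would map naturally onto one bar in a summary chart of the kind described above.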

NA

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication options

JOSS Checks

- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Code of conduct

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

trevorki commented 3 years ago

Hello, please find my package review below:

Package Review

Documentation

The package includes all the following forms of documentation:

Readme requirements

The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 2.5

Review Comments

Thanks for the submission. This looks like a useful package that just needs some minor polishing to be ready to go. I left thorough notes inline (above) but have summarized the main points here:

Please let me know if you have questions or want to discuss anything further.

wang-rui commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in the comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. The package structure should follow general community best-practices. In general please consider:

Functionality

- [x] Installation: Installation succeeds as documented.

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing:

2.5

Review Comments

Hi Justin, Selma, Zeliha, Siqi, Thanks for putting this package together. I had fun playing around with the instaeda_py package, and it is definitely going to make our data exploration more convenient.

While playing with the package, I noted a few things below that I hope will help improve the overall experience.

Installation

I tried installing the package by following the instructions in your README, but sadly the installation fails in an empty Python 3 environment with the error below:


ERROR: Cannot install instaeda==0.1.0, instaeda==0.1.1, instaeda==0.1.2, instaeda==0.1.6, instaeda==0.1.7 and instaeda==0.1.8 because these package versions have conflicting dependencies.

The conflict is caused by:
    instaeda 0.1.8 depends on vega-datasets<0.10.0 and >=0.9.0
    instaeda 0.1.7 depends on vega-datasets<0.10.0 and >=0.9.0
    instaeda 0.1.6 depends on vega-datasets<0.10.0 and >=0.9.0
    instaeda 0.1.2 depends on vega-datasets<0.10.0 and >=0.9.0
    instaeda 0.1.1 depends on vega-datasets<0.10.0 and >=0.9.0
    instaeda 0.1.0 depends on vega-datasets<0.10.0 and >=0.9.0


> However, it seems I could install it after adding the `--extra-index-url` option.

pip install -i https://test.pypi.org/simple --extra-index-url https://pypi.org/simple instaeda
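
With `-i`, pip uses TestPyPI as its only index, so it presumably cannot find a compatible `vega-datasets` release there; `--extra-index-url` lets it fall back to PyPI for dependencies. After running the command above, a quick way to confirm what was actually installed is to query the distribution metadata. This is only a sketch: the helper name `installed_version` is hypothetical, and it assumes Python 3.8 or newer for `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent.

    Hypothetical helper for illustration only.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Report the package and the dependency named in the conflict message.
for dist in ("instaeda", "vega-datasets"):
    print(dist, installed_version(dist))
```

If `vega-datasets` reports a version in the `>=0.9.0,<0.10.0` range, the dependency from the error message was resolved.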

#### Tests

> I really like how many test cases you created, and they seem very detailed.

> I cloned the repo to my laptop and ran pytest locally with `poetry install` followed by `poetry run pytest`, but ran into the error below:

ModuleNotFoundError: No module named '_cffi_backend'


> I'm not sure if it is an issue from my environment as I created a new environment to test this...:(
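
One way to tell an environment problem from a packaging problem might be to check whether the interpreter that Poetry uses can locate the `cffi` package and its compiled `_cffi_backend` module. The sketch below assumes it is run inside the project's environment (for example through `poetry run python`); the helper name `module_available` is hypothetical.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the current interpreter can locate a top-level module.

    Hypothetical helper for illustration only.
    """
    return importlib.util.find_spec(name) is not None


# Show which interpreter is running, then check the modules from the error.
print("interpreter:", sys.executable)
for name in ("cffi", "_cffi_backend"):
    print(name, "available" if module_available(name) else "missing")
```

If `_cffi_backend` shows as missing, the error most likely comes from the environment rather than from the test suite itself.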

#### Function Documentation

> I really loved your function documentation. The examples in the docstrings are very clear and easy to understand.

#### Documentation

> - I only see Justin’s name in the copyright info. I wonder if this should be all members of the team.
> - It seems you don't have a license discussion issue...:/
> - The usage part on the [readthedocs](https://instaeda.readthedocs.io/en/latest/usage.html) is not there yet.