Open stevenleung2018 opened 2 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
`setup.py` file or elsewhere.
Readme requirements
The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a `paper.md` matching JOSS's requirements with:
Estimated hours spent reviewing:
Overall the package works well, good job! Just a couple of suggestions for improvement.
The example has a typo.
from EDAhelper.preprocess import preprocess #This doesn't work for me
I changed it to the name of your script file and it worked.
from EDAhelper.EDAhelper import preprocess
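One way to make the shorter import work is to re-export the function from the package's `__init__.py`. The following is a minimal sketch under stated assumptions: `demo_pkg` and `core` are hypothetical stand-ins built in a temporary directory, not EDAhelper's actual layout, and the toy `preprocess` only returns a string.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk. The names demo_pkg and core are
# hypothetical stand-ins, not EDAhelper's real layout.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()

# The function lives in a submodule, similar to what the working import
# `from EDAhelper.EDAhelper import preprocess` suggests.
(pkg_dir / "core.py").write_text(
    "def preprocess(file_path):\n"
    "    return f'would load {file_path}'\n"
)

# Re-exporting it in __init__.py makes `from demo_pkg import preprocess`
# and `demo_pkg.preprocess(...)` both work.
(pkg_dir / "__init__.py").write_text("from .core import preprocess\n")

sys.path.insert(0, str(root))
demo_pkg = importlib.import_module("demo_pkg")

print(demo_pkg.preprocess("data/titanic.csv"))
```

With a re-export like this, users would not need to know which file a function is defined in.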
In the README, `EDAhelper.preprocess('file_path')` is provided as an example. I had trouble understanding what I should put for `file_path` for a second. It would be easier to understand if there were examples like the one on Read the Docs (local data), where I immediately got that the whole path with .csv is needed.
I tested out `column_stats(df, columns)` and got an error saying `values should be of type integer`. I think a better message would mention `float` as well, since I tested and the function seems to work on `float` columns too!
file_path = 'https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv'
df = preprocess(file_path)
columns = ['PassengerId', 'Name']
column_stats(df, columns)
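The clearer error message suggested above could look something like the following. This is only a sketch: `check_numeric_columns` is a hypothetical helper, not part of EDAhelper, and the small data frame stands in for the Titanic data.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_numeric_columns(df, columns):
    """Raise a TypeError naming every requested column that is not numeric.

    Hypothetical helper, not EDAhelper's code. Note that pandas also treats
    boolean columns as numeric.
    """
    bad = [col for col in columns if not is_numeric_dtype(df[col])]
    if bad:
        raise TypeError(
            f"column_stats needs numeric (int or float) columns; non-numeric: {bad}"
        )


# Small stand-in for the Titanic data used in the review.
df = pd.DataFrame(
    {"PassengerId": [1, 2, 3], "Fare": [7.25, 71.28, 7.92], "Name": ["A", "B", "C"]}
)

check_numeric_columns(df, ["PassengerId", "Fare"])  # int and float both pass
```

Checking the column dtype once, instead of each value, also lets the message name the offending column.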
Two road map suggestions. It would be nice for `plot_history` and `numeric_plots` to take an extra argument, which is a list of the particular columns that I want to look at.
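A minimal sketch of how such an optional argument might be resolved before plotting, assuming a pandas data frame as input. `pick_plot_columns` is a hypothetical helper, not an existing EDAhelper function.

```python
import pandas as pd


def pick_plot_columns(df, columns=None):
    """Return the numeric column names to plot.

    Hypothetical helper: with columns=None every numeric column is used,
    otherwise only the requested ones, which must all be numeric.
    """
    numeric = list(df.select_dtypes(include="number").columns)
    if columns is None:
        return numeric
    rejected = [c for c in columns if c not in numeric]
    if rejected:
        raise ValueError(f"not numeric columns of the data frame: {rejected}")
    return list(columns)


df = pd.DataFrame({"Age": [22, 38], "Fare": [7.25, 71.28], "Name": ["A", "B"]})
print(pick_plot_columns(df))            # all numeric columns
print(pick_plot_columns(df, ["Fare"]))  # only the requested one
```

Keeping `columns=None` as the default would leave existing calls unchanged.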
The other road map suggestion: hopefully the package can also handle non-numeric plots in the future!
Love your work! Well done and we made it for another block!
Estimated hours spent reviewing: 0.4 hour
Great job! The general goal is clear, and the package can be installed and run successfully on my local machine. Just a small suggestion: you could @-mention the authors in the README file, or add a URL to your GitHub homepage.
Estimated hours spent reviewing:
- `preprocess` and you have all the other functions in separate files, which is confusing. It might be better to be consistent: either name every file after its function or put all functions into the EDAhelper file.
- `'file_path'` or `columns = ('Date', 'PctPopulation', 'CrimeRatePerPop')` stands for?
- `from EDAhelper.preprocess import preprocess`. I had to edit it to `from EDAhelper.EDAhelper import preprocess`. `EDAhelper.preprocess('file_path')` is not working; I have to use `preprocess('file_path')` instead. The same goes for the other three functions.
- `column_stats` function, it fails when the column names are strings with spaces. You might need to handle this situation, either by processing such names or by returning an error. Currently, the error comes from these lines:

      for row in data[column]:
          if isinstance(row, str):
              raise TypeError("values should be of type integer")

  But the error should be about the column names.
- `numeric_plots`, it might be better to add a `columns` argument to be consistent with your other three functions.
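An error about the column names, as asked for above, could be raised before any values are inspected. The sketch below assumes the input is a pandas data frame; `validate_column_names` is a hypothetical helper and not EDAhelper's actual code.

```python
import pandas as pd


def validate_column_names(df, columns):
    """Return `columns` as a list, or raise KeyError naming the unknown ones.

    Hypothetical helper, not EDAhelper's code.
    """
    if isinstance(columns, str):
        # A bare string would otherwise be iterated character by character.
        columns = [columns]
    unknown = [c for c in columns if c not in df.columns]
    if unknown:
        raise KeyError(
            f"unknown column name(s): {unknown}; available: {list(df.columns)}"
        )
    return list(columns)


# Column names containing spaces are ordinary labels for pandas.
df = pd.DataFrame({"Crime Rate": [1.5, 2.0], "Pct Population": [0.1, 0.2]})
print(validate_column_names(df, "Crime Rate"))
```

Running a check like this first means a misspelled or unexpected name is reported as such, rather than surfacing later as a type error about the values.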
`column_stats()`, the example function call and the output do not match, which is confusing.
name: Submit Software for Review
about: Tools to make EDA easier.
title: EDAhelper
labels: 1/editor-checks, New Submission!
assignees: Irene Yan, Macy Chan, Yike Shi, Chaoran Wang
Submitting Author: Steven Leung (@stevenleung2018)
Package Name: EDAhelper
One-Line Description of Package: This package is aimed at making the EDA process more efficient by minimizing the many lines of code to 4 function calls.
Repository Link: https://github.com/UBC-MDS/EDAhelper
Version submitted: 1.2.1
Editor: Vera Cui, Rowan Sivanandam, Jennifer Hoang
Reviewer 1: Irene Yan
Reviewer 2: Macy Chan
Reviewer 3: Yike Shi
Reviewer 4: Chaoran Wang
Archive: TBD
Version accepted: TBD
Description
Clean the data and replace missing values using the preferred method. Describe the data, such as the distribution of each column. Give the correlation plot between the numeric columns automatically. Combine the plots and make them suitable for a report.
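For readers unfamiliar with these steps, the first three can be sketched in plain pandas as follows. This is not EDAhelper's implementation: the toy data, the choice of median imputation, and the variable names are all assumptions for illustration.

```python
import pandas as pd

# Toy data with missing values (assumed for illustration).
df = pd.DataFrame(
    {"age": [22.0, None, 26.0], "fare": [7.0, 71.0, None], "name": ["A", "B", "C"]}
)

numeric_cols = df.select_dtypes(include="number").columns

# Clean: fill missing numeric values with each column's median
# (one possible "preferred method"; mean or a constant work the same way).
cleaned = df.copy()
cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].median())

# Describe: count, mean, quartiles and so on for each numeric column.
summary = cleaned[numeric_cols].describe()

# Correlate: pairwise Pearson correlation between numeric columns,
# which is the table a correlation plot would draw.
corr = cleaned[numeric_cols].corr()

print(summary)
print(corr)
```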
Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.
This is a package for EDA (Exploratory Data Analysis). It therefore has elements of data retrieval, extraction, munging and some data visualization.
It is targeted at any data scientist who does Exploratory Data Analysis.
EDA is certainly not a new topic to data scientists, and quite a few packages on PyPI do similar work. However, most of them include only limited functionality, such as providing descriptive statistics. Our proposal is more of an all-in-one toolkit for EDA. Below is a list of sister projects.
- auto-eda: An automatic script that generates information about the dataset.
- easy-eda: Exploratory Data Analysis.
- quick-eda: Important dataframe statistics with a single command.
- eda-report: A simple program to automate exploratory data analysis and reporting.
The above-mentioned packages position themselves differently from EDAhelper. Some do only one thing. Others provide functions that generate an almost comprehensive report. We want to provide a package with individual function calls that can be used in different parts of a report.
@tag the editor you contacted: NIL
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser text-based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here