Open lori94 opened 4 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing:
2 hours
Great package! The README is very easy to understand, the examples you used show clearly the purpose of your functions. Your package could help a lot of people!
Here are a few ideas to maybe improve your package :
Example
part, you are not supposed to put the output the function. I would delete the output of the examples in the explore_missing
and in the explore_summary
functions.>>>
at the beginning of the lines of code that are your examples.test_explore_summary
function. Moreover, I am not sure if comments work as docstrings, maybe you should try to have proper docstrings for the functions that are in the test_explore_feature_map.py
file too.explore_summary
function so that it raises an error when the input is not a data frame. Then you should add a test in the test_explore_summary
function to assert that your function throws an error if the input is not a data frame. explore_summary.py
file, line 53, you wrote radom = random = ...
explore_summary
), you could use the map
function, and then a for
loop, to avoid having several times the same line of code with the only element that changes being the function that you apply. See here to see an example of what I mean by using the map function. explore_summary
function, it would be great if it could take a function as an argument (for example an anonymous function), and then this function would be applied to every column of the data frame. For example, imagine I want to count the number of negative values in each column of my data frame, it would be really cool if I could use lambda x : np.sum(x<0)
or something like this, in your function, and then there would be a new row in the output of the explore_summary
function that would display the number of negative values in each column. I hope you will find my review useful! Please, let me know if you have any questions/comment on my feedbacks
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing:
Overall, well done. This package meets all the expectations listed in the project's objectives. All functions are useful to reduce time in EDA!
explore_summary
It is convenient and intuitive to directly distinguish and return the types of variables. Something should be improved is the name of descriptive statistics in your returned table. I think using percentile percentage as the name would be more straightforward such as 25% and 75% instead of '1st' and '3rd'. Also, it would be better to return values count for each category for categorical variables. Moreover, should raise an error like "input data type must be pandas.DataFrame" when input is not a data frame.explore_outliers
Nice function to detect outliers but it would be better if available value ranges could be indicated and the exact values of outliers could be listed at the same time, not just the count. In addition, I think you should clarify that your function is only for numeric variables instead of all variables and you drop NA
in the process of calculation in your docstrings and function description.explore_missing
Also a useful function. I think it would be better if an empty data frame returned when there are no missing values instead of raising an error. Also, this function can't identify empty strings (i.e. "") which are also missing values for categorical variables. For your num_row
argument, there is no error raised when the value is out of scope or negative and when the input value is not an integer. For the data
argument, it would be better if it could be changed to df
to keep consistent with the argument in other functions.explore_feature_map
Nice and clear plots! This function is useful for numeric variables but seems to do nothing with categorical variables. I think you should clarify this in your function description. What's my suggestion is that you could also include categorical variables and use ANOVA test statistics or p-value to show the correlation between numeric variables and categorical variables. Also could do the chi-square test to find the correlation between two categorical variables. In addition, wrong input type for the dataframe
argument should be a TypeError
instead of a ValueError
. I hope my suggestions would be helpful for your future modification.
@clsu22 & @sirine-chahma, thanks for the time you spent reviewing our package!
@clsu22, most of your feedback has been addressed. The only item that is still being discussed is regarding the categorical values handling for the explore_feature_map
function. We left it as an open-ended issue here https://github.com/UBC-MDS/pyxplr/issues/67
@sirine-chahma, your feedback has been addressed except the last two items that are still in progress. You can find the issue with current progress here https://github.com/UBC-MDS/pyxplr/issues/57
Thank you!
Submitting Author: Serhiy Pokrovskyy(@pokrovskyy ), Braden Tam(@bradentam ), Furqan Khan(@fkhan72 ), Yu Fang(@lori94) Package Name:
pyxplr
One-Line Description of Package: pyxplr is a Python package to perform explanatory data analysis (EDA) simple and seamless. Repository Link: https://github.com/UBC-MDS/pyxplr Version submitted: v0.3.8 Editor: Varada Kolhatkar (@kvarada ) Reviewer 1: Haoyu Su (@clsu22)Reviewer 2: Sirine Chahma (@sirine-chahma)
Archive: TBD
Version accepted: TBD
Description
Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.
@tag
the editor you contacted:Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements): "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here