Open andrealee011 opened 4 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
General Comments I think you have done a good job with this package. It addresses a specific need. There are some issues about the package Documentation and Functionality.
missing_val
function, the style for parameters is
But in the docstring for the ForSelect
function, the style of parameters is
fit_and_report
and ForSelect
functions as shown in the following screenshots:
missing_val
, ForSelect
and feature_splitter
functions as shown in the following screenshots:
missing_val
as shown in the following screenshot:If it is possible, please address these issues and I'll review the package again. Thanks.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing:
3h
Compliments:
Suggestions:
1) It might be useful to add usage results in the README.md file. This will help users to see the results from your functions directly and enhance their understanding of your package.
2) For function missing_val
:
delete
works fine, but mean
still contains NaN values, and knn
throws an error. My suggestion here is to add a test to only include numbers in a dataframe such that strings or booleans are not allowed. Examples below:3) For function ForSelect
:
The function produces repetitive text which is not necessary.
In the example above, the minimum features selected should be 2, but the result only gives me one feature. I also tried with other datasets, the function always produced the same result no matter how I changed min_features
and max_features
.
4) For function feature_splitter
:
It might be useful to produce the result as a dataframe and add labels to it. It will help users to know which columns are numerical and which columns are categorical.
Also, the function seems to split the columns according to whether they include strings or numbers instead of categorical or numerical features. For instance, in the df_breast
dataset below, Class
, malignancy_degree
and avgerage_axillary_lymph_nodes
are categorical features instead of numerical features. My suggestion is to make a note in the function documentation or/and README.md file so that users can be aware of it. See the example below:
5) Some minor things:
missing_val
function:minimum
features instead of mininum
features) in pyb4model.py
Great work! I hope these suggestions help, it was fun trying out your package!
Thank you for the feedback! It was extremely hepful! Because of time constraints, we were unable to address all your feedback. Here is a list of the changes we did make:
Please find our new relase here: https://github.com/UBC-MDS/pyb4model/tree/v1.1.1
Thank you @marvinmin and @Margaret8521 for your valuable feedback. We have made the changes outlined above by @andrealee011 and @agdal1125. I just wanted to add one more comment on the feature_splitter
for the use of OHE. The intended use of the function was to help users visualize what kind of data they are dealing with, i.e. which ones are numerical and which ones are categorical. Kind of quick EDA for them. Not necessarily to do full data preprocessing. But we have changed its output into a data frame which is more presentable to users.
Submitting Author: Andrea Lee (@andrealee011 ), Jaekeun Lee (@agdal1125), Sakariya Aynashe (@eyrakas), Xinwen Wang (@xiw315 ) Package Name:
pyb4model
One-Line Description of Package: A python package to preprocess data and conduct machine learning Repository Link: pyb4model Version submitted: v1.1.0 Editor: Varada Kolhatkar (@kvarada) Reviewer 1: Cheng Min (@marvinmin) Reviewer 2: Ke Xin Zhao (@Margaret8521) Archive: TBDVersion accepted: TBD
Description
Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.
Explain how the and why the package falls under these categories (briefly, 1-2 sentences):
Who is the target audience and what are scientific applications of this package?
Are there other Python packages that accomplish the same thing? If so, how does yours differ?
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or
@tag
the editor you contacted:Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements): "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here