Open yaz-saleh opened 3 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
`setup.py` file or elsewhere.
Readme requirements
The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a `paper.md` matching JOSS's requirements with:
Estimated hours spent reviewing: 1.5 hours
Hi Team,
Overall, well done! Thanks for your fantastic work. Here are some suggestions that you may want to make to improve your package performance:
The installation failed when I tried to install the package. A possible fix is: `pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple pymleda`
When I tried to use `SupervisedData` by following the usage, it gave me this error: `NameError: name 'SupervisedData' is not defined`. The way I solved it was to add `from pymleda.pymleda import SupervisedData` to the example in your usage and your documentation.
The hyperlink in the documentation to sklearn's function documentation does not work.
It would be good to show more specific examples in the usage section of the README.
It would be nice to add all authors' names in the `pyproject.toml` file.
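The `NameError` reported above is the usual result of importing a package without importing the class name itself. The sketch below reproduces the effect with a throwaway in-memory package so it runs without installing anything; `demo_pkg`, `demo_pkg.core`, and the placeholder class are hypothetical stand-ins, not the actual pymleda layout.

```python
import sys
import types

# Build a throwaway package "demo_pkg" with a submodule "demo_pkg.core".
# Both names are hypothetical; they only mimic a package/module layout.
pkg = types.ModuleType("demo_pkg")
pkg.__path__ = []  # an empty __path__ marks the module as a package
core = types.ModuleType("demo_pkg.core")

# Placeholder class created without binding the bare name in this namespace.
core.SupervisedData = type("SupervisedData", (), {})
pkg.core = core
sys.modules["demo_pkg"] = pkg
sys.modules["demo_pkg.core"] = core

import demo_pkg  # imports the package, but not the class name

try:
    SupervisedData()  # the bare name was never imported
except NameError as err:
    message = str(err)

# Importing the name explicitly from the submodule resolves the error.
from demo_pkg.core import SupervisedData

instance = SupervisedData()
```

An alternative, assuming the package's `__init__.py` re-exports the class, would be to call it through the package namespace instead of importing the bare name.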
These are all minor pieces of advice that I would like to suggest. Good job! Good luck with your next block!
Best wishes, Tingyu
Estimated hours spent reviewing: 2
Hello everyone, first of all, congratulations on creating such a structured and detailed package.
`--extra-index-url https://pypi.org/simpl` statement for the versions to be installed correctly since this package is being hosted in test.PyPI
`SupervisedData` and `autoimpute_na()`. Nonetheless, I consider that some further explanation for the other two functions is missing for them to stand out.
`supervised_data = SupervisedData(df, x_cols = ['feature1', 'feature2'], y_cols = ['target'])`
`supervised_data = pymleda.SupervisedData(df, x_cols = ['feature1', 'feature2'], y_cols = ['target'])`
`dfscaling` function is only returning the numeric features, not the whole original data frame where the numeric features are scaled. From reading the documentation, my understanding is that the column number of the original data frame should be maintained. If I understood it incorrectly it would be great to specify this condition in the docstring.
Overall, amazing work, I hope this review finds you well and that you have an amazing rest of your week, you deserve it.
Sincerely, Santiago Rugeles Schoonewolff
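The `dfscaling` behavior the review asks about, scaling the numeric features while keeping every original column, can be illustrated with a minimal sketch. It uses a plain dict of column lists as a dependency-free stand-in for a pandas dataframe; `scale_numeric_columns` is a hypothetical helper and is not the package's implementation.

```python
from statistics import mean, pstdev


def scale_numeric_columns(table):
    """Return a copy of `table` with numeric columns standardized.

    `table` maps column names to equal-length, non-empty lists.
    Non-numeric columns are copied unchanged, so the column count
    and column order of the input are preserved.
    """
    scaled = {}
    for name, values in table.items():
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if not is_numeric:
            scaled[name] = list(values)  # keep non-numeric columns as-is
            continue
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation
        if sigma == 0:
            scaled[name] = [0.0 for _ in values]  # constant column
        else:
            scaled[name] = [(v - mu) / sigma for v in values]
    return scaled


table = {
    "feature1": [1.0, 2.0, 3.0],
    "label": ["a", "b", "a"],
}
result = scale_numeric_columns(table)
```

Here `result` has the same two columns as `table`: `feature1` is centered and scaled, and `label` is passed through untouched.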
Thank you @Tammy1128 and @ansarusc for your detailed reviews. We greatly appreciate your input! We've fixed the installation instructions in the Readme as per your feedback. We are unable to address all of your concerns at this time, since our team has agreed to halt active development of the package with the end of DSCI-524. We will bear your suggestions in mind for future development work and try to incorporate the best practices :-)
Submitting Author:
Package Name: pymleda
One-Line Description of Package: Python package that helps with preliminary eda for supervised machine learning tasks
Repository Link: https://github.com/UBC-MDS/pymleda
Version submitted: 0.2.5
Editor: Tiffany Timbers (@ttimbers)
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD
Description
- `SupervisedData` is a wrapper class that splits a pandas dataframe into train and test sets and further into X and y subsets based on a list of user-provided columns.
- `dftype()` function will return the type of columns and variables for the input data frame. Furthermore, if there are non-numeric columns, it will return the unique values of non-numeric columns and their length.
- `autoimpute_na()` function to identify and impute missing values for different attributes in a given pandas dataframe.
- `dfscaling()` function to apply standard scaling to the numerical features in a pandas dataframe.
Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
The pymleda package is intended to help with EDA for supervised machine learning tasks. It helps with tasks such as exploring variable types and summary stats, imputing NAs, scaling and centering numerical columns, as well as splitting the data into training, test, X, and y subsets.
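As a rough illustration of the splitting task described above (training, test, X, and y subsets exposed as convenience attributes), here is a minimal sketch in plain Python. The class name `SplitData`, its parameters, and the attribute names are assumptions made for this example; they are not taken from the pymleda API, and a list of row dicts stands in for a pandas dataframe.

```python
import random


class SplitData:
    """Hypothetical wrapper that splits rows into train/test and X/y parts."""

    def __init__(self, rows, x_cols, y_cols, test_fraction=0.25, seed=0):
        shuffled = list(rows)
        random.Random(seed).shuffle(shuffled)  # reproducible shuffle
        n_test = round(len(shuffled) * test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        # Keep only the requested columns in each subset.
        self.x_train = [{c: r[c] for c in x_cols} for r in train]
        self.y_train = [{c: r[c] for c in y_cols} for r in train]
        self.x_test = [{c: r[c] for c in x_cols} for r in test]
        self.y_test = [{c: r[c] for c in y_cols} for r in test]


rows = [{"feature1": i, "feature2": i * 2, "target": i % 2} for i in range(8)]
data = SplitData(rows, x_cols=["feature1", "feature2"], y_cols=["target"])
```

Holding the four subsets on one object means the caller passes around `data` rather than tracking four separate variables.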
Who is the target audience and what are scientific applications of this package?
The target audience for the package is ML and data science practitioners who would like to do preliminary EDA and wrangling of their dataset prior to moving on to other tasks in their pipeline.
Are there other Python packages that accomplish the same thing? If so, how does yours differ?
There are other existing packages such as `scikit-learn` and `pandas` that contain some similar functionality. For example, `pandas` provides users with separate functions such as `isnull()`, `isna()`, and `notna()` to detect missing values and `fillna()` and `interpolate()` to fill them. Our `pymleda` package intends to augment the existing functionality of these packages with the goal of increasing ease of use. For example, imputing and scaling (via `autoimpute_na()` and `dfscaling()`) will automatically identify the columns to modify. Similarly, `dftype()` will return a summary dataframe for numeric columns (containing output of `pandas`'s `describe()`) and another for non-numeric columns (containing unique values). The `SupervisedData` class provides convenience attributes for accessing train, test, x, and y portions of the dataset, relieving the user from having to keep track of the different variables.
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or `@tag` the editor you contacted:
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. *Have feedback/comments about our review process? Leave a comment here*
Editor and Review Templates
Editor and review templates can be found here