camadi opened this issue 4 years ago
[Draft review - work in progress]
The package includes all the following forms of documentation:
- `pp.data_type(my_data)['num']` returns a data frame with numerical columns only from the original data frame. A usage example such as `text_columns = pp.data_type(my_data)['num']` would help with interpreting what these functions do.
- The description of `pp.scaler` seems off. There's a bunch of `:param` that shows up.
- `setup.py` file or elsewhere.
CONTRIBUTORS.md
Readme requirements: The package meets the readme requirements below:
The README should include, from top to bottom:
`sklearn` and if this code can be used within pipelines
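A usage example along the lines suggested above might look like the sketch below. The `data_type` function here is a hypothetical stand-in based only on the behaviour described in this review (the real preppy524 API may differ); it illustrates how a short documented example clarifies what the `'num'` key returns.

```python
import pandas as pd

# Hypothetical stand-in for preppy524's data_type, based on the behaviour
# described in this review: split a dataframe into one dataframe per type.
def data_type(df):
    return {
        "num": df.select_dtypes(include="number"),
        "text": df.select_dtypes(include="object"),
    }

my_data = pd.DataFrame({"age": [21, 35], "name": ["Ada", "Bob"]})
numeric_columns = data_type(my_data)["num"]  # numerical columns only
text_columns = data_type(my_data)["text"]    # text columns only
```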
```
ERROR: Could not find a version that satisfies the requirement pytest-cov<3.0.0,>=2.8.1 (from preppy524) (from versions: none)
ERROR: No matching distribution found for pytest-cov<3.0.0,>=2.8.1 (from preppy524)
```
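Dependency errors like the ones above are typical when installing from TestPyPI, which does not mirror dependencies such as `pytest-cov`. A common workaround (a sketch; the index URLs are the standard PyPI/TestPyPI ones, and the package name is taken from this thread) is to let pip fall back to regular PyPI for dependencies:

```shell
# Install preppy524 from TestPyPI, but resolve its dependencies from PyPI
pip install --index-url https://test.pypi.org/simple/ \
            --extra-index-url https://pypi.org/simple/ \
            preppy524
```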
- [x] **Functionality:** Any functional claims of the software have been confirmed.
- Checked by cloning repository and trying functions
- No obvious errors
- [x] **Performance:** Any performance claims of the software have been confirmed.
- [x] **Automated tests:** Tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- Might be worth it for future collaborators to document your tests better - a one-line comment should do here
- Test functions could also be named better (e.g., in [test_datatype.py](https://github.com/UBC-MDS/PrepPy/blob/master/tests/test_datatype.py), `test_datatype1` could be renamed to `test_categorical_data`; other files also have test function names like "test1", "test2", etc.)
- Tests in `test_scaler.py` are just one big test. If something fails, it would be unclear which check failed since the tests aren't broken out into unit tests.
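As an illustration of breaking one big test into unit tests, the sketch below uses a hypothetical standard-scaling function (not preppy524's actual `scaler`); each test gets a descriptive name and a one-line comment so a failure points straight at the broken behaviour:

```python
# Hypothetical stand-in for a standard scaler: centre values and
# divide by the (population) standard deviation.
def scaler(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / var ** 0.5 for v in values]

def test_scaler_output_is_zero_mean():
    # Scaled values should be centred around zero
    scaled = scaler([1.0, 2.0, 3.0])
    assert abs(sum(scaled)) < 1e-9

def test_scaler_output_is_unit_variance():
    # Scaled values should have unit variance
    scaled = scaler([1.0, 2.0, 3.0])
    var = sum(v ** 2 for v in scaled) / len(scaled)
    assert abs(var - 1.0) < 1e-9
```

With this layout, pytest reports the failing function by name, so the broken behaviour is obvious without digging through one long test body.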
- [x] **Continuous Integration:** Has continuous integration, such as Travis CI, AppVeyor, CircleCI, and/or others.
- Seems like the release workflow is failing at the style check.
- Try using `autopep8 --in-place <filename>` on your code to fix a lot of these issues
- [x] **Packaging guidelines**: The package conforms to the pyOpenSci [packaging guidelines](https://www.pyopensci.org/dev_guide/packaging/packaging_guide.html).
Estimated hours spent reviewing: 2.5
---
#### Review Comments
Overall, quite a useful package that resolves some of the pain points I had with `sklearn`. I would definitely download and use your package! I think some minor fixes in explaining the functionality and documenting the tests would make it more appealing for potential users to get introduced to and start using your package. Good work!
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
`setup.py` file or elsewhere.
Readme requirements: The package meets the readme requirements below:
The README should include, from top to bottom:
I think overall this package is well implemented and has a strong use case. It helps streamline a lot of annoying and repetitive processes when cleaning data. There were some small issues with the documentation, as well as with error and edge-case test coverage, that could use improvement, but the functions themselves work very well. With some polish, this is definitely a package I would consider using in the future.
Hi James,
Thank you for your valuable feedback; we have addressed the following items:
We added more examples to the usage section in the README instead of one line of code, which did not seem adequate for users who have just started using our package.
We changed the name of the package from `PrepPy` to `preppy524` last week, and the instructions in the README were outdated. They have been updated to the latest version now!
Comments on functionality were added to our test cases. I agree with you that it's helpful to have clearly documented tests so that users know what went wrong.
We will work on polishing our functions to make them more useful, as well as implementing more edge-case test coverage, in the future.
The most recent release can be found through this link.
Thanks,
Jasmine
Hi @moniquewong,
Thank you for your valuable feedback! We have addressed the following items:
Installation by following instructions in the README: We have updated the README with the proper instructions. We had an issue with testPyPI and had to change the name of the function last minute.
Updated test documentation: each test now has a one-line comment about what it does, and the tests have more sensible names to make what is happening clearer if something fails.
Updated the README with more information and documentation on how to use the package.
The most recent release can be found here.
Thanks,
Matt
Submitting Authors: George Thio (@gptzjs), Matthew Connell (@matthewconnell), Jasmine Qin (@jasmineqyj), Chimaobi Amadi (@camadi)
Package Name: preppy524
One-Line Description of Package: A Python package for data preprocessing for machine learning
Repository Link: https://github.com/UBC-MDS/PrepPy
Version submitted: v1.2.0
Editor: Varada Kolhatkar (@kvarada)
Reviewer 1: Monique Wong (@moniquewong)
Reviewer 2: Mengzhe Huang (@Jamesh4)
Archive: TBD
Version accepted: TBD
Description
preppy524 is a Python package that helps with preprocessing in machine learning tasks. There are certain repetitive tasks that come up often in a machine learning project, and this package aims to alleviate those chores. Some of the issues that come up regularly are: finding the types of each column in a dataframe, splitting the data (whether into train/test sets or train/test/validation sets), one-hot encoding, and scaling features. This package will help with all of those tasks.
Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.
- `train_valid_test_split`: This function splits the data set into train, validation, and test sets.
- `data_type`: This function identifies data types for each column/feature. It returns one dataframe for each type of data.
- `one-hot`: This function performs one-hot encoding on the categorical features and returns a dataframe for the train, test, and validation sets with sensible column names.
- `scaler`: This function performs standard scaling on the numerical features.

Target audience: Machine Learning Engineers, Data Scientists, students, and anyone else interested in preprocessing data before running machine learning models.

No single package performs the four different functions of `preppy524`, but there are some functions that do parts of what the `preppy524` package does.

@tag the editor you contacted: None
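For context, the three-way split described above can be sketched in a few lines of plain Python (an illustration of the idea only, not preppy524's actual implementation; the 60/20/20 proportions and the `seed` default are hypothetical):

```python
import random

def train_valid_test_split(rows, valid_size=0.2, test_size=0.2, seed=524):
    # Shuffle a copy so the original data is untouched, then cut three slices.
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_size)
    n_valid = int(n * valid_size)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test

# 60/20/20 split of 100 rows: 60 train, 20 validation, 20 test
train, valid, test = train_valid_test_split(list(range(100)))
```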
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
No
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements): "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here