Open gptzjs opened 4 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a
paper.md
matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 2
Hi Group 15! Thanks for letting me review your package. I have some comments and suggestions below:
scaler()
, it would be nice to also have an option analogous to scikit-learn's MinMaxScaler()
onehot()
onehot()
. Although the example provided in the documentation runs properly, I ran into issues when trying to perform one-hot encoding on a data frame I created.
?onehot
to read the documentation, it says that the function takes in a data frame of categorical features.for (level in levels(encodable_df[[header]])) {
columns <- c(columns, level)
}
That's all for now, but please let me know if you require any clarifications.
Reiko
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a
paper.md
matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 1.5 hrs
Nice job! You have detailed information about the target, installation, function features and usage in your README. It really helps me get familiar with your package immediately. I think your package would really be useful during the data preprocessing in machine learning! And I also have some suggestions for you. I hope that they can help you polish your package.
Though you include a vignette in the vignette
file, it would be better to give a link of the knitted version of vignettes in your README, so that people can approach and look through it more conveniently. Vignettes are very important since it demonstrates major functionality that runs successfully locally. People can follow the examples in the vignettes to learn how to use the functions and get familiar with your package more easily and quickly.
The link in the Documentation
part is broken. I don't know if it works on your computers. I tried several times, but it is still unable to work. I think it is quite a big issue you should look into immediately. Every time I use a new function, I would look at its documentation first. Without documentation, it would be really hard for me to use a function.
The usage included in your README is very detailed! I am wondering if it will be better you give a demo to each function in the README. Create a synthetic data, or use a dataset people are familiar with(such as Gapminder, mtcars), show the arguments passed into the function and the output of the function. With a demo, people would know what data is suitable for this function and what to expect after applying this function.
When I tried `install.packages("PrepR"), I got an error message as following: Then I tried the development version from GitHub, fortunately, it works!
There is a type error in your README. The One Hot Encode function should be onehot()
, while in the README, it is one-hot()
When I used the abalone data to test your functions, the data_type()
works out. However, the train_valid_test_split()
fails. It gives me an error message: "Please provide a non-empty data.frame object for x". At first, I think it fails since my data is empty. After I checked my data, I found out that the function only accepts dataframe. I think lots of our data in R is tibble, so only accepting dataframe will make your function less applicable and popular. Besides, I think you can also edit your error message and say directly, like "the input should only be a dataframe!". The same problem also exists in the onehot()
function. Since I passed a dataframe, I got an empty dataframe as a result. As I can not perform the train_valid_test_split()
successfully on my abalone data, I could not try the scaler()
function, because it wants train, test and validation dataframe at the same time.
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
- [x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- [x] A statement of need clearly stating problems the software is designed to solve and its target audience in README
- [x] Installation instructions: for the development version of package and any non-standard dependencies in README
- [ ] Vignette(s) demonstrating major functionality that runs successfully locally
- [x] Function Documentation: for all exported functions in R help
- [x] Examples for all exported functions in R Help that run successfully locally
- [x] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with
URL
,BugReports
andMaintainer
(which may be autogenerated viaAuthors@R
).For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a
paper.md
matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Functionality
- [x] Installation: Installation succeeds as documented.
- [x] Functionality: Any functional claims of the software been confirmed.
- [ ] Performance: Any performance claims of the software been confirmed.
- [x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- [x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
Final approval (post-review)
- [ ] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
Estimated hours spent reviewing: 1.5 hrs
- [x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.
Review Comments
- Nice job! You have detailed information about the target, installation, function features and usage in your README. It really helps me get familiar with your package immediately. I think your package would really be useful during the data preprocessing in machine learning! And I also have some suggestions for you. I hope that they can help you polish your package.
- Though you include a vignette in the
vignette
file, it would be better to give a link of the knitted version of vignettes in your README, so that people can approach and look through it more conveniently. Vignettes are very important since it demonstrates major functionality that runs successfully locally. People can follow the examples in the vignettes to learn how to use the functions and get familiar with your package more easily and quickly.- The link in the
Documentation
part is broken. I don't know if it works on your computers. I tried several times, but it is still unable to work. I think it is quite a big issue you should look into immediately. Every time I use a new function, I would look at its documentation first. Without documentation, it would be really hard for me to use a function.- The usage included in your README is very detailed! I am wondering if it will be better you give a demo to each function in the README. Create a synthetic data, or use a dataset people are familiar with(such as Gapminder, mtcars), show the arguments passed into the function and the output of the function. With a demo, people would know what data is suitable for this function and what to expect after applying this function.
- When I tried `install.packages("PrepR"), I got an error message as following: Then I tried the development version from GitHub, fortunately, it works!
- There is a type error in your README. The One Hot Encode function should be
onehot()
, while in the README, it isone-hot()
- When I used the abalone data to test your functions, the
data_type()
works out. However, thetrain_valid_test_split()
fails. It gives me an error message: "Please provide a non-empty data.frame object for x". At first, I think it fails since my data is empty. After I checked my data, I found out that the function only accepts dataframe. I think lots of our data in R is tibble, so only accepting dataframe will make your function less applicable and popular. Besides, I think you can also edit your error message and say directly, like "the input should only be a dataframe!". The same problem also exists in theonehot()
function. Since I passed a dataframe, I got an empty dataframe as a result. As I can not perform thetrain_valid_test_split()
successfully on my abalone data, I could not try thescaler()
function, because it wants train, test and validation dataframe at the same time.
@zouwenjiao Thank you for your valuable feedback! Due to group member commitments and time constraints, we did not address all of your points but did act upon three of your suggestions:
onehot
and train_test_valid_split
for incorrect data type inputsonehot
so it can accept a dataframe and return the correct outputDocumentation
in README.md
Here is the link to the final release of PrepR
which includes this change:
https://github.com/UBC-MDS/PrepR/tree/v1.2.0
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
- [x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- [x] A statement of need clearly stating problems the software is designed to solve and its target audience in README
- [x] Installation instructions: for the development version of package and any non-standard dependencies in README
- [ ] Vignette(s) demonstrating major functionality that runs successfully locally
- [x] Function Documentation: for all exported functions in R help
- [x] Examples for all exported functions in R Help that run successfully locally
- [x] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with
URL
,BugReports
andMaintainer
(which may be autogenerated viaAuthors@R
).For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a
paper.md
matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Functionality
- [x] Installation: Installation succeeds as documented.
- [ ] Functionality: Any functional claims of the software been confirmed.
- [ ] Performance: Any performance claims of the software been confirmed.
- [x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- [x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
Final approval (post-review)
- [ ] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
Estimated hours spent reviewing: 2
- [x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.
Review Comments
Hi Group 15! Thanks for letting me review your package. I have some comments and suggestions below:
- Just a small note: I would change the title of this issue to include the correct name of the package (i.e. PrepR)
Although the README contains most of the important information, I suggest adding some additional content:
- Outputs of the functions using pre-loaded data
- A link to a vignette that explains how the functions are to be used - I noticed that you included a Read the Docs link, but it seems to be broken. Perhaps you want to use vignette and pkgdown to setup a website for the package instead. In lecture 6, Tiffany has provided a helpful walk-through on how to do exactly this.
- Citation information - this should be included so that users know how to cite the package.
As for feature requests, here are some ideas that came to mind:
- With respect to
scaler()
, it would be nice to also have an option analogous to scikit-learn'sMinMaxScaler()
- On a related note, it would also be nice to have the option to drop the first column in
onehot()
With respect to functionality, I had a problem with
onehot()
. Although the example provided in the documentation runs properly, I ran into issues when trying to perform one-hot encoding on a data frame I created.
- When I call
?onehot
to read the documentation, it says that the function takes in a data frame of categorical features.- I decided to create a data frame with two columns of categorical data. Here's what happened when I tried calling the function:
- I tried debugging the function line by line and it appears to be lines 31-34 that throw the error:
for (level in levels(encodable_df[[header]])) { columns <- c(columns, level) }
That's all for now, but please let me know if you require any clarifications.
Reiko
@reikookamoto Thank you for your valuable feedback! Due to group member commitments and time constraints, we did not address all of your points but did implement several of them:
onehot()
so that it can take dataframes with multiple columns of categorical variables.onehot()
so the user can choose to drop the first columnHere is the link to the final release of PrepR
which includes this change:
https://github.com/UBC-MDS/PrepR/tree/v1.2.0
Submitting Author: @camadi @jasmineqyj @matthewconnell @gptzjs (Group 15)
Repository: https://github.com/UBC-MDS/PrepR Version submitted: v1.1.0 Editor: @kvarada Reviewer 1: @zouwenjiao Reviewer 2: @reikookamoto Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
train_valid_test_split
: This function splits the data set into train, validation, and test sets.data_type
: This function identifies data types for each column/feature. It returns one dataframe for each type of data.one-hot
: This function performs one-hot encoding on the categorical features and returns a dataframe for the train, test, validation sets with sensible column names.scaler
: This function performs standard scaling on the numerical features.Machine Learning Engineers, Data Scientists, students and any other person who is interested in preprocessing data before running machine learning models.
No single package does the four different functions of
preppy524
but there are some functions that does some part of thepreppy524
package.@tag
the editor you contacted:None
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
JOSS Options
- [ ] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). - [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: - (*Do not submit your package separately to JOSS*)MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)Code of conduct