Assigning @charlessuresh & @jachang0628 as reviewers.
Briefly describe any working relationship you have (had) with the package authors. The package authors and I are classmates in the UBC MDS-V 2020-21 program.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work.
The package includes all the following forms of documentation:
`URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).

For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a `paper.md` matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 4
Dear Bruhat, Pan and Chun,
Congrats on completing the package! I can see how your package would be useful to many.
I was able to successfully install your package and run all your functions.
Here are some of my comments:
Imputation methods in `imputation`: I think there are different functional claims made for this function regarding the available imputation methods:
`fit_data` vs `fill_data` in `imputation`: I'm not very clear on the distinction between `fit_data` and `fill_data`. I could not find documentation that defines these two parameters.
Constant imputation method in `imputation`: When I pass `method='constant'` along with a value for the parameter `constant`, the function works fine. However, when I try to use the constant imputation method without passing any value to `constant`, the function throws this error:

```
Error in x[[v]][thisvar] <- if (N > 1L) value[n + seq_len(nv)] else value : replacement has length zero
```

Maybe it would make sense to raise an error with an appropriate message for this case?
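For example, a check along these lines at the top of `imputation` could replace the low-level error with a readable one. This is only a sketch: `check_constant_arg` is a hypothetical helper name, and I am assuming the arguments are called `method` and `constant` as in the calls described above.

```r
# Hypothetical helper, not part of the package: validate the constant-imputation
# arguments before any replacement is attempted.
check_constant_arg <- function(method, constant = NULL) {
  # length() is 0 for both NULL and zero-length vectors such as numeric(0)
  if (identical(method, "constant") && length(constant) == 0L) {
    stop("`constant` must be supplied when method = 'constant'.", call. = FALSE)
  }
  invisible(TRUE)
}

# Passes silently when a fill value is given
check_constant_arg("constant", constant = 0)

# Returns the readable message instead of the low-level replacement error
tryCatch(check_constant_arg("constant"),
         error = function(e) conditionMessage(e))
```

Failing early like this means the user sees which argument is missing rather than an error raised deep inside the replacement step.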
`scaler` function: Passing a data frame with a single row throws errors with both scaler types:
Function call using `scaler_type='standardization'`:

```r
X_train <- data.frame('a' = 1, 'b' = 5)
X_test <- data.frame('a' = 1, 'b' = 5)
X_Valid <- data.frame('a' = 1, 'b' = 5)
scaled_df <- scaler(X_train, X_Valid, X_test, scaler_type='standardization')
```

Error:

```
Std. deviations could not be computed for: a, b
```

Function call using `scaler_type='minmax'`:

```r
scaled_df <- scaler(X_train, X_Valid, X_test, scaler_type='minmax')
```

Error:

```
No variation for for: a, b
STATS is longer than the extent of 'dim(x)[MARGIN]'
STATS is longer than the extent of 'dim(x)[MARGIN]'
STATS is longer than the extent of 'dim(x)[MARGIN]'
STATS is longer than the extent of 'dim(x)[MARGIN]'
```
Maybe it would make sense to raise an error with an appropriate message when a data frame with a single row is passed?
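Something like the following at the start of `scaler` could catch this case up front, before either scaler type is applied. It is a sketch only: `check_scalable` is a hypothetical name, and I am assuming the scaling statistics are computed from the training set (the first argument in the calls above).

```r
# Hypothetical helper, not part of the package: fail early when the training
# data has too few rows for column-wise scaling statistics.
check_scalable <- function(X_train) {
  if (!is.data.frame(X_train)) {
    stop("`X_train` must be a data frame.", call. = FALSE)
  }
  # sd() of a single value is NA and max - min of a single value is 0,
  # so neither standardization nor min-max scaling is defined.
  if (nrow(X_train) < 2L) {
    stop("`X_train` must have at least 2 rows to compute scaling statistics; got ",
         nrow(X_train), ".", call. = FALSE)
  }
  invisible(TRUE)
}

X_train <- data.frame('a' = 1, 'b' = 5)
tryCatch(check_scalable(X_train),
         error = function(e) conditionMessage(e))
```

Since both `standardization` and `minmax` fail for the same underlying reason, a single check covers both scaler types.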
`eda` function: According to the README and your package website, this function will "Separate data into train/test dataset". Looking at the function code and the returned values, I don't think the function currently does this.

Great work on this package! It was my pleasure to write this review. Let me know if you have any questions.
Thanks, Charles
Submitting Author: Bruhat Musunuru (BruhatM)
Other Authors: Pan Fan (pan1fan2), Chun Chieh (Jason) Chang (jachang0628)
Repository: https://github.com/UBC-MDS/prepropy-r
Version submitted:
Editor: TBD
Reviewers: TBD
Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under (please check an appropriate box below; if you are unsure, we suggest you make a pre-submission inquiry):
Explain how and why the package falls under these categories (briefly, 1-2 sentences): Our package contains functions to impute missing values and scale features. These most likely fall under Data munging. We also have a function to visualize selected features in a data frame, which falls under Data munging and visualization.
Who is the target audience and what are scientific applications of this package? The target audience for our package is beginner users who are trying out regression and want to simplify preprocessing before fitting regression models.
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category? Packages such as ggplot and caret offer similar functionality. Our package differs by streamlining the process for simplicity and beginner-friendliness.
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
[ ] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct