micahkwok opened this issue 3 years ago.
Assigning @williamxu7 & @tdkhanhvu as reviewers.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.
The package includes all the following forms of documentation:
URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a paper.md matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 2
Hi Chuang, Fatime, Jiacheng, and Micah, the eda.utilsR package looks great, and I am impressed by the work you have done! Below are some general and specific comments you may consider to make the package even better.
Overall, the package is easy to use. I tested your R package on my computer and it installed without issues, with no errors or warnings, which is amazing. Great job on 100% code coverage! You have also done a great job setting up continuous integration services (GitHub Actions, Codecov, etc.). Your README and vignette are clear and provide great guidance for target users to understand and use your functions.
Code
Documentation
I enjoyed reading your README file; the examples and correlation heatmap are well designed.
The "Our Place in the R Ecosystem" section in the README is well written, and it is useful to see the documentation links for several R packages with similar functionality.
I ran goodpractice::gp() to identify likely sources of errors and style issues. I noticed that your .R files use '=' for assignment in many places; it may be better to use '<-' instead, since most R users and developers find code easier to read with '<-'.
I ran spelling::spell_check_package() as well as spelling::spell_check_files("README.Rmd"). There are minor spelling errors in README.md that can be corrected (such as 'anslysis' and 'preperation').
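As a generic illustration (not code taken from the package), `<-` is used for assignment while `=` stays reserved for naming arguments inside a function call. Running `lintr::lint_package()` locally should flag any remaining `=` assignments, since its default linters include an assignment check.

```r
# Discouraged: `=` used for assignment (valid R, but inconsistent with common style guides)
values = c(1, 2, 3)

# Preferred: `<-` for assignment
values <- c(1, 2, 3)

# `=` remains the right operator for naming arguments inside a function call
avg <- mean(x = values)
```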
GitHub Repo Homepage
I enjoyed reviewing your package and you have done a great job. I think your target audience/users would enjoy using your package too.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.
The package includes all the following forms of documentation:
URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a paper.md matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 2h
General comments
Hi team, I am impressed by the amount of work all team members put into this project, which resulted in many useful features in this package.
The features are well divided into separate functions, so users know which function to use for their specific needs. Because all the necessary components, such as the vignette website and documentation, are present, users can refer to them whenever they have questions.
However, I think your project could be even better if you address some of the feedback below:
Specific comments
Introduction
This package contains four functions:
- cor_map: A function to plot a correlation matrix of numeric columns in the dataframe
- outlier_identifier: A function to identify and deal with outliers
- scale: A function to scale numerical values in the dataset
- imputer: A function to impute missing values
imputer example
outlier_identifier
This function can help you eliminate outlier points or impute outlier values with the mean or median.
Sentence structure
Scaling is important for many machine learning tasks, since an imbalanced numeric scale in a data frame can cause some features to be ignored or amplified, which diminishes the effectiveness of the machine learning model. This is where the scale function comes in handy: it re-scales all the numeric columns within a data frame to balance the magnitude of each column.
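A minimal sketch of what balancing column magnitudes means, assuming z-score standardization with base R's `scale()` (the package's own `scale` function may use a different method). The data frame and its column names are made up for illustration:

```r
# Hypothetical data: two numeric columns on very different scales plus a character column
df <- data.frame(
  age = c(20, 30, 40),
  income = c(1000, 5000, 9000),
  group = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

# Standardize every numeric column to mean 0 and standard deviation 1;
# base::scale() returns a matrix, so as.numeric() turns it back into a plain vector
num_cols <- vapply(df, is.numeric, logical(1))
df[num_cols] <- lapply(df[num_cols], function(col) as.numeric(base::scale(col)))
```

After this step both `age` and `income` become `-1, 0, 1`, so neither column dominates simply because of its units, while the character column is left untouched.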
Correlation map
Consistent test file names
(test-cor_map.R & test-scale.R) vs (test.imputer.R & test.outlier.R). It is better to choose one style and use it consistently.
Some typos / wrong grammar
outlier.R: "Defualt" in the Roxygen2 string, "has already process"...
Nested if-else logic
outlier.R: When a condition fails, the function throws an error and stops, so it is better to remove the nested if-else to make the logic easier to follow. It seems this suggestion has already been applied in cor_map.R, so perhaps the outlier.R author can follow suit?
Overall, I believe this is a great project. Probably due to time constraints, there are a few minor mistakes and some inconsistencies, but I think you can easily address these and make the package even better!
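To illustrate the nested if-else suggestion for outlier.R, here is a hypothetical validator written both ways (it is not the actual code in outlier.R; the argument names and allowed methods are made up). Because `stop()` exits the function immediately, the flat version needs no `else` branches and the main logic stays at the top level:

```r
# Hypothetical example, not taken from outlier.R.
# Nested version: each check wraps the next, pushing the real work deeper.
check_nested <- function(df, method) {
  if (is.data.frame(df)) {
    if (method %in% c("trim", "mean", "median")) {
      "ok"
    } else {
      stop("method must be one of 'trim', 'mean', 'median'")
    }
  } else {
    stop("df must be a data frame")
  }
}

# Flat version: each failed check calls stop() and returns early,
# so no else branches are needed.
check_flat <- function(df, method) {
  if (!is.data.frame(df)) {
    stop("df must be a data frame")
  }
  if (!method %in% c("trim", "mean", "median")) {
    stop("method must be one of 'trim', 'mean', 'median'")
  }
  "ok"
}
```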
Submitting Author: Micah Kwok (@micahkwok), Fatime Selimi (@fatse), Chuang Wang (@chuangw46), Jiacheng Wang (@wangjc640)
Repository: https://github.com/UBC-MDS/eda.utilsR
Version submitted: v0.3.0
Editor: Tiffany Timbers (@ttimbers)
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD
Description
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
This package includes functions to help with data manipulation tasks such as scaling, imputation and plotting in the exploratory data analysis (EDA) portion of a project, while keeping the code short and simple to use.
This package is targeted towards those in the data science field who have to do EDA as part of their project. It is ideally intended for datasets to be used in machine learning, as it has functions that help with the preprocessing stage of work, including scaling, imputation and identifying outliers. The scientific application of this package is that users can call each of these functions with a single line of code, which helps users who are unfamiliar with R and saves time overall.
Yes, there are other R packages that accomplish similar tasks, such as scaler, ggcorr, mice, and OutlierDetection. However, these functions are spread across different packages, even though all of them are likely to be needed during EDA. Furthermore, some of the functions in this package offer more customizability, such as the color scheme of the plot, which is usually not easily available otherwise.
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
[ ] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct