UBC-MDS / software-review-2021

Submission: eda.utilsR (R) #44

Open micahkwok opened 3 years ago

micahkwok commented 3 years ago

Submitting Author: Micah Kwok (@micahkwok), Fatime Selimi (@fatse), Chuang Wang (@chuangw46), Jiacheng Wang (@wangjc640)

Repository: https://github.com/UBC-MDS/eda.utilsR
Version submitted: v0.3.0
Editor: Tiffany Timbers (@ttimbers)
Reviewer 1: TBD
Reviewer 2: TBD

Archive: TBD
Version accepted: TBD


Description

Package: eda.utilsR
Title: R package containing utility functions for EDA in machine learning
Version: 0.0.0.9000
Authors@R: c(person("Chuang", "Wang", role = c("aut", "cre"),
                     email = "chuangw.sde@gmai.com"),
              person("Fatime", "Selimi", role = "aut", 
                     email = "fatimeseelimi@gmail.com"),
              person("Jiacheng", "Wang", role = "aut",
                     email = "jiachengw@gmail.com"),
              person("Micah", "Kwok", role = "ctb",
                     email = "micahk@gmail.com"))
Description: The package contains basic functions for imputation, scaling, 
    dealing with outliers and plotting correlation heatmap.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
Imports: 
    tibble,
    dplyr,
    ggplot2,
    reshape2,
    stats,
    rlang
Suggests: 
    testthat (>= 3.0.0),
    knitr,
    rmarkdown,
    covr
Config/testthat/edition: 3
VignetteBuilder: knitr

Scope

This package includes functions that help with data manipulation tasks such as scaling, imputation and plotting during the exploratory data analysis (EDA) stage of a project, while keeping the code short and simple to use.

This package is targeted towards those in the data science field who have to do EDA as part of their project. It is intended primarily for datasets that will be used in machine learning, as it has functions that help with the preprocessing stage of work, including scaling, imputation and identifying outliers. The scientific application of this package is that users can call each of these functions with one simple line of code, which helps users who are unfamiliar with R and saves time overall.

Yes, there are other R packages that accomplish similar tasks, such as scaler, ggcorr, mice, and OutlierDetection. However, these functions are spread across separate packages, even though all of them are likely to be needed during EDA. Furthermore, some of the functions in this package offer a higher level of customizability, such as the color scheme of the plot, which is usually not easily available otherwise.
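As a rough illustration of two of the preprocessing steps named in the scope (imputation and the correlation matrix behind a heatmap), here is a minimal base R sketch. The helper `impute_numeric()` and the toy data are hypothetical and are not the package's own API; the sketch stops at the correlation matrix and does not attempt the plotting step.

```r
# Hypothetical helper, NOT the package's imputer(): fill missing values in
# every numeric column with a summary statistic of the observed values.
impute_numeric <- function(df, fun = mean) {
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], function(x) {
    x[is.na(x)] <- fun(x, na.rm = TRUE)
    x
  })
  df
}

# Toy data (made up for illustration) with one NA per numeric column
toy <- data.frame(
  height = c(150, NA, 170, 180),
  weight = c(50, 60, NA, 80),
  group  = c("a", "b", "a", "b")
)

# Median imputation, then the correlation matrix of the numeric columns
filled  <- impute_numeric(toy, fun = median)
num_col <- vapply(filled, is.numeric, logical(1))
cor_mat <- stats::cor(filled[num_col])
```

The non-numeric `group` column passes through unchanged, which is the behaviour a user would usually expect from an EDA helper of this kind.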

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options

  • [ ] The package is novel and will be of interest to the broad readership of the journal.
  • [ ] The manuscript describing the package is no longer than 3000 words.
  • [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
  • (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
  • (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
  • (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

ttimbers commented 3 years ago

Assigning @williamxu7 & @tdkhanhvu as reviewers.

williamxu7 commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Estimated hours spent reviewing: 2


Review Comments

Hi Chuang, Fatime, Jiacheng, and Micah, the eda.utilsR package looks great, and I am impressed by the work you have done! I have included some general and specific comments below that you may want to consider to make the package even better.

General Comments

Overall, the package is easy to use. I tested it on my computer and installation went through without any errors or warnings, which is amazing. Great job on 100% code coverage! You have also done a great job setting up continuous integration services (GitHub Actions, Codecov, etc.). Your README and vignette are clear and give target users good guidance on how to understand and use your functions.

Specific Comments

Code

Documentation

GitHub Repo Homepage

I enjoyed reviewing your package and you have done a great job. I think your target audience/users would enjoy using your package too.

tdkhanhvu commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Estimated hours spent reviewing: 2h


Review Comments

General comments

Hi team, I am impressed by the amount of work that all team members have put into this project, which has produced a lot of useful features in this package.

The features are well divided into separate functions, so users can tell which function they need for a specific task. As all of the necessary components, such as the vignette website and the documentation, are also present, users can refer to them whenever they have any doubts.

However, I think your project can be even greater if you can address some of my feedback below:

Specific comments

Vignette

Introduction

This package contains four functions:

  • cor_map: A function to plot a correlation matrix of numeric columns in the dataframe
  • outlier_identifier: A function to identify and deal with outliers
  • scale A function to scale numerical values in the dataset
  • imputer: A function to impute missing values

imputer example

outlier_identifier

This function could help you by eliminate ourlier points or impute ourlier values with mean or median
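For readers unfamiliar with the two strategies named in that sentence (dropping outliers versus replacing them with the mean or median), here is a minimal base R sketch. It assumes the common 1.5 × IQR rule for flagging outliers; the thread does not say which rule the package uses, and `handle_outliers()` is a hypothetical name, not `outlier_identifier` itself.

```r
# Hypothetical helper, NOT the package's outlier_identifier().
# Assumption: a value is an outlier if it lies more than 1.5 * IQR
# below the first quartile or above the third quartile.
handle_outliers <- function(x, method = c("trim", "mean", "median")) {
  method <- match.arg(method)
  q <- stats::quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  is_out <- !is.na(x) & (x < q[1] - fence | x > q[2] + fence)

  # "trim" drops the flagged values entirely
  if (method == "trim") {
    return(x[!is_out])
  }

  # Otherwise replace flagged values with a statistic of the remaining ones
  fill <- switch(method,
    mean   = mean(x[!is_out], na.rm = TRUE),
    median = stats::median(x[!is_out], na.rm = TRUE)
  )
  x[is_out] <- fill
  x
}

x <- c(10, 12, 11, 13, 12, 95)   # 95 is the only flagged value here
trimmed  <- handle_outliers(x, "trim")
replaced <- handle_outliers(x, "median")
```

Computing the replacement value from the non-flagged observations only is a design choice in this sketch; it keeps the extreme value from pulling the mean towards itself.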

Sentence structure

Scaling is important for multiple machine learning tasks. Since imbalanced numeric scale in data frame will cause several features being ignored or amplified. Thus diminish the effectiveness of the machine learning model. This is where scaling function comes in handy. This function can help to re-scale all the numeric columns within a dataframe to balance the magnitude of each column.
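To make the idea in that paragraph concrete, here is a minimal base R sketch of rescaling every numeric column in a data frame. The helper `rescale_numeric()`, its `method` argument and the toy data are assumptions made for illustration and may differ from how the package's own scaling function works.

```r
# Hypothetical helper, NOT the package's scaling function: rescale each
# numeric column and leave non-numeric columns untouched.
# Note: a constant column would produce a division by zero in either method.
rescale_numeric <- function(df, method = c("standard", "minmax")) {
  method <- match.arg(method)
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], function(x) {
    if (method == "standard") {
      # z-score: centre on the mean, divide by the standard deviation
      (x - mean(x, na.rm = TRUE)) / stats::sd(x, na.rm = TRUE)
    } else {
      # min-max: map the observed range onto [0, 1]
      rng <- range(x, na.rm = TRUE)
      (x - rng[1]) / (rng[2] - rng[1])
    }
  })
  df
}

# Toy data (made up): income and age live on very different scales
toy <- data.frame(
  income = c(30000, 45000, 60000, 120000),
  age    = c(25, 32, 47, 51),
  city   = c("A", "B", "A", "C")
)

scaled <- rescale_numeric(toy, method = "minmax")
std    <- rescale_numeric(toy, method = "standard")
```

After min-max scaling both `income` and `age` span 0 to 1, so neither column dominates a distance-based model simply because of its units.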

Coding files

Correlation map

Consistent test file names

Some typos / wrong grammar

Nested if-else logic

Overall, I believe this is a great project. Probably because of the time constraint, there are a few minor mistakes and some inconsistencies. However, I think you can easily address these drawbacks and make the package even better!