UBC-MDS / software-review-2021


Submission: datascience.eda.R #42

Open lephanthuymai opened 3 years ago

lephanthuymai commented 3 years ago

Submitting Author: Mai Le (@lephanthuymai)
Other Authors: Aditya Bhatraju (@adibns), Charles Suresh (@charlessuresh), Rahul Kuriyedath (@rahulkuriyedath)
Repository: https://github.com/UBC-MDS/datascience.eda.R
Version submitted: https://github.com/UBC-MDS/datascience.eda.R/releases/tag/0.3.0
Editor: TBD
Reviewers: TBD

Archive: TBD
Version accepted: TBD


Package: datascience.eda
Title: Common Functions for EDA
Version: 0.3.0
Authors@R: 
    c(person(given = "Mai",
           family = "Le",
           role = "aut",
           email = "lephanthuymai@gmail.com"),
    person(given = "Charles",
           family = "Suresh",
           role = c("aut", "cre"),
           email = "emailcharlesjosh@gmail.com"),
    person(given = "Aditya",
           family = "Bhatraju",
           role = "aut",
           email = "adityabns@outlook.com"),
    person(given = "Rahul",
           family = "Kuriyedath",
           role = "aut",
           email = "rahul.kuriyedath@gmail.com"))
Description: This package includes functions assisting data scientists with various common tasks during the exploratory data analysis stage of a data science project. Its functions will help the data scientist to do preliminary analysis on common column types like numeric columns, categorical columns and text columns; it will also conduct several experimental clusterings on the dataset.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
Suggests: 
    testthat (>= 3.0.0),
    rmarkdown,
    covr
Config/testthat/edition: 2
Imports: 
    stats,
    dbscan,
    dplyr,
    imputeR,
    graphics,
    ggbiplot (>= 0.55),
    ggplot2,
    palmerpenguins,
    NLP,
    RColorBrewer,
    tidyverse,
    tm,
    wordcloud,
    vdiffr,
    stringr,
    sacred (>= 0.1.0),
    purrr,
    forcats,
    GGally,
    reshape2,
    tibble,
    MASS,
    knitr
Remotes: 
    git@github.com:JohnCoene/sacred.git,
    git@github.com:vqv/ggbiplot.git
VignetteBuilder: knitr

Scope

The datascience.eda package provides functions that automate most preliminary exploratory data analysis tasks (workflow automation). Highlights include creating data clusters for use in feature engineering (data extraction), identifying the proportion of missing data in categorical columns (data testing), and retrieving the top words in text columns (text analysis).

The target audience of this package is data scientists; it aims to make the EDA process more efficient.

Various packages provide functions for EDA, but most of them focus on identifying anomalies in numeric columns, and their exact functionality differs from ours. Furthermore, to our knowledge, no EDA-related package provides functions for handling text columns and data clustering.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options

  • [ ] The package is novel and will be of interest to the broad readership of the journal.
  • [ ] The manuscript describing the package is no longer than 3000 words.
  • [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
  • (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
  • (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
  • (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

ttimbers commented 3 years ago

Assigning @wangjc640 & @jordanlau123 as reviewers.

jordanlau123 commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Estimated hours spent reviewing: 2


Review Comments

Hi Mai, Aditya, Charles, and Rahul. datascience.eda.R looks great! I'm impressed by the amount of work you all put into this package in such a short period of time. Below are some of my comments and feedback regarding the package:

Overall, you all did a great job on this package. It was very well thought out and in-depth. This package definitely speeds up the EDA process, and I can see other data scientists using it!