UBC-MDS / software-review-2021


Submission: instaeda (R) #17

Open roycezhou opened 3 years ago

roycezhou commented 3 years ago

Submitting Author:

Repository: instaeda_R
Version submitted: v0.2.0
Editor: Tiffany Timbers (@ttimbers)
Reviewers: TBD
Archive: TBD
Version accepted: TBD


Package: instaeda
Title: InstaEDA: Quick and Easy Way to Clean Data and Build Exploratory Data Analysis Plots
Version: 0.0.0.9000
Authors@R: 
    c(person(given = "Justin",
             family = "Fu",
             email = "jufu11@gmail.com",
             role = c("aut", "cre")),
      person(given = "Royce",
             family = "Zhou",
             email = "royce.siqizhou@gmail.com",
             role = c("aut", "cre")),
      person(given = "Selma",
             family = "Duric",
             email = "selma@duric.ca",
             role = c("aut", "cre")),
      person(given = "Zeliha",
             family = "Ural Merpez",
             email = "zmerpez@gmail.com",
             role = c("aut", "cre")))
Description: This idea came up as we have been building data projects for quite some time now in the UBC MDS program. We noticed that there are some repetitive activities that occur when we first explore the data. This project will help you take a given raw data set and conduct some data cleansing and plotting with a minimal amount of code.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
Config/testthat/edition: 2
Imports: 
    ggplot2,
    dplyr,
    magrittr,
    testthat,
    palmerpenguins,
    stringr,
    ggthemes,
    utils,
    covr,
    imputeR,
    scales,
    tidyr,
    tidyselect,
    stats,
    rlang
URL: https://github.com/UBC-MDS/instaeda_R
BugReports: https://github.com/UBC-MDS/instaeda_R/issues

Scope

The package falls under data visualization and exploratory data analysis: with a minimal amount of code, it explores raw data through data checking, data cleaning, and exploratory visualization, including numerical correlation plots and basic distribution plots by data type.

The target audience is anyone who needs to clean data and build exploratory data analysis plots, for instance students with a computer science or data science background. Data scientists, data engineers, and statisticians are also likely users.

Several R packages already support exploratory data analysis (EDA), such as "SmartEDA" and "dlookr". However, our package offers a different workflow and functionality. It has four main components, and each function brings something new. For example, our data checking function gives users an overall summary of the raw data. Although built-in functions already summarize missing data, the mean, the standard deviation, etc., our function plots the summary directly, using several bar charts with metrics covering the numeric columns, factor columns, complete rows, and missing observations.
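To make those four metrics concrete, here is a minimal base-R sketch of how they could be computed. The function name `summarise_raw_data` and the toy data frame are hypothetical and used only for illustration; this is not the instaeda implementation or its API.

```r
# Hypothetical helper (not part of instaeda): counts the four summary
# metrics mentioned above using base R only.
summarise_raw_data <- function(df) {
  stopifnot(is.data.frame(df))
  is_num <- vapply(df, is.numeric, logical(1))
  is_fct <- vapply(df, is.factor, logical(1))
  c(
    numeric_columns      = sum(is_num),
    factor_columns       = sum(is_fct),
    complete_rows        = sum(stats::complete.cases(df)),
    missing_observations = sum(is.na(df))
  )
}

# Toy data made up for this example: two numeric columns, one factor,
# and one missing value in each numeric column.
toy <- data.frame(
  height = c(1.5, NA, 1.8),
  weight = c(60, 72, NA),
  group  = factor(c("a", "b", "a"))
)

metrics <- summarise_raw_data(toy)
print(metrics)
```

Passing the named vector to `barplot(metrics)` would draw one bar per metric, which is roughly the kind of overview plot described above.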

NA

NA

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

zmerpez commented 3 years ago

That looks nice. Thanks for preparing this; let's finalize after discussing the license.

jianructose commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Estimated hours spent reviewing: about 3h


Review Comments

I like how the whole package makes EDA more accessible and lets users investigate their data swiftly! Nice job, instaeda team 👍🏻

camharris22 commented 3 years ago

Package Review

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing: 2h 30min


Review Comments

Overall, a very useful package and a serious time saver when it comes to conducting EDA on your data. Nice work, team!

zmerpez commented 3 years ago

@camharris22 Thanks for your comments. I am glad to have your suggestion about 1:length(), and I could use it in the future. Luckily, I already check for the case length() < 1 beforehand, so a meaningful number of columns is always given as input. I agree that the tests for divide_and_fill could cover more, and I will look into this.
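For anyone following the 1:length() point, a small sketch of the pitfall in general terms. The variable names are made up for illustration and none of this is taken from the instaeda source: on an empty vector, `1:length(x)` counts down from 1 to 0, whereas `seq_along(x)` yields an empty index.

```r
# An empty selection of column names (illustrative only).
cols <- character(0)

# 1:length(cols) evaluates to 1:0, i.e. the two indices 1 and 0,
# so a loop over it would run twice on an empty input.
bad_idx <- 1:length(cols)
print(bad_idx)

# seq_along(cols) is an empty integer vector, so a loop over it
# runs zero times and no explicit length guard is needed.
good_idx <- seq_along(cols)
print(length(good_idx))

visited <- 0
for (i in seq_along(cols)) {
  visited <- visited + 1
}
print(visited)
```

An explicit length check before the loop, as described in the comment above, guards against the same case; `seq_along()` simply makes the guard unnecessary for the loop itself.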