UBC-MDS / software-review

MDS Software Peer Review of MDS-created packages
1 stars 0 forks source link

Submission: edar (R) #37

Open sweber15 opened 4 years ago

sweber15 commented 4 years ago

Submitting Author: Group 4: Sarah Weber (@sweber15 ) , Cheng (Marvin) Min (@marvinmin), Yi (James) Liu (@v5y8 ), Gaurav Sinha (@sgauravm ) Repository: https://github.com/UBC-MDS/edar/tree/v1.1.0 Version submitted: v1.1.0 Editor: Varada Kolhatkar (@kvarada ) Reviewer 1: Yu Fang (@lori94 ) Reviewer 2: Andrea Lee (@andrealee011 ) Archive: TBD
Version accepted: TBD


Package: edar
Title: EDA
Version: 0.0.0.9000
Authors@R: 
    c(person(given = "Cheng",
           family = "Min",
           role = c("aut", "cre", "ctb", "cph"),
           email = "marvin.cmin@gmail.com"),
       person(given = "Yi",
           family = "Liu",
           role = c("aut", "ctb", "cph")),
        person(given = "Gaurav",
           family = "Sinha",
           role = c("aut", "ctb", "cph")),
        person(given = "Sarah",
           family = "Weber",
           role = c("aut", "ctb", "cph")))
Description: Conduct initial EDA for exploring data in a dataframe.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.0.2
Suggests: 
    testthat (>= 2.1.0),
    covr,
    knitr,
    rmarkdown
Imports: 
    rlang,
    vctrs,
    lifecycle,
    pillar,
    tidyr,
    purrr,
    tibble,
    dplyr,
    ggplot2,
    ggcorrplot
URL: https://github.com/UBC-MDS/edar
BugReports: https://github.com/UBC-MDS/edar/issues
VignetteBuilder: knitr

Scope

This package provides a single function which generates a general exploratory data analysis report. The package simplifies EDA tasks that require a lot of coding effort like describing the data, knowing NA values and plotting the distributions of the variables which are needed to understand the data well.

The target audience for our package includes anyone that wants to understand a data set specifically including data scientists and data analysts. The scientific applications of this package are that users can perform a simplified EDA on a dataframe without the intense coding.

There are other similar packages which can be used for EDA analysis. A package which does a similar thing is DataExplorer. Our package looks specifically at the EDA needed for data science focusing on NA values, the distribution of categorical variables, summary statistics for numeric variables and the correlation in a dataframe.

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

JOSS Options - [ ] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). - [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: - (*Do not submit your package separately to JOSS*)
MEE Options - [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

andrealee011 commented 4 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 1


Review Comments

Suggestions:

lori94 commented 4 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 2hrs


Review Comments

I got the following error when running R CMD check:

checking if this is a source package ... ERROR Only source packages can be checked.

Also goodpractice::gp() suggests It is good practice to

✖ fix this R CMD check ERROR: Only source packages can be checked.

as well as the following warnings:

Warning messages: 1: In MYPREPS[[prep]](state, quiet = quiet) : Prep step for test coverage failed. 2: In MYPREPS[[prep]](state, quiet = quiet) : Prep step for cyclomatic complexity failed. Warning in file(con, "r") : cannot open file 'man': No such file or directory ERROR computing Rd index failed:cannot open the connection

When I ran devtools::test():

devtools::test() No tests: no files in /Library/Frameworks/R.framework/Versions/3.6/Resources/library/edar/tests/testthat match '^test.*.[rR]$'

When I run spelling::spell_check_package(), I got the following suggestion for the spelling, there might be a minor typo just for your reference.

spelling::spell_check_package() DESCRIPTION does not contain 'Language' field. Defaulting to 'en-US'. WORD FOUND IN EDA title:1 description:1

Some suggestion after I playing around the functions with the flights dataset from nycflights13 package:

  1. For the describe_num_var() function, it is actually not necessary for the user to specify all the numerical columns(especially when the dataset is large), they can just pass the numerical columns that they want to explore. So this can be clarified in the user guide.

  2. Plots from the describe function are very helpful but the x-axis labels on the plot that generated from describe_cat_var are overlapped when there are many different categorical variables.

  3. The describe_na_values()function worked perfectly on small data set and users can see which data is missing but when dealing with a relatively large dataset, there were so long outputs full of 0s and 1s, which makes it hard for users to find the NA just by eyeballing. So maybe provide the index of the missing value will be better.

sgauravm commented 4 years ago

All the following changes can be found in the new release v1.2.0.