UBC-MDS / software-review-2021

1 stars 1 forks source link

Submission: aRidanalysis (R) #35

Open cmmclaug opened 3 years ago

cmmclaug commented 3 years ago

Submitting Authors:

Repository: https://github.com/UBC-MDS/aRidanalysis Version submitted: 0.3.0 Editor: Tiffany Timbers (@ttimbers) Reviewers: TBD

Archive: TBD Version accepted: TBD


DESCRIPTION:

Package: aRidanalysis
Title: DRY out your regression analysis!
Version: 0.3.0
Authors@R: 
    person(given = "Santiago",
           family = "Rugeles Schoonewolff",
           role = c("aut", "cre"),
           email = "nphaterp@ualberta.ca")
    person(given = "Daniel",
           family = "Ortiz Nunez",
           role = c("aut", "cre"),
           email = "daniel.ortiz.n@gmail.com")
    person(given = "Neel",
           family = "Phaterpekar",
           role = c("aut", "cre"),
           email = "nphaterp@ualberta.ca")
    person(given = "Craig",
           family = "McLaughlin",
           role = c("aut", "cre"),
           email = "cswag7@student.ubc.ca")
Description: The aRidanalysis package allows users to streamline their regression analysis by providing functions that return model objects with familiar sklearn interfaces. An EDA function and model functions for different regression analysis response types are provided to simplify the analysis and modelling process and reduce code duplication. These model objects will allow users to leverage the inference capabilities of the R ecosystem with the prediction pipelines familiar in Python.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
Imports: 
    dplyr,
    tidyr,
    glmnet,
    palmerpenguins,
    ggplot2,
    GGally,
    tidyverse,
    MASS,
    broom,
    AER,
    tibble,
    rlang,
    stringr,
    magrittr
Suggests: 
    testthat (>= 3.0.0)
Config/testthat/edition: 3

Scope

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options - [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

MattTPin commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [x] A short summary describing the high-level functionality of the software
  • [x] Authors: A list of authors with their affiliations
  • [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [x] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Estimated hours spent reviewing: 1


Review Comments

xudong-Y commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [x] A short summary describing the high-level functionality of the software
  • [x] Authors: A list of authors with their affiliations
  • [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [x] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Estimated hours spent reviewing:


Review Comments

Hi group 8,

Great job on this package development. I find the package has a clear purpose and the functions are easy to use. Several comments:
cmmclaug commented 3 years ago

Thank you for your feedback @MattTPin!

I'll do my best to address your comments below:

Since the goal of the package is make all of the different functions more accessible maybe there could be a more intuitive way to input all of the information as arguments into the different functions. I.e., one argument could be used to input the df and then more could be used to specify the columns.

This was actually the design of our original interface! We decided to change back to the standard Sci-Kit Learn/R X,y dataframe interface as a way to standardize the process of using these packages. We will discuss and reconsider, but I think there is some value in having one consistent usage pattern.

Good job including other packages with similar functions. This makes the purpose of the function much clearer to anyone reviewing the package.

Thanks! It's certainly well covered ground so we had to try to make our package distinct.

For the packages which are used for visualization perhaps the addition of optional features that allowed for adjustment of some parameters would have been useful. For example, allowing users to input graph or axis titles or the use of different color pallets.

This is great feedback, and certainly something to add to the next iteration of improvements.

In the vignette/examples for the regression functions it would be easier to interpret all of the examples if each of the reported values were placed within a string (i.e. "Model score is ____")

This was also our original implementation idea for interpretability. In the end, we decided to mimic the Sci-Kit Learn output as close as possible to allow users the flexibility to modify their result strings as desired.

The documentation doesn't appear to contain detailed information about each of the different functions (i.e. does not contain information about the docstrings/ different arguments). Using the pckdown package it should be possible to generate docs with links to multiple pages (for each function) and full details about arguments.

I think an example of the function documentation you're looking for is here, but this might be telling us to make this information more accessible!

Thanks again for taking the time to review our package, and please let us know if there are any follow-up questions!

cmmclaug commented 3 years ago

Hi @xudongyang2, thank you for taking the time to review our package! I will do my best to address your comments below:

In aridanalysis.R, for function arid_eda, I was not able to run the example in the docstring arid_eda(house_prices, 'price', 'continuous, c('rooms', 'age','garage')). I believe continuous is missing a quote on the right and dataset house_prices is not loaded.

Thanks for catching this! We have created the following issue to address this: https://github.com/UBC-MDS/aRidanalysis/issues/65

I like the clear explanations for each function in the vignette. It's great that the introduction mentions that this package is designed for bridging the gap of R and Python users. I find the comparison between machine learning and statistical analysis a little bit unnecessary though.

This is good to know, we will discuss and determine how to best communicate with our audience!

The readme is clear and concise. Great job! I appreicate the inclusion of related packages.

Thanks! We think it was important to differentiate our package due to the relatively crowded regression package ecosystem.

I saw that in testthat folder, for tests of countreg, each testthat code block uses repetitive data generation code. It will be great if helper data could be used to make the script DRY.

This is definitely a good idea to reduce code duplication, we have added a new issue to track this: https://github.com/UBC-MDS/aRidanalysis/issues/66

The functions are well written. I'm impressed by how defensive each function is. This is awesome!

Thanks! We certainly wanted our package functions to be robust and clearly explain any erroneous inputs for ease of use.

Thank you again for your feedback. We have added the issues above to help make the improvements you have suggested!