UBC-MDS / software-review-2021

1 stars 1 forks source link

Submission: aridanalysis (Python) #36

Open cmmclaug opened 3 years ago

cmmclaug commented 3 years ago

Submitting Authors:

Package Name: aridanalysis One-Line Description of Package: DRY out your regression analysis! Repository Link: https://github.com/UBC-MDS/aridanalysis_py Version submitted: 0.4.2 Editor: Tiffany Timbers (@ttimbers) Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD


Description

As Data Scientists, being able to perform Exploratory Data Analysis as well as Regression Analysis are paramount to the process of analyzing trends in data. Moreover, following the DRY (Do Not Repeat Yourself) principle is regarded as a majority priority for maximizing code quality. Yet, often times Data Scientists facing these tasks will start the entire process from scratch, wasting both time and effort while compromising code quality. The aridanalysis package strives to remedy this problem by giving users an easy-to-implement EDA function alongside 3 robust statistical tests that will simplify these analytical processes and produce an easy to read interpretation of the input data. Users will no longer have to write many lines of code to explore their data effectively.

Scope

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication options

JOSS Checks - [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Code of conduct

P.S. *Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

wiwang commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 2 hours


Review Comments

Hi aridanalysis team,

Congratulations on completing your package. I think the team did a great job there and I really like the idea behind your package and how this can make data scientist life easy with this package. I enjoyed reviewing your work. Below you can find my suggestions which I hope they maybe helpful for you guys:

Some suggestions I found in the package:

Congrats again and well done.

Zhiyong

yhchen20 commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 1.5h


Review Comments

Hi aridanalysis team,

Congratulations on completing your package! I really enjoy reviewing the package. Below you can see some suggestions which I hope will be useful for further improvement:

cmmclaug commented 3 years ago

Hi @wiwang, thank you for taking the time to help us review our package! I will attempt to address your comments below:

The package functions are elaborate and easy to understand, in particular I love to see your team's motivation to create this package

I'm glad you found our descriptions and motivation clear! It was certainly important to clearly define our packages role within the Python regression ecosystem!

It is always good to see related packages with yours, so I can see the difference, and pros and cons.

Thanks! Again with a crowded Python regression package ecosystem we thought it was important to delineate our package niche within the landscape.

I don't see a docs when I click the docs badge into the read the Docs website

Great catch! This is an important one, I've created an issue to get to the bottom of this: https://github.com/UBC-MDS/aridanalysis_py/issues/89

In the usage section, actually I can not run through all sample codes after I install this package:

  • maybe you need to call functions like aa.aridanalysis.arid_eda()
  • house_prices data set are not imported
  • maybe it is better to show the output of all sample function executions like: display the EDA plots and show the regression results

Thanks for helping iron out these discrepancies. You're right, it looks like from the other reviewer we need to update our vignette import statement! Issue created to resolve this here: https://github.com/UBC-MDS/aridanalysis_py/issues/90. We should also add a line to import the house_prices dataset, I've appended to that same issue.

the comments in tests are not as detailed as source code

This is a good point and something to consider as a future improvement to our test suite!

Thanks again for helping us improve aridanalysis_py! If you have any further questions about our package, don't be shy!

cmmclaug commented 3 years ago

Hello @yhchen20, thank you for reviewing our package! I'll try to respond to your insights below:

I follow the instruction in the README.md to install the package and run the examples first, but the function reported an error that "module 'aridanalysis' has no attribute 'arid_eda'". When we use the pacakge, we should use from aridanalysis import aridanalysis as aa rather than import aridanalysis as aa. It might be better if you can clear it in the usage instruction

Agreed! We have an issue here to fix this: https://github.com/UBC-MDS/aridanalysis_py/issues/90

It might be better if you can test the workflow in usage: in usage session in README.md, there is a single quote is missing in line 5, a blank is redundant in the last df, and some inputs should be a string rather than a variable. Also, it might be better if you can put the expected output in usage session.

Thanks for including such detailed analysis! I've added the vignette suggestion fixes to the previous issue!

When I use the example to test arid_linreg function, an error was reported: name 'y' is not defined. The error is because we should input a list of str here. It might be better if you can use a correct example to replace the old one in usage and add proper error message to indicate this type error.

The vignette seems to be out of date! We're going to combine all of the vignette resolution fixes into https://github.com/UBC-MDS/aridanalysis_py/issues/90.

Great job with setting up so many tests and have a coverage rate higher than 90%! However, it might be better to add more tests of input test in the main function. If the input is incorrect, the function could stop early.

Maybe some clarification on this comment is necessary. Are you suggesting adding more defensive checks to the functions?

It might be better if the output plot could show automatically - the current output is alt.HConcatChart(...)

We will have to discuss this to get to the bottom of the reasoning behind this; this is certainly a valid critique!

Thank you again for your efforts in providing feedback. We will use your input to help polish our package!