UBC-MDS / software-review-2021

Submission: sktidy (Python) #14

Open anodaini opened 3 years ago

anodaini commented 3 years ago

Submitting Author: Jacob McFarlane (@JacobMcFarlane), Asma Al-Odaini (@anodaini), Xudong Yang (@xudongyang2), Heidi Ye (@heidi-ye)
Package Name: sktidy
One-Line Description of Package: Tidy model output for sklearn's LogisticRegression and KMeans
Repository Link: sktidy
Version submitted: 0.1.1
Editor: TBD
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD


Description

sktidy is a Python package that returns a tidy summary of sklearn LinearRegression and KMeans models through the functions tidy_lr() and tidy_kmeans(). It also returns the model's predictions for the original data through the functions augment_lr() and augment_kmeans(), for LinearRegression and KMeans respectively.
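To make the idea of "tidy" and "augmented" output concrete, here is a minimal sketch written directly against sklearn and pandas. It is not sktidy's implementation: the helper names `tidy_coefficients` and `augment_predictions` and the column names are hypothetical and chosen only for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression


# Hypothetical helpers for illustration only; these are NOT sktidy's functions.
def tidy_coefficients(model, feature_names):
    """Return one row per model term: the intercept, then each feature."""
    return pd.DataFrame(
        {
            "term": ["intercept", *feature_names],
            "coefficient": [model.intercept_, *model.coef_],
        }
    )


def augment_predictions(model, X):
    """Return a copy of X with the model's predictions as an extra column."""
    augmented = X.copy()
    augmented["predicted"] = model.predict(X)
    return augmented


# Fit a linear regression on the iris features (the target is the class index).
X, y = load_iris(return_X_y=True, as_frame=True)
model = LinearRegression().fit(X, y)

tidy_df = tidy_coefficients(model, X.columns)   # 5 rows: intercept + 4 features
augmented_df = augment_predictions(model, X)    # original columns + "predicted"
```

The point of both shapes is that every result is a plain data frame, so it can be filtered, joined, or plotted without reaching into the fitted estimator's attributes.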

Scope

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication options

JOSS Checks

- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser text-based review. It will also allow you to demonstrate addressing the issue via PR links.

Code of conduct

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

yaz-saleh commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 1.5 hours


Review Comments

import sktidy.sktidy

as opposed to

import sktidy

# Importing packages
from sklearn.cluster import DBSCAN, KMeans
from sklearn import datasets
import pandas as pd
from sktidy.sktidy import tidy_kmeans
# Extracting data and training the clustering algorithm
df = datasets.load_iris(return_X_y = True, as_frame = True)[0]
kmeans_clusterer = KMeans()
kmeans_clusterer.fit(df)
# Getting the tidy df of cluster information
tidy_kmeans(model = kmeans_clusterer, X = df)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-eaa8c2d1fa7b> in <module>
      9 kmeans_clusterer.fit(df)
     10 # Getting the tidy df of cluster information
---> 11 tidy_kmeans(model = kmeans_clusterer, X = df)

~/miniconda3/envs/563/lib/python3.8/site-packages/sktidy/sktidy.py in tidy_kmeans(model, X)
    155         # Getting the cluster center for the given each cluster, reshaping it \
    156         # so pandas behaves itself later
--> 157         cluster_center = model.cluster_centers_[cluster].reshape(
    158             1, cluster_labels.shape[0]
    159         )

ValueError: cannot reshape array of size 4 into shape (1,8)

====================================================== warnings summary ====================================================
../../../../../.cache/pypoetry/virtualenvs/sktidy-ENUwfNFi-py3.8/lib/python3.8/site-packages/patsy/constraint.py:13
  /home/yazan/.cache/pypoetry/virtualenvs/sktidy-ENUwfNFi-py3.8/lib/python3.8/site-packages/patsy/constraint.py:13: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
    from collections import Mapping

-- Docs: https://docs.pytest.org/en/stable/warnings.html
==================================================== 4 passed, 1 warning in 0.95s =========================================
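On the import path noted at the top of these comments (`import sktidy.sktidy` as opposed to `import sktidy`): one common way to expose functions at the package top level is to re-export them in `__init__.py`. The sketch below demonstrates the mechanism on a throwaway package built in a temporary directory. The names `demo_pkg` and `tidy_demo` are hypothetical, and it is an assumption that sktidy keeps its functions in `sktidy/sktidy.py` without re-exporting them.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk. "demo_pkg" and "tidy_demo" are
# hypothetical names used only for this illustration.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()

# The implementation lives in a submodule named after the package,
# mirroring the sktidy/sktidy.py path shown in the traceback above.
(pkg / "demo_pkg.py").write_text(
    "def tidy_demo():\n"
    "    return 'called tidy_demo'\n"
)

# Re-exporting in __init__.py exposes the function at the package top level.
(pkg / "__init__.py").write_text("from .demo_pkg import tidy_demo\n")

# Make the temporary directory importable and load the package.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")

# The function is now reachable without spelling out the submodule.
result = demo_pkg.tidy_demo()
```

With a re-export like this in place, the longer submodule import keeps working as well, so existing user code would not break.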

Great work on the package overall. Could potentially see myself using it one day!

ChuckHo777 commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 2-3 hrs


Review Comments

Hi Asma, Heidi, Jacob, and Peter:

Great job on this package. It's a really good tool for preparing an analysis of linear regression and KMeans models, and I would definitely use it.

I just had a few comments on the package.

Summary, Feature/Function Description and Function Docstring:

Badges

Usage Instruction & Example (on README & Read the Docs)

from sktidy import sktidy

ValueError                                Traceback (most recent call last)
<ipython-input-75-28772663a5ae> in <module>
      1 kmeans_clusterer = KMeans()
      2 kmeans_clusterer.fit(df)
----> 3 sk.tidy_kmeans(model = kmeans_clusterer, X = df)

~/Documents/mds/block5/524/DSCI_524_collab-sw-dev_students/sktidy/sktidy/sktidy.py in tidy_kmeans(model, X)
    155         # Getting the cluster center for the given each cluster, reshaping it \
    156         # so pandas behaves itself later
--> 157         cluster_center = model.cluster_centers_[cluster].reshape(
    158             1, cluster_labels.shape[0]
    159         )

ValueError: cannot reshape array of size 4 into shape (1,8)
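
The shape mismatch in this traceback can be reproduced with NumPy alone. `KMeans()` defaults to 8 clusters and the iris data has 4 features, so each cluster centre holds 4 values while the reshape asks for 8. The sketch below assumes that `cluster_labels.shape[0]` equals the number of clusters, which the `(1,8)` in the error message suggests but the traceback does not confirm. The `reshape(1, -1)` line is one possible fix, not necessarily the one the authors would choose.

```python
import numpy as np

# KMeans() defaults to 8 clusters; the iris data has 4 features.
n_clusters, n_features = 8, 4

# Stand-in for model.cluster_centers_: one row per cluster, one column per feature.
cluster_centers = np.zeros((n_clusters, n_features))
center = cluster_centers[0]  # a single centre holds n_features == 4 values

# Reshaping to (1, n_clusters) asks for 8 values, which reproduces the error.
try:
    center.reshape(1, n_clusters)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Letting NumPy infer the width keeps the row at (1, n_features),
# whatever the number of clusters is.
row = center.reshape(1, -1)
```

Under that assumption, the original call only succeeds when the number of clusters happens to equal the number of features, which may be why the package's own tests pass.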

Test Script

Potential Future Improvement

Overall, the package works well. I am looking forward to future versions.

Let me know if there's anything unclear.