Open anodaini opened 3 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
tidy_kmeans
- vignette and README example seems to throw an error.setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
tidy_kmeans()
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 1.5 hours
Since the package is currently hosted on test.pypi.org, installations instructions should include the --extra-index-url
argument to allow for installation of dependencies.
pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple sktidy
Based on the the package folder structure, I believe the import statements should say
import sktidy.sktidy
as opposed to
import sktidy
tidy_kmeans
current implementation seems to only work when the number of clusters in KMeans
is 2. If this is intentional, please update the documentation to reflect this. The folowing example in the README uses the default argument for n_clusters
which is 8.# Importing packages
from sklearn.cluster import DBSCAN, KMeans
from sklearn import datasets
import pandas as pd
import sktidy
# Extracting data and training the clustering algorithm
df = datasets.load_iris(return_X_y = True, as_frame = True)[0]
kmeans_clusterer = KMeans()
kmeans_clusterer.fit(df)
# Getting the tidy df of cluster information
tidy_kmeans(model = kmeans_clusterer, X = df)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-18-eaa8c2d1fa7b> in <module>
9 kmeans_clusterer.fit(df)
10 # Getting the tidy df of cluster information
---> 11 tidy_kmeans(model = kmeans_clusterer, X = df)
~/miniconda3/envs/563/lib/python3.8/site-packages/sktidy/sktidy.py in tidy_kmeans(model, X)
155 # Getting the cluster center for the given each cluster, reshaping it \
156 # so pandas behaves itself later
--> 157 cluster_center = model.cluster_centers_[cluster].reshape(
158 1, cluster_labels.shape[0]
159 )
ValueError: cannot reshape array of size 4 into shape (1,8)
GitHub repository does not seem to be connected to Codecov and the badge on README is not showing coverage.
This one might be out of your control, but the patsy
library version that your package uses has a deprecated import which is throwing an error during pytest
. I would check if there are newer versions of the package that could be used instead.
====================================================== warnings summary ====================================================
../../../../../.cache/pypoetry/virtualenvs/sktidy-ENUwfNFi-py3.8/lib/python3.8/site-packages/patsy/constraint.py:13
/home/yazan/.cache/pypoetry/virtualenvs/sktidy-ENUwfNFi-py3.8/lib/python3.8/site-packages/patsy/constraint.py:13: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
from collections import Mapping
-- Docs: https://docs.pytest.org/en/stable/warnings.html
==================================================== 4 passed, 1 warning in 0.95s =========================================
has an OSI approved license.
in your template aboveGreat work on the package overall. Could potentially see myself using it one day!
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 2-3 hrs
Hi Asma, Heidi, Jacob, and Peter:
Great job on this package, it's a really good tool to prepare us for analysis on the linear regression and KMeans model. I would definitely use this tool.
I just had a few comments on the package.
inertia
and silhoutte scores
for kmeans clustering seems to be no longer the output for the augment
function of KMeans, but it shows up in summary, feature description and docstring of tidy_kmeans()
. The team may want to update the summary, feature description and docstring of the functions.codecov
badge in GitHub repo is now showing status of unknown. It might due to typos in the hyperlink.It would be great if we have sample output for each function. We can consider breaking the 4 functions into 4 different code blocks and show their respective output to further help users to visualize the usage of the function.
To import the package, the instructed import sktidy
doesn't seem to work. The following command would work.
from sktidy import sktidy
ValueError Traceback (most recent call last)
<ipython-input-75-28772663a5ae> in <module>
1 kmeans_clusterer = KMeans()
2 kmeans_clusterer.fit(df)
----> 3 sk.tidy_kmeans(model = kmeans_clusterer, X = df)
~/Documents/mds/block5/524/DSCI_524_collab-sw-dev_students/sktidy/sktidy/sktidy.py in tidy_kmeans(model, X)
155 # Getting the cluster center for the given each cluster, reshaping it \
156 # so pandas behaves itself later
--> 157 cluster_center = model.cluster_centers_[cluster].reshape(
158 1, cluster_labels.shape[0]
159 )
ValueError: cannot reshape array of size 4 into shape (1,8)
I love how tidy_lr() eases off the hassle for extracting feature names and its coefficient value for feature importance analysis. It would be great if the function can further support other types of regression models
, and pipeline object
with steps such as pre-processing (e.g. CountVectorizer) or feature transformation/selection. The latter one has been a very tiresome task.
For augment_kmeans, it would be cool and even more convenient if we can augment the predicted cluster label for multiple KMeans Model with different hyperparameters (e.g. different n_clusters) so that we won't have different augmented dataset when we tried different hyperparameter for the k-mean models.
Overall, the package works well. I am looking forward to the future version.
Let me know if there's anything unclear.
Submitting Author: Jacob McFarlane @JacobMcFarlane, Asma Al-Odaini (@anodaini), Xudong Yang @xudongyang2, Heidi Ye @heidi-ye Package Name: sktidy One-Line Description of Package: Tidy model output for sklearn's LogisticRegression and KMeans Repository Link: sktidy Version submitted: 0.1.1 Editor: TBD
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD
Description
sktidy
is a python package that returns a tidy summary output tosklearn
LinearRegression and KMeans models using the functionstidy_lr()
andtidy_kmeans()
. It also outputs the predictions of the model for the original data using the functionsaugment_lr()
andaugment_kmeans()
for LinearRegression and KMeans respectively.Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.
Explain how the and why the package falls under these categories (briefly, 1-2 sentences):
Who is the target audience and what are scientific applications of this package?
Are there other Python packages that accomplish the same thing? If so, how does yours differ?
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or
@tag
the editor you contacted:Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. *Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here