UBC-MDS / software-review-2021


Submission: Kmeaningful(Python) #33

Open HazelJJJ opened 3 years ago

HazelJJJ commented 3 years ago

Submitting Author:

Package Name: Kmeaningful
One-Line Description of Package: Python package that contains functions to help with data preprocessing, hyperparameter tuning and visualizing clusters using the k-means algorithm.
Repository Link: Kmeaningful
Version submitted: 0.3.1
Editor: TBD
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD


Description

Have you ever encountered a dataset that seems to contain several distinct patterns? Have you ever tried to group similar observations in a dataset and then assign a new sample to one of those groups? Kmeaningful is a Python package that uses the k-means algorithm to find clusters and assign new data points to them.

Scope

* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication options

JOSS Checks

- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`.
- [ ] The package is deposited in a long-term repository with the DOI:

*Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser, text-based review. It will also allow you to demonstrate addressing the issue via PR links.

Code of conduct

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

Editor and review templates can be found here

micahkwok commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements

The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 1.5


Review Comments

Here are a few points I wanted to raise. Hopefully they are of help to you guys!

Overall Comments

Great job, team! I think your package is great. The README does a great job of giving an overview of what the package does, and the functions are tested, written defensively and run well. I really appreciated the thorough comments explaining the code. Overall, I think the package accomplishes the task you set out to do, as it is simple to use and lightweight.

elabandari commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme requirements

The package meets the readme requirements below:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:

Functionality

For packages co-submitting to JOSS

The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

- A short summary describing the high-level functionality of the software
- Authors: A list of authors with their affiliations
- A statement of need clearly stating problems the software is designed to solve and its target audience.
- References: with DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

Estimated hours spent reviewing: 1.5

Review Comments

Hey folks! Excellent work putting this package together. I was able to install the package using the pip command without any issues. Please find some feedback on the package below:

  1. README Examples: I would echo @micahkwok's comments about including the output of each function in the README vignette. It would really help to be able to visualize your package's functionality through photos or gifs. This is especially true for the output of the show_clusters function.

  2. Examples on readthedocs.io: The examples on readthedocs.io do not seem to be rendering properly on my end (i.e. the function from your package is on a separate line from the rest of the examples). I am attaching a screenshot for your review. [Screenshot: readthedocs]

  3. find_elbow function: I wonder if it would be helpful to have the find_elbow function return an elbow/scree plot as well as the optimal K. This would be a functionality similar to the yellowbrick KElbowVisualizer. Sometimes detecting the elbow is subjective and the user may be in a better position to decide on the elbow based on the problem at hand. I do appreciate that the user can pass the optimal_K value of their choosing onto the fit_assign function; it makes the package more flexible.

  4. Visualization: I realized that the cross marks denoting cluster centers hide circular data points. I wonder if it may be useful to be able to visualize both the data point and the cluster centers when the two are superimposed through changing the opacity of the crosses.

  5. Code coverage badge: Just thought I would flag that your code coverage badge is not dynamically updating.
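To make point 3 a bit more concrete, here is a minimal sketch of what returning the elbow curve alongside the chosen K could look like. This is not Kmeaningful's actual code: `elbow_with_curve`, `kmeans_inertia` and `farthest_point_init` are hypothetical names, the farthest-point initialization is my own choice to keep the example deterministic, and picking the K with the largest second difference of the inertia curve is just one simple elbow heuristic among several.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic initialization (an assumption for this sketch): start from
    the first point, then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_inertia(points, k, max_iter=100):
    """Run plain Lloyd iterations and return the final inertia
    (sum of squared distances from each point to its nearest center)."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def elbow_with_curve(points, k_max):
    """Return (suggested_k, inertias), where inertias maps each K in 1..k_max
    to its inertia, so the caller can plot the curve and overrule the suggestion."""
    if k_max < 3:
        raise ValueError("k_max must be at least 3 to compute a second difference")
    inertias = {k: kmeans_inertia(points, k) for k in range(1, k_max + 1)}
    # Heuristic: the elbow is where the curve bends most sharply,
    # i.e. where the discrete second difference is largest.
    suggested_k = max(
        range(2, k_max),
        key=lambda k: inertias[k - 1] - 2 * inertias[k] + inertias[k + 1],
    )
    return suggested_k, inertias
```

The returned `inertias` mapping can be handed to any plotting library to draw the scree plot, so the user can compare the suggested value against the curve and pick a different K if the problem at hand calls for it.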

Overall Comments 
Excellent work building a simple and lightweight implementation of k-means clustering from scratch. I think your package accomplishes the goals you had set for it. I thought the code was very well-written and the functions worked seamlessly. I especially appreciated that the preprocess function could handle multiple outputs (e.g. pandas DataFrame or NumPy array). Great work overall.