Open HazelJJJ opened 3 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 1.5
Here are some of the points that I wanted to just point out. Hopefully it is of help to you guys!
Installation: I copied the pip3 install
code into my terminal however it came ran into an error. ERROR: Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/Library/Python/3.8' Consider using the
--useroption or check the permissions.
--user
I was able to successfully install the package. I’m not too familiar with software end of things, so not sure if this is just a personal computer issue or not. If not, perhaps adding --user
to the end of the installation code chunk would help avoid this issue. I also noticed it seemed to be linked to the 0.2.0 version, which may not be the latest release.Pytest: After cloning the repo and running poetry install
and poetry run pytest
, I had 9 pass and 2 warnings. These are just warnings, but I wanted to bring them to your attention just in case. The warnings messages as follows:
-RuntimeWarning: Mean of empty slice. return _methods._mean(a, axis=axis, dtype=dtype,
-RuntimeWarning: invalid value encountered in double_scalars ret = ret.dtype.type(ret / rcount)
README Examples: I think the example in the README would benefit from showing the output of what each function does. I think seeing it laid out in such a nice short code chunk is nice, however it is harder at a glance for a user to figure what the output might be. For example, showing the output array on preprocess()
or the output plot on show_clusters()
would be a nice little touch.
Readthedocs: The readthedocs is good and I was able to follow the installation instructions for cloning the repo. The module index section is good too, but the parameters and returns section seem to have been messed up a bit in the formatting.
fit_assign()
:
I would consider hiding the helper functions for fit_assign()
. It is really neat and impressive function that has a lot of components, but it may be better to not include the helper functions into the readthedocs as the user probably doesn’t need to know about those. However, if those functions are intended to be made known to the public I think the formatting as it stands is a bit cluttered (most likely due to the automatic rendering of readthedocs).
To expand a bit on this, I thought the fit()
and assign()
functions within fit_assign()
were simply helper functions, but in the usage section those functions are called separately so it seems to indicate these are not simply helper functions. Personally, I think for simplicity purposes it would be easier and in-line with your package to only allow users to call fit_assign()
. However, I understand why you may choose otherwise.
Overall Comments
Great job team! I think your package is great. The README does a great job in giving a good overview of what the package does and the functions are tested, written defensively and run well. I really appreciated the thorough comments explaining the code. Overall, I think the package accomplishes the task you set out to do as it is simple to use and lightweight.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
The package meets the readme requirements below:
[x] Package has a README.md file in the root directory. The README should include, from top to bottom:
[x] The package name
[x] Badges for continuous integration and test coverage, a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the badge for pyOpenSci peer-review will be provided upon acceptance.)
[x] Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
[x] Installation instructions
[x] Any additional setup required (authentication tokens, etc)
[x] Brief demonstration usage
[x] Direction to more detailed documentation (e.g. your documentation files or website).
[x] If applicable, how the package compares to other similar packages and/or how it relates to other packages
[x] Citation information
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
For packages co-submitting to JOSS The package has an obvious research application according to JOSS's definition in their submission requirements. Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md matching JOSS's requirements with:
A short summary describing the high-level functionality of the software Authors: A list of authors with their affiliations A statement of need clearly stating problems the software is designed to solve and its target audience. References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 1.5
Hey folks! Excellent work putting this package together. I was able to install the package using the pip command without any issues. Please find some feedback on the package below:
README Examples: I would echo @micahkwok comments about including the output of the each function in the README vignette. It would really help to be able to visualize your package's functionality through photos or gifs. This is especially true for the output of the show_clusters
function.
Examples on readthedocs.io : The examples from readthedocs.io do not seem to be rendering properly on my end (i.e. the function from your package is on a separate line from the rest fo the examples). I am attaching a screenshot for your review.
find_elbow
function: I wonder if it would be helpful to have the find_elbow
function return an elbow/scree plot as well as the optimal K. This would be a functionality similar to the yellowbrick KElbowVisualizer. Sometimes detecting the elbow is subjective and the user may be in a better position to decide on the elbow based on the problem at hand. I do appreciate that the user can pass the optimal_K value of their choosing onto the fit_assign
function; it makes the package more flexible.
Visualization: I realized that the cross marks denoting cluster centers hide circular data points. I wonder if it may be useful to be able to visualize both the data point and the cluster centers when the two are superimposed through changing the opacity of the crosses.
Code coverage badge: Just thought I would flag that your code coverage badge is not dynamically updating.
Overall Comments Excellent work building a simple and lightweight implementation of k-means clustering from scratch. I think your package accomplishes the goals you had set for it. I thought the code was very well-written and the functions worked seamlessly. I especially appreciated that the preprocess function could handle multiple outputs(e.g. Pandas DataFrame or numpy array). Great work overall.
Submitting Author:
Package Name: Kmeaningful One-Line Description of Package: Python package that contains functions to help with data preprocessing, hyperparameter tuning and visualizing clusters using the k-means algorithm. Repository Link: Kmeaningful Version submitted: 0.3.1 Editor: TBD
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD
Description
Have you ever encountered a dataset that seems to have different patterns in it? Have you ever tried to group similar things together in a dataset and to assign a new sample based on your findings?
Kmeaningful
is a python package that uses the k-means algorithm to find clusters and assign new data points to them.preprocess()
automatic dataset preprocess.fit_assign()
finds centroid location for all of the clusters and assigns each example to a clusterfind_elbow()
automatic hyperparameter tuning to select optional number of clustershow_clusters()
visualize clusters and centroid according to 2d PCA representationScope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.
Explain how the and why the package falls under these categories (briefly, 1-2 sentences):
Who is the target audience and what are scientific applications of this package?
Are there other Python packages that accomplish the same thing? If so, how does yours differ?
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or
@tag
the editor you contacted:Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. *Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here