UBC-MDS / software-review-2021

1 stars 1 forks source link

Submission: kmeaningfulR(R) #34

Open HazelJJJ opened 3 years ago

HazelJJJ commented 3 years ago

Submitting Author:

Repository: kmeaningfulR Version submitted: 0.2.0 Editor: Tiffany Timbers(@ttimbers ) Reviewers: TBD

Archive: TBD Version accepted: TBD


Package: kmeaningfulR
Title: Performs Kmeans Clustering
Version: 0.0.0.9000
Authors@R: 
    c(person(given = "Trevor",
             family = "Kinsey",
             role = c("aut", "cre"),
             email = "tkinsey@student.ubc.ca"),
      person(given = "Sasha",
             family = "Babicki",
             role = c("aut"),
             email = "sbabicki@student.ubc.ca"),
      person(given = "Mike",
             family = "Lynch",
             role = c("aut"),
             email = "mlynch@student.ubc.ca"),
      person(given = "Hazel",
             family = "Jiang",
             role = c("aut"),
             email = "yihongj@student.ubc.ca"))
Description: Uses the K-means algorithm to group similar data into clusters. Includes functions to preprocess and visualize the data.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
Suggests: 
    testthat,
    covr,
    knitr,
    rmarkdown
Imports: 
    cluster,
    ggplot2,
    FactoMineR,
    forcats,
    dplyr,
    rlang
URL: https://github.com/UBC-MDS/kmeaningfulR
BugReports: https://github.com/UBC-MDS/kmeaningfulR/issues
VignetteBuilder: knitr

Scope

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

MEE Options - [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

ZIBOWANGKANGYU commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [] A short summary describing the high-level functionality of the software
  • [] Authors: A list of authors with their affiliations
  • [] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Estimated hours spent reviewing: 4


Review Comments

Hello team:

Please see my review below. I really like the whole package and it is a heavy undertaking. You have put in a lot of effort!

Package API

Function names

I think in general, the names and usage cases of your functions are reasonable and user-friendly, However, I do think function names like fit and assign are a little bit too general and too common in the R ecosystem, see here and here. I suggest that you change them to fit_kmeans and assign_kmeans.

The function name show_clusters is a bit odd to me. If you show something, it is not necessarily a visualization (you can show things with a table). Therefore, I think the name can be changed to vis_clusters perhaps.

README

There is no need to comment out the part where you generate the plot. Your GitHub action should be able to knit README.md from README.rmd automatically.

I have also modified some formatting details in the README, as well as a case of inconsistency in the documentation of show_clusters function.

Code

preprocess function

The function handles edge cases really well. However, I think it would be better if we give users the option to only center, or only re-scale the data.

In addition, the function seems to quietly impute NA cases as 0, which is a little unexpected. I hope there is more obvious documentation or warning about that.

fit_assign function

The series of functions are really well-written and a lot of effort is visible.

show_clusters function

I think the imported package of forcats seems unnecessary. I removed the line locally and it works. Please double-check this.

Also, the plot title seems not very informative. I suggest that you indicate the USA of PCA in the visualization. The color legend seems not very necessary since we do not need to know the IDs of each cluster anyways.

Argument names

I find some of the argument names not very consistent across the package. For example, the same thing is called clusters in show_clusters but labels in avg_sil_score. The use of centroids in show_clusters and centers in fit_assign also should be more consistent.

Documentation and website

In general, the function documentations are very well written.

Formatting issues

However, some generated documents for functions like show_clusters, avg_sil_score and find_elbow are not very well formatted. See here, here and here

Organization of website

I think it is a bit difficult to find detailed documentation of functions on your website. They are now under the "Reference" tab, which is a bit confusing.

Testing

The tests are very well written, especially test_fit_design.R.

The line coverage score is about 90%, which can be further improved.

Dependencies

List of dependencies on README

The list of dependencies is not completely clear to potential users. For example, it says "R 4.0.3". Does that mean we can only use the 4.0.3 version of anything equal to or higher than this version? I suggest that you can probably use ">=" or "=" to clarify. like glue (>= 1.3.0). See here

Please let me know what you think.

Mark

danielon-5 commented 3 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Estimated hours spent reviewing: 2.5


Review Comments

Hi team, first of all congratulations on a very neat package. It definitely shows a lot of work put together.

Please find below a couple of comments:

So once again really good job! As you can see, I could only suggest minor details and had to be picky since your functions and code are really good and trying not to be repetitive with the great review comments from previous reviewer.

If you have any question or would like some further explanations, I'd be happy to help.

Daniel