Open scarlqq opened 2 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).Estimated hours spent reviewing: 1hr
Error in
[.data.frame(df_cp, col) : undefined columns selected
when I keyed in trim_outliers(penguins2, columns = c('Culmen Length (mm)', 'Culmen Depth (mm)'))
[penguins2 <- palmerpenguins::penguins_raw] . The other two functions worked great. Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).Congrats everyone on creating a very useful package! It is obvious you have put a lot of work onto it with great results. The documentation was also easy to follow and made it simply to install and use your package. I have noted a few things below:
The vignettes could contain more detail, particularly in the descriptions of the arguments and return values. For example, the type
argument in the visualization function could be farther described with the options users have for its value (violin, boxplot etc.)
Similarly, a more thorough description of the functions utility could be beneficial, as the current descriptions are sparse. The function descriptions are also repeated twice in the headers and sub headers, appearing cumbersome.
Not all authors are listed in the vignette, only Qingqing Song is listed on the right hand side of the home page. This section should contain all contributors.
Their are some minor grammar issues which could be corrected, for example, under How to use routlierutils
, library is spelled wrong.
Similar to your Python package, more comments could be helpful to improve readability of your code, as your current scripts contain very few comments.
Additionally, the style of your code contains inconsistent spacing between lines, with some scripts using no spaces, and others using many, which is difficult to read. For example, trim_outliers
contains a series of if-else conditions which have awkward spacing between them, making it difficult to identify which statements are sequential, while visualize_outliers
contains no spacing at all. Keeping this formatting consistent across scripts would be very helpful!
In general, this is great work and I enjoyed using your package!
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).Estimated hours spent reviewing: 1 hour
Good Job Team on developing this nice and neat package! A few points that I think would help you improve it further:
Right now, the code of conduct is just a template generated from the package. It would be a good idea to customize it a bit and add items that the team agree on. (At least it should contain your contact method.)
While inspecting the project structure, I noticed there's a figure living in the following path r_outliers_utils/man/figures/README-pressure-1.png
. I am not sure what it is, but it seems that it is not related to your package. Would you consider removing it?
This is mentioned in my review for your Python package, but it is still one of my concerns for your R package - the source code of outlier_identifier
is rather complicated, and so including more in-line comments explaining the job of some code and the overall flow will definitely make it easier to read and understand.
I like your outlier_identifier
function because it is neat and provides many customization. When I tested its functionality - if return_df=FALSE
, it outputs a dataframe with the 6th and 7th row being lower_range
and upper_range
, respectively. At first these two rows did not make sense to me by themselves, and you do not provide explanation on what these are in your docstring. Although I finally understood what these two rows meant, I still think that explaining these in documentation will help to avoid confusion.
trim_outliers
takes in a parameter identifier=Z_score
; however, it does not actually use this parameter. No matter what random value I give to identifier
, it always outputs the same dataframe and does not throw an error. Therefore, I suggest to add exception handling and corresponding test to this parameter. Moreover, I wonder if you will be working on this function in the future to implement more features (such as what you have in your Python package).
visualize_outliers
outputs either a violin plot or a boxplot. It maps variable names to x-axis, but it also facets the graph by variable names. It might not be an issue when we only want to visualize the data distribution of less than 3 variables, but when the number goes up to 4 or 5, the generated plot has many unnecessary spaces in between and the actual boxes or violins are too narrow. Hence, for better visualization, my suggestion is that you may consider only doing facetting and leaving x-axis untouched (or vice versa).
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).Estimated hours spent reviewing: 1 Hour
A very clean and practical package. As mentioned in the review of the python package, I find your group perfect in describing the package, examples and docstrings. These make using the package very easy and guide me through the process of working with the package. Just a few points to make this package even better are:
ncol
is hardcoded. This could create issues when the number of variables goes higher.Overall, I once again congratulate your group on a very well implemented package. This package could definitely help in finding outliers, saving a lot of time.
Submitting Author Name: Qingqing Song Submitting Author Github Handle: @scarlqq Other Package Authors Github handles: @karanpreetkaur @lipcai
Repository: https://github.com/UBC-MDS/r_outliers_utils Version submitted: 1.0.0 Submission type: Standard
Editor: Karanpreet Kaur (@karanpreetkaur) Linhan Cai (@lipcai) Qingqing Song (@scarlqq)
Reviewers: Wanying Ye (@GloriaWYY ) Amir Shojakhani (@Amirshoja ) Adam MORPHY (@adammorphy) Jacqueline Chong (@Jacq4nn)
Archive: TBD Version accepted: TBD Language: en
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
Who is the target audience and what are scientific applications of this package?
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any
pkgcheck
items which your package is unable to pass.Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
[ ] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal. - [ ] The manuscript describing the package is no longer than 3000 words. - [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)Code of conduct