Open stevenleung2018 opened 2 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
Estimated hours spent reviewing:
Feedback:
column_stats() does not seem to handle NA values well.
file_path = "https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv"
df = preprocess(file_path)
column_stats(df, c('Survived', 'Age'))
Error in quantile.default(data[[column]], 0.25) :
missing values and NaN's not allowed if 'na.rm' is FALSE
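The error above is raised by base R's quantile(), which rejects missing values unless na.rm = TRUE is passed. One possible way to tolerate NA values is sketched below; the helper name safe_column_summary() is hypothetical and is not part of EDAhelperR, and the package authors may prefer a different policy (for example, warning the user about dropped values).

```r
# Hypothetical helper, not part of EDAhelperR: summarise one numeric vector
# while ignoring missing values and reporting how many were dropped.
safe_column_summary <- function(x) {
  stopifnot(is.numeric(x))
  c(
    mean      = mean(x, na.rm = TRUE),
    q25       = unname(quantile(x, 0.25, na.rm = TRUE)),
    median    = median(x, na.rm = TRUE),
    q75       = unname(quantile(x, 0.75, na.rm = TRUE)),
    n_missing = sum(is.na(x))
  )
}

# Small made-up vector with two missing entries
age <- c(22, 38, NA, 35, NA, 54)
res <- safe_column_summary(age)
print(res)
```

Reporting n_missing alongside the statistics keeps the dropped values visible instead of silently discarding them.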
Overall, I enjoyed testing the package and I can see how it can benefit EDA. =]
- Also, when I look up help in RStudio, the roxygen documentation doesn't come up. Not sure if it's just a local issue on my side.
Hi Macy, thanks for taking the time to write a detailed review!
I would like to follow up on the "HTML page not found" issue in RStudio under the "Help" tab. I am not able to reproduce it. Could you provide the steps to reproduce it?
After installation, I just go to help tab and search the function name
Did you load the package with library(EDAhelperR)?
I didn't, but I restarted RStudio and it works now =]
Estimated hours spent reviewing: half an hour.
Great job team! The package installed successfully and all functions are running smoothly on my end.
Here are a few things that I came up with:
When I tested the column_stats() function with wrong input:
I expected the error message to say something like "column does not exist in the data frame". The actual error message is not informative from my point of view.
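A more informative error of the kind described above could come from a validation step that runs before any statistics are computed. The following is only a sketch: check_columns() is a hypothetical name, and the wording of the message is an assumption rather than the package's actual behaviour.

```r
# Hypothetical validation helper, not part of EDAhelperR: fail early with a
# message that names the columns that are missing from the data frame.
check_columns <- function(data, columns) {
  if (!is.data.frame(data)) {
    stop("`data` must be a data frame.", call. = FALSE)
  }
  missing_cols <- setdiff(columns, names(data))
  if (length(missing_cols) > 0) {
    stop(
      "Column(s) not found in the data frame: ",
      paste(missing_cols, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Valid input passes silently
check_columns(mtcars, c("mpg", "cyl"))

# Invalid input produces a message that names the offending column
msg <- tryCatch(
  check_columns(mtcars, c("mpg", "not_a_column")),
  error = function(e) conditionMessage(e)
)
print(msg)
```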
It would be nice to use the same data frame to illustrate every function in the usage section of the README, and to include the expected output (screenshots, etc.), which would help users better understand the functions.
Estimated hours spent reviewing:
preprocess
and you have all the other functions in separate files, which is confusing. It might be better to be consistent: either name every file after its function or put all functions into the EDAhelper file.
readr::readr_example
part inside your function instead of asking users to call your function in this way. It is not intuitive enough.
preprocess(readr::readr_example("mtcars.csv"))
actually… T.T It should be preprocess("mtcars.csv")
column_stats
function: it fails when a column name is a string containing spaces. You might need to handle this situation, either by processing such names or by returning an informative error. Currently, the error is: Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent
numeric_plots
: it might be better to add a columns argument, to be consistent with your other three functions.
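The suggestion above about moving the readr::readr_example lookup inside the function could be approached roughly as follows. This is a sketch under assumptions: resolve_data_path() is a hypothetical helper, and the fallback rule (try the path as given, then look among readr's bundled example files) is one possible design, not what EDAhelperR currently does.

```r
# Hypothetical helper, not part of EDAhelperR: accept either a real path/URL
# or the bare name of an example file bundled with readr.
resolve_data_path <- function(path) {
  # An existing file or an http(s) URL is used as given.
  if (file.exists(path) || grepl("^https?://", path)) {
    return(path)
  }
  # Otherwise look in readr's bundled examples. system.file() returns ""
  # when the file (or the readr package itself) is not available.
  bundled <- system.file("extdata", path, package = "readr")
  if (nzchar(bundled)) {
    return(bundled)
  }
  stop("File not found: ", path, call. = FALSE)
}

# An existing file is returned unchanged
tmp <- tempfile(fileext = ".csv")
writeLines("a,b", tmp)
resolved <- resolve_data_path(tmp)

# An unknown name fails with a clear message
err <- tryCatch(
  resolve_data_path("definitely_missing_file_123.csv"),
  error = function(e) conditionMessage(e)
)
print(err)
```

With a helper like this, a call such as preprocess("mtcars.csv") could resolve the bundled file internally, at the cost of some implicit behaviour that would need to be documented.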
Estimated hours spent reviewing: 1hr
I find the output of the column_stats() function a bit disorienting: the correlation and covariance matrices are printed without titles, so it is hard to tell them apart.
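One simple way to distinguish the two matrices, sketched here under the assumption that returning a named list is acceptable, is to let the list names act as titles when the result is printed. labelled_stats() is a hypothetical function and does not reflect how column_stats() is implemented.

```r
# Hypothetical sketch, not EDAhelperR code: return the matrices in a named
# list so that printing shows $correlation and $covariance as headings.
labelled_stats <- function(data, columns) {
  selected <- data[, columns, drop = FALSE]
  list(
    correlation = cor(selected, use = "pairwise.complete.obs"),
    covariance  = cov(selected, use = "pairwise.complete.obs")
  )
}

stats <- labelled_stats(mtcars, c("mpg", "cyl"))
print(stats)
```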
Some error messages are not informative and do not provide guidance on how to proceed. For example:
column_stats(mtcars, c("mpg", "cyl"))
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
In addition: Warning message:
In rbind(summary_stats, new_row) :
number of columns of result is not a multiple of vector length (arg 2)
It would be nice if the columns to plot could be selected in the numeric_plots() function.
It would be better to include the output plots in the usage section of your README.
Overall, this package looks pretty useful for EDA and I will be sure to incorporate it in my project!
name: EDAhelperR about: R package
Submitting Author Name: Name
Submitting Author Github Handle: @stevenleung2018
Other Package Authors Github handles: @suuuuperNOVA, @jennifer-hoang, @Rowansiv
Repository: https://github.com/UBC-MDS/EDAhelperR
Version submitted:
Submission type: Standard
Editor: TBD
Reviewers: Irene Yan, Macy Chan, Yike Shi, Chaoran Wang
Archive: TBD
Version accepted: TBD
Language: en
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
This is a package for EDA (Exploratory Data Analysis). It therefore has elements of data retrieval, extraction, munging and some data visualization.
It is targeted at any data scientist who does Exploratory Data Analysis.
Surely, EDA is not a new topic to data scientists. There are quite a few packages doing similar work on CRAN. However, most of them include only limited functionality, such as descriptive statistics alone. Our proposal is more of an all-in-one toolkit for EDA. Below is a list of sister projects.
brinton: A Graphical EDA Tool
correlationfunnel: Speed Up Exploratory Data Analysis (EDA) with the Correlation Funnel
ezEDA: Task Oriented Interface for Exploratory Data Analysis
These packages either focus a lot on graphical EDA or generating a large chart or report with a single large function. With EDAhelperR, we try to offer short, simple function calls which can be used in reports.
There are a few Python packages which also work on EDA, but they position themselves differently from EDAhelper. Some packages provide only one function. Some provide functions which generate an almost comprehensive report. We want to provide a package with individual function calls which can be used in different parts of a report.
Not applicable.
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any pkgcheck items which your package is unable to pass.
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
[ ] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)
Code of conduct