Open lephanthuymai opened 3 years ago
Assigning @wangjc640 & @jordanlau123 as reviewers.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide.
The package includes all the following forms of documentation:
`URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).
For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a `paper.md` matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 2
Hi Mai, Aditya, Charles, and Rahul. `datascience.eda.R` looks great! I'm impressed by the amount of work you all put into this package in such a short period of time. Below are some of my comments and feedback regarding the package:
The .R files were well documented and commented where necessary, which makes them very easy to read and follow.
The README and instructions were written in a clear and simple manner; I understood how to use the functions and their respective purposes after the first read!
`explore_numeric_columns`: When running this function on the penguins dataset, I expected only 7 plots, as in the README example. The output also contained 7 additional tables, 6 of which were identical. This gets a bit messy, especially when I knit the file.
`explore_numeric_columns`: The correlation heatmap is a great way to visualize the correlation between numerical features, but there might be a bug in the plot. In the README example, every cell of the heatmap is grey, which shouldn't be the case for all pairs of features. (I got a different result using `GGally::ggcorr`.)
`explore_categorical_columns`: Not sure if this is possible, but it might be a good idea to perform this conversion inside the function (for all categorical columns) rather than making the user do it manually:
df <- data.frame(lapply(penguins[, c('species','island')], as.character),
stringsAsFactors=FALSE) %>% tibble()
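The suggestion above could be handled by a small internal helper that the function calls before doing anything else. The following is only a minimal sketch in base R, not the package's actual code: the helper name `coerce_factors_to_character` and the toy data frame are hypothetical and used for illustration.

```r
# Hypothetical helper (not part of datascience.eda): converts every factor
# column of a data frame or tibble to character, leaving other columns as-is.
coerce_factors_to_character <- function(df) {
  # Logical vector marking which columns are factors
  is_fct <- vapply(df, is.factor, logical(1))
  if (any(is_fct)) {
    df[is_fct] <- lapply(df[is_fct], as.character)
  }
  df
}

# Toy data standing in for the penguins columns used in the review
toy <- data.frame(
  species = factor(c("Adelie", "Gentoo", "Adelie")),
  island = factor(c("Torgersen", "Biscoe", "Dream")),
  body_mass_g = c(3750, 5000, 3800)
)

converted <- coerce_factors_to_character(toy)
str(converted)
```

With a helper like this, users could pass a data frame containing factor columns directly, and numeric columns would be left unchanged.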
Overall, you all did a great job on this package. It was very well thought out and in-depth. This package definitely speeds up the EDA process, and I can see other data scientists using it!
Submitting Author: Mai Le (@lephanthuymai)
Other Authors: Aditya Bhatraju (@adibns), Charles Suresh (@charlessuresh), Rahul Kuriyedath (@rahulkuriyedath)
Repository: https://github.com/UBC-MDS/datascience.eda.R
Version submitted: https://github.com/UBC-MDS/datascience.eda.R/releases/tag/0.3.0
Editor: TBD
Reviewers: TBD
Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
The `datascience.eda` package provides functions that automate most preliminary exploratory data analysis tasks (workflow automation). Highlights include creating data clusters for use in feature engineering (data extraction), identifying the proportion of missing data in categorical columns (data testing), and retrieving the top words in text columns (text analysis). The target audience is data scientists; the package aims to improve the efficiency of the EDA process.
Various packages provide functions for EDA, but most of them focus on identifying anomalies in numeric columns, and their exact functionality differs from ours. Furthermore, to our knowledge, no EDA-related package provides functions for handling text columns and data clustering.
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
- [ ] Do you intend for this package to go on CRAN?
- [ ] Do you intend for this package to go on Bioconductor?
- [ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct