Group 03: nameformeR - R

name: nameformeR
about: A helper R package that can be used to generate names. This could be used to come up with baby names, character names, pseudonyms, etc.

Submitting Author Name: Daniel Cairns, Eyre Hong, Bruce Wu, Zilong Yi
Submitting Author Github Handle: @DanielCairns @eyrexh @BruceUBC @ZilongYi

Repository: https://github.com/UBC-MDS/nameformeR
Version submitted: v1.1.0
Submission type: Standard
Editor: Daniel Cairns (@DanielCairns), Eyre Hong (@eyrexh), Bruce Wu (@BruceUBC), Zilong Yi (@ZilongYi)
Reviewers: @ranjitprakash1986, @Althrun-sun, @netsgnut, @kellywujy

Archive: TBD
Version accepted: TBD
Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: nameformeR
Title: What the Package Does (One Line, Title Case)
Version: 1.1.0
Authors@R: c(person("Eyre", "Hong", role = c("aut", "cre"),
                     email = "eyrexh@students.cs.ubc.ca"),
              person("Daniel", "Cairns", role = "aut"),
              person("Bruce", "Wu", role = "aut"),
              person("Zilong", "Yi", role = "aut"))
Description: A helper python package that can be used to generate names based on the dateset.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.1
Suggests: 
    covr,
    knitr,
    rmarkdown,
    testthat (>= 3.0.0)
Config/testthat/edition: 3
Imports:
    comparator (>= 0.1.2),
    stringr (>= 1.5.0),
    dplyr (>= 1.0.10),
    readr (>= 2.1.3)
VignetteBuilder: knitr

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- [x] data retrieval
- [x] data extraction
- [x] data munging
- [ ] data deposition
- [ ] data validation and testing
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
Who is the target audience and what are scientific applications of this package?
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any pkgcheck items which your package is unable to pass.

Technical checks

Confirm each of the following by checking the box.

[x] I have read the rOpenSci packaging guide.
[x] I have read the author guide and I expect to maintain this package for at least 2 years or to find a replacement.

This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[x] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions, created with roxygen2.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, including reporting of test coverage.

Publication options

[ ] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

[x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s): demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions
[x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software have been confirmed.
[ ] Performance: Any performance claims of the software have been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 1hr

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Similar to the comment I provided for the Python version, it would be nice to include an example for the find_old_name() function that filters by sex, and to make the input case insensitive.
I also think it would be more user-friendly to include parameter names in the function examples, e.g. find_name(sex = "F", init = "A"), which gives users a hint about the required values.
More explanation of how the bar parameter controls the output names would be nice to have.
It would be nice to also include sections on contribution and code of conduct in the README.md file.
It's a very interesting package. Some ideas for future development:
- allow finding names that start with a combination of letters
- is it possible to fetch the data for names beyond 2017?
- look up names by origin (e.g. biblical, other language, country of origin)

Great package!

[Attached image: peer_review_R_feedback1]

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s): demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions
[x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software have been confirmed.
[ ] Performance: Any performance claims of the software have been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 1

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

The R package installs seamlessly and the functions work as documented.
The great thing about the R console interface is that it suggests function arguments, which makes it easier to know which arguments are required. As noted in my review of the Python package, the Python terminal doesn't provide such suggestions. That said, it might still be a good idea to explain the mandatory arguments of each function, where necessary, in the USAGE section. This will help inexperienced users run the functions comfortably.
It would help the reader to have the argument "bar" explained in the context of find_unisex_name. I tried to understand it from the vignette. Perhaps adding a definitions section to the README would help clarify readers' doubts. For example, my literal understanding is that a "neutral" name is equivalent to a unisex name. However, I cannot tell whether the same understanding is used in the design of the function. The current explanation of "bar" in the vignette doesn't clarify this question.
The definitions of the "limit" and "bar" arguments appear to be interchanged in the vignette. Please see attached image.
Future development - In several parts of the world, the naming of a newborn is often based on cultural factors. There are some niches where people name their newborn, or go to the extreme of altering their own names, based on a claimed science called "numerology". I have not researched it enough to support or oppose the practice. However, if the workings behind numerology can be incorporated into the name suggestions, the package will attract wide interest from a certain niche of users. Here is a link I found through a Google search - https://www.babycenter.in/a1044919/baby-names-based-on-numerology.
Nice work on the package, I enjoyed experimenting with it and discovering new names from the era gone by!!

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[X] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[X] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[X] Installation instructions: for the development version of package and any non-standard dependencies in README
[X] Vignette(s): demonstrating major functionality that runs successfully locally
[X] Function Documentation: for all exported functions
[X] Examples: (that run successfully locally) for all exported functions
[X] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[X] Installation: Installation succeeds as documented.
[X] Functionality: Any functional claims of the software have been confirmed.
[ ] Performance: Any performance claims of the software have been confirmed.
[X] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[X] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: ~ 1.5 hours

[X] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

(Review based on the latest commit on main as of writing, commit https://github.com/UBC-MDS/nameformeR/commit/50948086ff64a5e6c0a70364d0dd0f1793bfa701)

First, congratulations on releasing the package. It is a fun, simple (yet cool!) idea to use this dataset to suggest baby names.

What I like most is that the code is clean and neat. I can see that other reviewers have already given excellent comments on the functionality of the package. I would therefore like to focus solely on the code side of things instead.

(Some comments are similar to what I have made in the Python package, which can be viewed here.)

1. Consider DRYing your code, instead of Writing Every Time.

An example would be the snippet of loading the CSV dataset:

url <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-03-22/babynames.csv"
data <- readr::read_csv(url,col_types = readr::cols())

This piece of code appears 4 times. You may consider refactoring by, e.g., wrapping it in a utility function:

load_raw_data <- function() {
    # Read the babynames CSV from the tidytuesday repository
    url <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-03-22/babynames.csv"
    readr::read_csv(url, col_types = readr::cols())
}

While the line count does not decrease much, the real benefit is maintainability: if we later want to, say, switch to an upstream dataset with a different structure, the change only has to be made in one place.
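
As a further sketch of the same idea, the helper could take the data location as an argument, so that tests can point it at a small local file instead of the network. The `source` argument and the `BABYNAMES_URL` constant are assumptions made for illustration and are not part of nameformeR; the sketch assumes readr is installed.

```r
# Default dataset location, copied from the snippet quoted in this review.
BABYNAMES_URL <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-03-22/babynames.csv"

# Hypothetical variant of load_raw_data(): `source` is an assumed argument
# that accepts a URL or a local file path.
load_raw_data <- function(source = BABYNAMES_URL) {
  readr::read_csv(source, col_types = readr::cols())
}

# Toy usage: read a two-row local CSV instead of downloading the real data.
toy_file <- tempfile(fileext = ".csv")
write.csv(data.frame(name = c("Ada", "Bob"), sex = c("F", "M")),
          toy_file, row.names = FALSE)
toy_data <- load_raw_data(toy_file)
```

Each exported function could then call `load_raw_data()` once at the top, and a change of dataset would only touch the default value.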

2. find_unisex_name has a conflicting Roxygen definition.

The said function is defined as find_unisex_name <- function(bar, limit = 10), but provides the parameters as:

#' @param limit A float controls the minimum proportion of the neutral names in a single year in the dataframe.
#' @param bar The length of the output with a default value of 10.

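It would be better to swap the two descriptions so that each one matches its parameter and the order follows the function signature, and to review the rest of the documentation at the same time.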
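
A minimal sketch of what the corrected header might look like is shown here. The descriptions are the two quoted above with their parameter names swapped; the title line and the stub body are placeholders added for illustration, not the package's actual code.

```r
#' Find unisex names (placeholder title, assumed for this sketch)
#'
#' @param bar A float that controls the minimum proportion of the neutral
#'   names in a single year in the dataframe.
#' @param limit The length of the output with a default value of 10.
find_unisex_name <- function(bar, limit = 10) {
  # Stub body: only the documentation order is being illustrated here.
  stop("illustrative stub, not the package implementation")
}
```

With this order, the `@param` entries read in the same sequence as the arguments in `function(bar, limit = 10)`.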

3. find_old_name should have better guards.

I noticed this when I tried out its sex parameter and found that it does not work with lower-case values (e.g., 'f' or 'm'), unlike find_name. The offending piece of code is:

  df <- data |> dplyr::filter(tp == {{tp}})
  if (sex == "uni"){
    f <- df |> dplyr::filter(sex=="F") |> dplyr::pull(name)
    m <- df |> dplyr::filter(sex=="M") |> dplyr::pull(name)
    uni_df <- intersect(f,m)
    if (length(uni_df) < limit){
      uni_df
    }else{
      sample(uni_df,size=limit,replace = FALSE)
    }

  }else{
    r <- df |> dplyr::filter(sex=={{sex}}) |> dplyr::pull(name)
    sample(r,size=limit,replace = FALSE)
  }

Similar to the Python counterpart, it can be rewritten to be simpler and to include guards. You may refer to the Python issue for more details.
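
One possible shape for such a rewrite is sketched below in base R. `pick_names()` is a hypothetical helper, not a function in nameformeR. It assumes a data frame with `name` and `sex` columns, as in the quoted code, and leaves out the filtering on `tp`. It accepts lower-case input and never asks for more names than are available, a case in which `sample()` with `replace = FALSE` raises an error.

```r
# Hypothetical helper, not part of nameformeR: guarded, case-insensitive sampling.
pick_names <- function(df, sex = "uni", limit = 10) {
  if (!is.character(sex) || length(sex) != 1) {
    stop("`sex` must be a single string")
  }
  if (!is.numeric(limit) || length(limit) != 1 || limit < 1) {
    stop("`limit` must be a single number of at least 1")
  }
  sex <- toupper(sex)  # accept "f", "m" and "uni" in any case
  if (!sex %in% c("F", "M", "UNI")) {
    stop('`sex` must be one of "F", "M" or "uni"')
  }
  if (sex == "UNI") {
    # Names that appear under both sexes
    pool <- intersect(df$name[df$sex == "F"], df$name[df$sex == "M"])
  } else {
    pool <- unique(df$name[df$sex == sex])
  }
  if (length(pool) == 0) {
    return(character(0))
  }
  # Cap the sample size at the pool size so sampling cannot fail
  n <- min(limit, length(pool))
  pool[sample.int(length(pool), size = n)]
}

# Toy data for illustration only
toy <- data.frame(
  name = c("Alex", "Alex", "Mary", "John"),
  sex = c("F", "M", "F", "M"),
  stringsAsFactors = FALSE
)
female_names <- pick_names(toy, sex = "f", limit = 10)
unisex_names <- pick_names(toy, sex = "uni")
```

Both branches share the same capping logic, so the duplicated `if (length(...) < limit)` check in the original is no longer needed.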

4. Consider enhancing your test cases, especially the conditional guards.

Aiming for 100% line coverage is impractical for every project, but it is always beneficial to cover edge cases. For example, the line coverage report currently shows 10 uncovered lines, which are related to error handling (e.g., throwing an exception when an argument has an unexpected type or is out of range). It would be beneficial to document this kind of behavior in the form of working test cases.
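
As an illustration of the kind of test meant here, the sketch below exercises the error branches of a stand-in validator with testthat. `check_limit()` and its messages are hypothetical; the package's own checks and wording may differ, so the expectations would need adapting.

```r
library(testthat)

# Stand-in validator, hypothetical: the package's own checks may differ.
check_limit <- function(limit) {
  if (!is.numeric(limit) || length(limit) != 1 || is.na(limit)) {
    stop("`limit` must be a single number")
  }
  if (limit < 1) {
    stop("`limit` must be at least 1")
  }
  invisible(limit)
}

test_that("check_limit() rejects values of the wrong type", {
  expect_error(check_limit("ten"), "single number")
  expect_error(check_limit(c(1, 2)), "single number")
})

test_that("check_limit() rejects out-of-range values", {
  expect_error(check_limit(0), "at least 1")
})

test_that("check_limit() accepts a valid value", {
  expect_equal(check_limit(10), 10)
})
```

Tests like these also serve as documentation: they record which inputs are expected to fail and with what message.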

5. find_name and find_old_name define the sex parameter differently, which can be confusing.

It might be best if the parameter can be unified. For example:

"M" or "m" for Male
"F" or "f" for Female
NULL for Unisex
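
A shared normalizer along these lines could be called at the top of both functions. `normalize_sex()` is a hypothetical name, and returning NULL for the unisex case is simply the convention proposed in the list above.

```r
# Hypothetical shared helper; name and return convention are assumptions.
normalize_sex <- function(sex = NULL) {
  if (is.null(sex)) {
    return(NULL)  # NULL stands for "unisex"
  }
  if (!is.character(sex) || length(sex) != 1 || is.na(sex)) {
    stop("`sex` must be NULL or a single string")
  }
  sex <- toupper(sex)  # "m" -> "M", "f" -> "F"
  if (!sex %in% c("M", "F")) {
    stop('`sex` must be "M", "F" or NULL')
  }
  sex
}

male <- normalize_sex("m")
unisex <- normalize_sex(NULL)
```

Keeping the rule in one function means the two exported functions cannot drift apart again.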

Overall, I enjoy reading the code of your project, and despite hiccups, your documentation is good. Great work!

(Reviewed using rOpenSci review template)

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s): demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions
[x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software have been confirmed.
[ ] Performance: Any performance claims of the software have been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing:

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Your package is very distinctive. I like your theme very much, and I think it will have broad application prospects.
The code is concise, easy to understand, and well structured, and it shows good software engineering practice.
There are a lot of commits, and various details reflect how well your team cooperates.
The package is fully functional, with very distinctive methods and functions that are decoupled and modular.
You guys have made a great package that I think I will use someday!

UBC-MDS / software-review-2023