UBC-MDS / software-review-2023

DSCI 524
0 stars 0 forks source link

Group 03: nameforme - Py #20

Open eyrexh opened 1 year ago

eyrexh commented 1 year ago

Submitting Author: Daniel Cairns (@DanielCairns), Eyre Hong (@eyrexh), Bruce Wu (@BruceUBC), Zilong Yi (@ZilongYi) All current maintainers: (Daniel Cairns (@DanielCairns), Eyre Hong (@eyrexh), Bruce Wu (@BruceUBC), Zilong Yi (@ZilongYi)) Package Name: nameforme One-Line Description of Package: A helper python package that can be used to generate names. This could be used to come up with baby names, character names, pseudonyms, etc. Repository Link: https://github.com/UBC-MDS/nameforme Version submitted: v1.1.0 Editor: Daniel Cairns (@DanielCairns), Eyre Hong (@eyrexh), Bruce Wu (@BruceUBC), Zilong Yi (@ZilongYi)
Reviewer: @ranjitprakash1986, @Althrun-sun, @netsgnut, @kellywujy Archive: TBD
Version accepted: TBD Date accepted (month/day/year): TBD


Description

Scope

Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

Publication options

JOSS Checks - [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

Code of conduct

Please fill out our survey

P.S. *Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

kellywujy commented 1 year ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 1


Review Comments

  1. All functions are working without error. The parameters are well-design. The only place that I need to do some exploartion myself to understand the usage is that find_old_name() function is able to handle sex as a input according to the usage description, but the example given does not demonstrate how to use this filtering parameter. I found the sex= parameter in VScode’s function documentation hints, but it would be more user-friendly to explicitly demonstrate the usage of sex= parameter in the usage description.

  2. For the sex parameter, inputting f or m does not work for find_old_name() since it does not match the case of “F” and “M”. I suggest to make this case-insensitive so that it is consistent with find_name() function.

  3. In find_unisex_name() function, i suggest to add more description of the bar= parameter’s mechanism. It was a bit tricky to understand what that means and how it influences the output without a detailed description. The restricted range of bar value (0 to 1) could have been mentioned too.

  4. Would it be possible to print out more than 10 names at a time? Or simply a lookup function that shows all names matching the user's expected criteria?

  5. In general I found the documentation could be more elaborated to fully demonstrate how versatile this package could be.

Very practical and interesting package! The GitHub repo is well organized and followed best practices. Great work team!

ranjitprakash1986 commented 1 year ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 2 hrs


Review Comments

find_old_name('M', limit = 5)

find_old_name('1990s', 'M', limit = 5)

Then I figured out the following is the correct way to use this function find_old_name('1990s', sex = 'M', limit = 5)

Perhaps this can be made clear in the documentation.

netsgnut commented 1 year ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: ~ 2 hours


Review Comments

(Review based on the latest commit on main as of writing, commit https://github.com/UBC-MDS/nameforme/commit/8cbb2d3b0d816da00cc5779c1ffde49395e7390e)

First, congratulations on the releasing of the package. It is a fun, simple (yet cool!) idea to use this dataset to suggest baby names.

What I like most is the code is clean and neat. I can see that other reviewers have already given excellent comments over the functionality aspect over the package. I would therefore would like to focus solely on the code-side of things instead.

(Some comments are similar to what I have made in the R package, which can be viewed here.)

1. Consider following PEP 8 and other Python styleguides.

I know that this is not strictly required, and the linting routine is not even in our CI/CD workflows, but a consistent code style can be much easier for others to contribute and build on. (We are lucky that in Python we have relatively consistent code style, unlike JavaScript.) For example, by using pycodeguide:

$ pycodestyle nameforme.py
nameforme.py:9:1: E302 expected 2 blank lines, found 1
nameforme.py:11:80: E501 line too long (88 > 79 characters)
nameforme.py:14:17: W291 trailing whitespace
nameforme.py:23:80: E501 line too long (84 > 79 characters)
nameforme.py:32:1: W293 blank line contains whitespace
nameforme.py:40:1: W293 blank line contains whitespace
nameforme.py:44:1: W293 blank line contains whitespace
nameforme.py:48:1: W293 blank line contains whitespace
nameforme.py:51:1: W293 blank line contains whitespace
nameforme.py:55:1: W293 blank line contains whitespace
nameforme.py:56:32: W291 trailing whitespace
nameforme.py:57:80: E501 line too long (115 > 79 characters)

[..snip..]

It might even be better to integrate this into CI/CD to trigger for every PR.

2. Consider DRYing your code, instead of Writing Every Time.

An example would be the snippet of loading the CSV dataset:

url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-03-22/babynames.csv"
raw_df = pd.read_csv(url)

This piece of code appears for 4 times in the nameforme.py function. You may consider refactoring by, e.g., wrapping it as an utility function:

def load_raw_data():
    # You may also do data wrangling here, e.g., dropping unnecessary columns
    url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-03-22/babynames.csv"
    return pd.read_csv(url)

While the line count does not seem to decrease much, the real benefit is that when we want to, say, change the upstream dataset which may have a different structure, it would be easier.

3. Consider enhancing your test cases, especially the conditional guards.

Looking for 100% line coverage for all projects is impractical, but it is always beneficial to cover edge cases. For example, currently the line coverage report shows that we are missing 7 lines, which are related to error handling cases (e.g., throwing exception when the type is not expected, out of range). It would be beneficial that this kind of behaviors to be documented in form of a working test case.

4. find_old_name should have better guard.

I realized this when I tried out the sex parameter in it, but realized that this does not work with lower cases (e.g., 'f' or 'm'), unlike find_name. The offending piece of code is:

    df = data[data["tp"]==tp]
    if sex == "uni":
        F = df[df["sex"] == 'F']["name"]
        M = df[df["sex"] == "M"]["name"]
        uni_df = list(set(F).intersection(set(M)))
        if len(uni_df) < 10:
            return uni_df
        else:
            return np.random.choice(uni_df, limit, replace=False).tolist()
    else: 
        df = df[df["sex"] == sex]["name"]
        return np.random.choice(df,limit,replace=False).tolist()

It could be simplified, rewritten, with an added guard as:

    df = data[data["tp"] == tp]
    if sex.upper() == "M" or sex.upper() == "F":
        df = df[df["sex"] == sex.upper()]
    elif sex != "uni":
        warnings.warn("Invalid sex specified, assuming unisex")
    return np.random.choice(df["name"],limit,replace=False).tolist()

This can allow the m and f be used, while warning the user if values other than "f", "m", "F", "M", "uni" be used.

5. find_name and find_old_name have different sex definition, which can be confusing.

It might be best if the parameter can be unified. For example:

Alternatively, it might even be better to use enums instead, so that the user can pass in a Gender.MALE, Gender.FEMALE, Gender.UNISEX as a parameter.

6. find_unisex_name has a conflicting docstring definition.

The said function is defined as def find_unisex_name(bar,limit=10), but provides the parameters as:

    limit : float
        A float controls the minimum proportion of the neutral names in a single year in the dataframe.
    count : integer
        The length of the output with a default value of 10

It would be better to use consistent variable naming, with documentation review in place.

7. Consider using ValueError, IndexError instead of generic Exception.

Python libraries provide a lot of specific type of exceptions for different occasions. For instance, this snippet from find_name would be better fitted to use ValueError:

    if not (sex == 'F' or sex == 'M' or sex == 'f' or sex == 'm'):
        raise Exception("sex should be either 'F'/'f' or 'M'/'m'.")

Overall, I enjoy reading the code of your project, and despite hiccups, your documentation is good. Great work!

(Reviewed using pyOpenSci review template

Althrun-sun commented 1 year ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing:


Review Comments

  1. Your package is very distinctive, I like your theme very much, I think it will have broad application prospects.
  2. The code is concise and easy to understand, very structured, with the characteristics of software engineering.
  3. There are a lot of commits each time, and various details reflect the level of cooperation of your team.
  4. The package is fully functional, with very distinctive methods and functions, decoupled and modularized.
  5. You guys have made a great package that I think I will use someday!