Open johnwslee opened 2 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 1.5 hours
get_data()
function like you have done for your other functions. This would make it easier for someone with less Python experience to understand the code in get_data()
show_summary_stat()
function is definitely useful and easy to use. Maybe something in the future that you could consider is giving the user the option to save the data frames in a suitable format as a png file. This is a personal preference and something that would just elevate your great functions to the next level.plot_hist_by_cond()
and plot_line_by_date
were simple, yet effective. Again, this is not necessary for your package but it would elevate these functions if you were able to provide the user with the option to save these figures as png files somewhere in the directory. Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 1.5
plot_line_by_date
function is very well done and easy to use. A future improvement could be to allow an argument that plots the total cases in BC over time, instead of plotting separate lines for each region. The function would be even more robust if the user could specify certain conditions to be included in the graph, for example plotting only the number of cases of women over time, or the number of lab diagnosed cases over time.show_summary_stat
would be even more useful if it accepted an input argument that allowed the user to calculate the summary statistics of a specific subset of the data, such as only for a certain region or age group. Alternatively, the get_data
function could be altered instead such that it only retrieves a specified subset of data.plot_hist_by_cond
but on the y-axis in plot_line_by_date
. The standalone graphs are perfectly understandable, but it would be ideal to have consistency between the two graphs.Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 1.5h
I like all the functions in this package and our group is actually written something similar. And this is something suggested by TA when our group is facing the same issue (as stated in the review of the R version of this package): when one function is working on downloading data, it is not suggested to use this function inside the other function to download data again. Instead, downloading data and saving it as a data frame, and then passing that data frame to do plotting is recommended. I think this might be because we don't want to download data multiple times, which takes space and time.
As stated in the review of the R version of this package, the plot_line_by_date function is a bit hard to understand with many lines, I am wondering if it is possible to use direct labelling. In that case, it is clearer for a user to understand which line represents which region.
As the previous reviewer suggests, the import of the Altair package seems confusing, as well as all the following:
alt.renderers.enable("html")
from altair import pipe, limit_rows, to_values
t = lambda data: pipe(data, limit_rows(max_rows=1000000), to_values)
alt.data_transformers.register('custom', t)
alt.data_transformers.enable('custom')
I tried to import this package only, no Altair and no all above, and all the functions seem to work fine for me (not a full test though). I understand some of them are for Altair to show larger data, but I am not sure about all others. So is it possible to add some documentation about why some lines are needed before using this package, and when would we need it?
Similar to the review of the R version of this package, when I test plot_line_by_date(1, 1), I was expecting to get an error message of Invalid argument type: startDate must be a string.
However, what I get is this: strptime() argument 1 must be str, not int
I think this is because this block:
try:
datetime.datetime.strptime(startDate, '%Y-%m-%d')
except ValueError:
raise ValueError("Incorrect date format, should be YYYY-MM-DD")
is before this:
if not(isinstance(startDate, str)):
raise TypeError('Invalid argument type: startDate must be a string.')
It tries to convert the date before checking if it is a character or not. So maybe the order should be reverted.
By the way, strptime() argument 1 must be str, not int
is actually a type error, so you may want to modify the code to:
try:
datetime.datetime.strptime(startDate, '%Y-%m-%d')
except TypeError:
raise ValueError("Incorrect date format, should be YYYY-MM-DD")
So again, for all the error testing, although all of the with pytest.raises(TypeError)
lines pass, it might not be the expected error message. I would recommend using this format to check the error messages, so that the previous problem could be noticed:
with raises(TypeError) as e:
plot_line_by_date(1, 1)
assert (
'Invalid argument type: startDate must be a string.' == str(e.value)
)
In the readme Usage section, show_summary_stat("2022-01-01", "2022-01-13")
shows the results vertically, but when I tried that it is actually a horizontal data frame. I would suggest to modify it to the actual code, e.g., show_summary_stat("2022-01-01", "2022-01-13").T
. By the way, the picture does not match either, it has a date of 2021-01-17 which is not in the specified range.
One tiny thing, I am a bit confused by the 'Out of Canada' data, it doesn't seem to be the worldwide data, so what are the criteria for that? I couldn't find any documentation for that. It could be great if there are some notes for that.
Additionally, I think the copyright holder in the license should be individual names, not MDS 2021 DSCI 524 Group 25.
Again, despite some trivial and nitpicky things and some updates needed for testing, I think this package is well done! The functions are pretty clear and useful.
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best practices. In general please consider:
Estimated hours spent reviewing: 2h
bccovideda
is a potentially helpful package on trending topics, which may come in handy when we want to do some basic data analysis on Covid data. Most of the issues and comments have been addressed by fellow reviewers, as I would try to not repeat what others have mentioned, my apologies if there is any overlapping. Here are some of my comments regarding this package:
get_data
tests, there are only assertion tests but I don't think I have seen tests on the parameters where it expects error with wrong parameters. There can be other possible reasons too, but just something I might look into if I were you to make sure the test coverage is good.get_data
function, I like this version more than the R one, to be honest, but it could be due to the differences in languages and platforms. I would probably do a similar implementation except for the flexibility and informative output that I mentioned in the R comments that may be considered as a useful upgrade.plot_hist_by_cond
function, I have the same concern that I had when I checked the R version of this function, not sure what is the purpose of separating mask
and temp
is two variables but okay. The part I would really change if I were you is figuring out how to grab the labels from the dataset instead of defining them as local variables in the function.plot_line_by_date
function, except for the redundant comments as in the R version of my comments, I still don't know the mystery of mask
and temp
, besides that everything else looks great. Honestly, the plots look much better than the R version (again, probably more of an Altair vs GGplot kind of thing) and this Python version seems more visually appealing to me.show_summary_stat
function, similar issues on the DRY principle as I mentioned in the R package comments, besides that, everything looks good.Overall, again, it is an interesting and helpful package. Thanks for the great work, group 25!
Submitting Author: Lianna Hovhannisyan @liannah, John Lee @johnwslee, Vadim Taskaev @vtaskaev1, Vanessa Yuen @imtvwy Package Name: bccovideda One-Line Description of Package: simple and scalable package for EDA of Covid19 cases in BC Repository Link: https://github.com/UBC-MDS/bccovideda Version submitted: v1.0.10 Editor: Lianna Hovhannisyan @liannah, John Lee @johnwslee, Vadim Taskaev @vtaskaev1, Vanessa Yuen @imtvwy
Reviewer 1: Simon Guo @y248guo Reviewer 2: Kiran Phaterpekar @kphaterp Reviewer 3: Rong Li @lirnish Reviewer 4: Jessie Wong @jessie14
Description
Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.
@tag
the editor you contacted:Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. *Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here