Open hwilliams10 opened 4 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
The README is very readable - I liked seeing the author information in a table!
Under Usage, under the stat_summary
function and Arguments subheading, the description of samples
is listed as "the drawed samples". For grammatical correctness, may I recommend changing it to "the drawn samples"
The link to readthedocs in the README under Documentation wasn't working, but I was able to get to the page via the badge
I may have missed it, but I only saw credits for use of the cookiecutter template, and didn't see anything for Python and your dependencies - please let me know if I missed it, otherwise I would recommend this site for generating python citations: http://citebay.com/
In examples of test files, under examples, I would recommend not putting the example as a variable for ease of use, ie. change from samples = draw_samples(pop, 3, [5, 10, 15, 20])
to draw_samples(pop, 3, [5, 10, 15, 20])
- not sure if there is a reason for you putting these into a variable, so if there is, please disregard! This is applicable to draw_samples and generate_virtual_pop
Under Usage, I would recommend adding import numpy as np
as it is required for an input in generate_virtual_pop
Under Usage, there is inconsistency with function names - some places it is plot_sample_dist
and plot_sampling_dist
, and elsewhere it is plot_sample_hist
and plot_sampling_hist
, my original imports failed because of this when I was following the Example Usage Scenario
I quite liked the feature of the plotting functions that the graphs were automatically side by side - made it easy to read and compare
For the purposes of consistency, I would consider either making the inputs for generate_virtual_pop
and stat_summary
that are numpy objects both either numpy objects or strings (if you choose to make both strings, my comment about adding import numpy as np
to Usage is irrelevant)
Documentation was all very readable! Kudos to you guys on great work!
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 2 hours
It is super efficient to deploy such a great package do statistical inferencing compared to the traditional cumbersome way. Great job, guys! The development follows a standard python package creation process with detailed documentation, fully covered tests, and is successfully built and released. The package has a range of useful functions. Meanwhile, the naming is very intuitive. Overall the user experience is good. Below are detailed comments and some suggestions from different perspectives:
The installation process is very smooth and very fast. The installation instruction is very clear!
generate_virtual_pop(size, population_name, distribution_func, *para)
I highly recommend the developer integrate the import of numpy within the function so that users don’t need to do it again.
Also, the way of importing function from the package should be improved so that the example from README file should work.
The distribution_func
argument is not very user-friendly. Users have to type np.random
before the real distribution, and when we want to find the right name of the distribution we have to refer documentations of np.random
. Therefore, a better way to integrated numpy function with the function is expected.
draw_samples(pop, reps, n_s)
n_s
argument is list according to the documentation. A more general intake type is expected, e.g. int.plot_sampling_hist(samples)
stat_summary(pop, samples, parameter)
describe()
function. For users, it is cumbersome to specify the parameter using a list and it required a special format e.g. ['np.mean', 'np.std'].The function documentations are well-designed and succinct. However, one thing I want to bring to authors' attention is that documentation for function [stat_summary](https://samplingsimulatorpy.readthedocs.io/en/latest/source/samplingsimulatorpy.html#module-samplingsimulatorpy.stat_summary)
is wrongly formatted.
It is impressive to have 100% coverage. Well done! I only have some minor suggestion in terms of the brief explanation of each test.
test_draw_samples()
: The short introduction is missing.test_stat_summary
: It is a good practice to have a short introduction to the test function like the other test functions in your package. However, users might be a little bit confused about what the TypeError
error is, which argument the test is referring to. Therefore, a more explicit expression is expected. In CONTRIBUTORS file, it is nice to list contributors’ Github handles. It would be perfect, if the handle is linked.
I am excited to see the further development of the package! Once again, good job, guys!
Thank you @slbsolomon and @chuusan for your reviews and feedback! Please see our release v.2.0.0 for the updates we have made based on your recommendations.
The following changes have been made based on your feedback:
import numpy as np
to Example Usage Scenario generate_virtual_pop
and stat_summary
to all be numpy objects (by updating any strings to numpy objects) stat_summary
documentation @slbsolomon - all four of us have tested the Read the Docs link in the README and were able to access the documentation. I'm glad you were able to access through the badge - not sure why the link wasn't working for you but hopefully it is now. As for assigning a variable name to our example use of generate_virtual_pop
and draw_samples
, we had done this since our plotting functions rely on these outputs as inputs.
@chuusan - the choice of making the expected type of the n_s argument (which we have since updated to be named sample_size for clarity) a list was intentional as the function expects multiple integer values as input.
Thank you again for all your work on reviewing our package. We really appreciate your time and effort!
Submitting Author: Name (@hwilliams10, @lisebraaten, @tguo9, @YueJiangMDSV)
Package Name: samplingsimulatorpy One-Line Description of Package: Create, sample from, and view histograms from virtual populations. Repository Link: https://github.com/UBC-MDS/samplingsimulatorpy Version submitted: 1.1.0 Editor: @kvarada Reviewer 1: @chuusan Reviewer 2: @slbsolomon Archive: TBD
Version accepted: TBD
Description
samplingsimulatorpy
is a Python package that allows users to generate virtual populations which can be sampled from in order to compare and contrast sample vs sampling distributions for different sample sizes. The package also allows users to sample from the generated virtual population (or any other population), plot the distributions, and view summaries for the parameters of interest.Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.
This package is intended to assist in teaching and/or learning basic statistical inference by allowing users to generate virtual populations to compare and contrast sampling vs sample distributions and parameters.
The target audience is instructors and/or students teaching or learning basic statistical inference.
To the best of our knowledge, there is currently no existing Python package with the specific functionality to create virtual populations and make the specific sample and sampling distributions described above. We do make use of many existing Python packages and expand on them to make very specific functions. These include:
scipy.stats
to get distribution functionsnp.random
to generate random samplesAltair to create plots
Python
pandas
already includes some summary statistics functions such as.describe()
, however our package will be more customizable. Our summary will only include the statistical parameters of interest and will provide a comparison between the sample, sampling, and true population parameters.@tag
the editor you contacted:N/A
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements). Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements): "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here