Open scarlqq opened 2 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 1hr.
1.2.9
, but I am not entirely sure if this is the version you want me to use. With this said, perhaps (4) you can also improve on the Usage part, and allow users to first download the virtual environment, then test the functions that you have written. Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 1 hour
Congratulations Group 29 on publishing this amazing Python package! I saw a lot efforts demonstrated in your GitHub repository. Something I would like to point out to help you improve py_outliers_utils
even more:
In your docs/example.ipynb
, the output visualization for your last function visualize_outliers
is not rendered properly on GitHub, although it is nicely rendered in readthedocs.io
.
I found the direct link to your documentation on readthedocs.io
in Usage
section in your README.md
. However, it would better to include the badge of your documentation on readthedocs.io
in README.md
, as it is more visible in this way.
The docstrings for outlier_identifier
and trim_outliers
are not parsed properly on readthedocs.io
(i.e. the parameter space is not aligning). You probably missed some blank lines in between sections etc. It works nicely on visualize_outliers
though, so you can take a look at its docstring and try to fix these!
The code coverage is quite good. Well done! Would you consider to also include the codecov
badge in README.md
?
The source code of outlier_identifier
is pretty long and includes many branches. You definitely work hard on this function, and I love this function for its neat output and enabled customization. However, it might be a good idea to include more in-line comments explaining the job of some code and the overall flow, because right now it is not easy to read and understand.
Overall, I think it is a neat and useful package. Hopefully these comments can help you to make it better!
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 1.5 hours
First of all congratulations on creating a wonderful and useful package. The team has done a great job and I really found the documentation, docstrings and examples to be very good. They easily guided me through the process of working with your package. I have added my recommendations below as minor changes I feel could make this already very good package slightly better:
The outliers.py function seems to be doing a lot of interesting work. The documentation and comments start of very well but when we reach the various conditions(line 68 onwards), the comments suddenly drop. Maybe including more comments there could be helpful.
The Altair plot seems to be missing a title, maybe adding one or giving the user the option to add a title string as a parameter could be good.
In the trim_outliers.py function, adding a method that would allow the user to input any single value they want to replace all the outliers with, could be useful.
Adding a few more plots, especially scatter plots of the variables plotted against each other with the identified outliers in a different color, could be of use.
Maybe adding the option to store the new outlier free database as a .csv file would be helpful to the user.
Overall a job well done, with excellent documentation. Thank you.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).
The package includes all the following forms of documentation:
[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s) demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all user-facing functions
[x] Examples for all user-facing functions
[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING.
[x] Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a setup.py
file or elsewhere.
Readme requirements The package meets the readme requirements below:
[x] Package has a README.md file in the root directory.
The README should include, from top to bottom:
[x] The package name
[ ] Badges for continuous integration and test coverage, a repostatus.org badge, and any other badges. If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the badge for pyOpenSci peer-review will be provided upon acceptance.)
[x] Short description of goals of package, with descriptive links to all vignettes (rendered, i.e. readable, cf the documentation website section) unless the package is small and there’s only one vignette repeating the README.
[x] Installation instructions
[x] Any additional setup required (authentication tokens, etc)
[x] Brief demonstration usage
[x] Direction to more detailed documentation (e.g. your documentation files or website).
[x] If applicable, how the package compares to other similar packages and/or how it relates to other packages
[x] Citation information
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
[x] The documentation is easy to find and understand
[x] The need for the package is clear
[x] All functions have documentation and associated examples for use
[ ] The package has an obvious research application according to JOSS's definition in their submission requirements.
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
[ ] A short summary describing the high-level functionality of the software
[ ] Authors: A list of authors with their affiliations
[ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
[ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
[ ] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
Estimated hours spent reviewing: 1.5 hours
Congrats everyone on creating a very useful package! It is obvious you have put a lot of work onto it with great results. The documentation was also easy to follow and made it simply to install and use your package. I have noted a few things below which may be improved:
The alt air plot in the example ipynb does not seem to render. Its usage is certainly clear but it would be nice to see the example plot! It does render in readthedocs however, so this may not be a big issue.
Your ReadMe does not contain a code coverage badge, which would be nice to display since its obvious you put in a lot of work to write your tests and your coverage is quite good!
You docstrings are great, however the functions could probably use a few more comments to increase readability. Not a big issue but would be nicer to read to fully understand whats going on. For example, the outlier script contains many branches with if conditions, making it difficult to follow the paths.
It would be useful to have more customization tools for the plots, like allowing the user to provide axis labels, titles, colors etc. particularity for cases when users would want to share their visualizations to others. This would require significantly more work to build and test, so its understandable to keep things simple, but it's certainly an opportunity for improvement
Some analysis may find it useful to return the outliers to the user to keep track of observations being removed instead of discarding them.
When columns=None
in all functions, the functions by default are applied to every column, which is slightly counter intuitive since some would expect them to be applied to no columns. This is a simple semantic issue which you may disagree with, but it may be more clear to change the default from None to All, and take the None parameter to instead return no transformations.
In general, this is great work and I enjoyed using your package!
Submitting Author: Karanpreet Kaur (@karanpreetkaur) Linhan Cai (@lipcai) Qingqing Song (@scarlqq)
Package Name: py_outliers_utils One-Line Description of Package: Simple utility for dealing outliers. Repository Link: https://github.com/UBC-MDS/py_outliers_utils Version submitted: v1.2.9
Editor: Karanpreet Kaur (@karanpreetkaur) Linhan Cai (@lipcai) Qingqing Song (@scarlqq)
Reviewers: Wanying Ye (@GloriaWYY ) Amir Shojakhani (@Amirshoja ) Adam MORPHY (@adammorphy) Jacqueline Chong (@Jacq4nn)
Archive: TBD
Version accepted: TBD
Description
Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.
Explain how the and why the package falls under these categories (briefly, 1-2 sentences):
Who is the target audience and what are scientific applications of this package?
Are there other Python packages that accomplish the same thing? If so, how does yours differ?
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or
@tag
the editor you contacted:Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. *Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here