MrThomasPin opened this issue 4 years ago.
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
This option will allow reviewers to open smaller issues that can then be linked to PRs, rather than submitting a denser, text-based review. It will also allow you to demonstrate that you have addressed each issue via PR links.
P.S. Have feedback/comments about our review process? Leave a comment here
Reviewer: Elliott
The package includes all the following forms of documentation:
`setup.py` file or elsewhere.

Readme requirements
The package meets the readme requirements below:
The README should include, from top to bottom:
[ ] Installation: Installation succeeds as documented.
I received an error when trying to install the package. I think this is an issue with Poetry: it does not install the dependencies automatically.
Collecting pydatapeek
Downloading https://test-files.pythonhosted.org/packages/1e/27/5a49ffb2261be9541e88d0ae9e076862e2a8029d779a78812a5f210f850f/pydatapeek-0.1.9-py3-none-any.whl
ERROR: Could not find a version that satisfies the requirement altair_saver<0.2.0,>=0.1.0 (from pydatapeek) (from versions: none)
ERROR: No matching distribution found for altair_saver<0.2.0,>=0.1.0 (from pydatapeek)
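The last two lines of the log are worth reading closely. `(from pydatapeek)` indicates that pip did find `altair_saver` among the package's declared requirements, and `(from versions: none)` indicates that the index being searched (the wheel was downloaded from TestPyPI's file host, `test-files.pythonhosted.org`) offered no releases of `altair_saver`, so no candidate could satisfy `>=0.1.0,<0.2.0`. The minimal sketch below illustrates that filtering step for plain numeric versions only; it is not pip's resolver, and `satisfies` and `matching_versions` are hypothetical helpers.

```python
# Minimal sketch of the version-filtering step behind pip's
# "Could not find a version that satisfies the requirement" error.
# NOT pip's resolver: only plain numeric versions such as "0.1.0" are handled.
import operator

# Two-character operators come first so ">=" is not mistaken for ">".
OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text):
    """Turn '0.1.0' into a comparable tuple (0, 1, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(a, b):
    """Pad two version tuples with zeros so '0.1' compares equal to '0.1.0'."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause in `specifier`."""
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in OPERATORS:
            if clause.startswith(symbol):
                left, right = _pad(parse_version(version),
                                   parse_version(clause[len(symbol):]))
                if not compare(left, right):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def matching_versions(available, specifier):
    """Keep only the versions offered by an index that satisfy `specifier`."""
    return [v for v in available if satisfies(v, specifier)]


SPEC = ">=0.1.0,<0.2.0"  # the range pydatapeek declares for altair_saver
# An index with no releases of the dependency: nothing can match, which is
# what pip reports as "(from versions: none)".
print(matching_versions([], SPEC))  # prints []
# A hypothetical index that does offer some releases.
print(matching_versions(["0.1.0", "0.2.0", "0.5.0"], SPEC))  # prints ['0.1.0']
```

If missing releases on TestPyPI are indeed the cause, a common workaround is to let pip fall back to the main index for dependencies, for example `pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ pydatapeek`, assuming the dependency is published on PyPI.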
Estimated hours spent reviewing: 4
---

#### Review Comments
Altogether, great job on the project. I think there are many useful features in the package, and they are well implemented! I found the code and structure well written and well documented. I found very few points to improve, but if time allows, three things are worth addressing:
Unused file: I think there is an unused file named `pbc.py` in the test directory.
I could not understand the heatmap documentation. Specifically, from reading the docs I could not tell how the function is meant to be used or how its output should be interpreted. Looking more closely at the visualization in the README, there are some labels along the edge of the image, but they are very hard to read. I would suggest a more involved usage example, with a written description of how to interpret the output.
Lastly, and this seems to affect all of the projects, including my own: having to install all the dependencies manually before running `pip install` is a problem. Packages installed with pip should generally pull in their dependencies automatically.
Thank you for your time.
Sincerely,
Elliott Ribner
Submitting Author: Thomas Pin (@MrThomasPin)
Package Name: PyDataPeek
One-Line Description of Package: Simple EDA for .csv or .xlsx documents
Repository Link: Repo Link
Version submitted:
Editor: @kvarada
Reviewer 1: Elliott Ribner (@elliott-ribner)
Reviewer 2: Aman Kumar Garg (@amank90)
Archive: TBD
Version accepted: TBD
Description
PyDataPeek is a package that enables data scientists to efficiently generate a visual summary of a dataset. It includes functions that show the size of the dataset, a visual summary of missing data, a sample of the dataset showing the data types, and exploratory visualizations for quantitative and qualitative data.
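To make the description above concrete, here is a minimal, standard-library-only sketch of the kind of summary it lists: dataset size, missing values per column, and a summary that depends on whether a column looks quantitative or qualitative. The helper `peek` and its output keys are hypothetical illustrations, not PyDataPeek's actual API, and the type inference (a column is numeric if every non-empty value parses as a float) is a simplifying assumption.

```python
# Minimal, standard-library-only sketch of a quick dataset summary:
# size, missing values per column, and a type-dependent summary.
# `peek` is a hypothetical helper, not PyDataPeek's API.
import csv
import io
import statistics


def peek(csv_text):
    """Summarize CSV text: row/column counts, missing values, per-column stats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []  # reading fieldnames consumes the header row
    rows = list(reader)
    summary = {"n_rows": len(rows), "n_cols": len(columns), "columns": {}}
    for col in columns:
        values = [row[col] for row in rows]
        # Empty strings (and None for short rows) count as missing.
        present = [v for v in values if v not in ("", None)]
        info = {"missing": len(values) - len(present)}
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = None
        if numbers:
            # Quantitative column: every non-missing value parsed as a number.
            info.update(kind="numeric", min=min(numbers), max=max(numbers),
                        mean=statistics.mean(numbers))
        else:
            # Qualitative/text column (also used when a column is entirely missing).
            lengths = [len(v) for v in present]
            info.update(kind="text", n_unique=len(set(present)),
                        mean_length=statistics.mean(lengths) if lengths else 0.0)
        summary["columns"][col] = info
    return summary


sample = "id,score,comment\n1,3.5,good fit\n2,,needs more data\n3,4.5,\n"
print(peek(sample))
```

In pandas, the size, missing-value, and type parts roughly correspond to `df.shape`, `df.isna().sum()`, and `df.dtypes`; the sketch avoids pandas only to stay self-contained.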
Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see this section of our guidebook.
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
Who is the target audience and what are scientific applications of this package?
Are there other Python packages that accomplish the same thing? If so, how does yours differ?
Several Python packages are available that support exploratory data analysis, but none is specific to the use case targeted here: a simple, user-friendly way of summarizing data.
`pandas`: functionality to manipulate dataframes. Our package's functionality overlaps with some of its functions, such as `DataFrame.describe`, which computes summary statistics for dataframes. PyDataPeek differs in that it aims to offer summary statistics that depend on the data type, including long-form text data.
@tag the editor you contacted: