Open rbouwer opened 8 months ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
pyproject.toml
file or elsewhere.Readme file requirements The package meets the readme requirements below:
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing:
Overall the package is super useful and very well documented. It is easy to follow along and really enjoyed going through the detailed vignette. The package has nice visualisations and the readme is very nice and well-written. There are some points below for your consideration:
===================================================================================== short test summary info =====================================================================================
FAILED tests/test_plot_binary.py::test_figure - _tkinter.TclError: Can't find a usable init.tcl in the following directories:
FAILED tests/test_plot_time_series.py::test_output_path_generation - _tkinter.TclError: Can't find a usable tk.tcl in the following directories:
================================================================================== 2 failed, 59 passed in 31.84s ==================================================================================
2. The pyxplor.py can be deleted since it does not contain anything
3. Break the Main Vignette, into smaller ones, could have a quick look, basic usage section. A longer in depth vignette going into the lengthy explorations that are possible with the package (separated) under a separate tab. Albeit the current breakdown is definitely helpful.
4. For the categorical bar plots, they could be ordered by largest to smallest to make it easier to visualise the categories. They are ordered in the facetted plots, only the first plot in the docs for categorical is not.
5. The badges are not all all there, could consider all adding the 2 missing ones ` Continuous integration and test coverage,` and `Python versions supported`
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
pyproject.toml
file or elsewhere.Readme file requirements The package meets the readme requirements below:
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
The package is well organized and very convenient. I really enjoy using it to automate and streamline the exploratory data analysis process. The documentation has explained the use of this package well and provides excellent examples for each core function. Some aspects to consider for improvement are:
Tests are passed in my end, good job!
$ pytest tests/
========================================= test session starts ==========================================
platform darwin -- Python 3.11.7, pytest-7.4.4, pluggy-1.3.0
rootdir: /Users/wenweiwu/Desktop/UBC_MDS/block4/DSCI_524/PyXplor
plugins: cov-4.1.0, anyio-4.2.0
collected 61 items
tests/test_plot_binary.py .................. [ 29%]
tests/test_plot_categorical.py ............. [ 50%]
tests/test_plot_numeric.py .............. [ 73%]
tests/test_plot_time_series.py ................ [100%]
========================================= 61 passed in 13.66s ==========================================
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
pyproject.toml
file or elsewhere.Readme file requirements The package meets the readme requirements below:
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 1h
Overall, I think this package is really helpful in reducing the workload in EDA. The example usage is very comprehensive and clear, enjoy reading through, good job!
Below are the feedback:
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
pyproject.toml
file or elsewhere.Readme file requirements The package meets the readme requirements below:
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 1.5h
(pyxplor) Marcony1@MSI ~/OneDrive - Fundacion Universidad de las Americas Puebla/Documents/MDS/Block 4/DSCI 524/PyXplor (main)
$ ls
CHANGELOG.md CONDUCT.md CONTRIBUTING.md docs/ img/ LICENSE poetry.lock pyproject.toml README.md src/ tests/
(pyxplor) Marcony1@MSI ~/OneDrive - Fundacion Universidad de las Americas Puebla/Documents/MDS/Block 4/DSCI 524/PyXplor (main)
$ pytest tests/
bash: pytest: command not found
I had to pip install pytest and then I got this:
pip install pytest
=============================================== short test summary info ===============================================
ERROR tests/test_plot_binary.py
ERROR tests/test_plot_categorical.py
ERROR tests/test_plot_numeric.py
ERROR tests/test_plot_time_series.py
ERROR tests/test_pyxplor.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 5 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================== 5 errors in 0.41s ==================================================
(pyxplor) Marcony1@MSI ~/OneDrive - Fundacion Universidad de las Americas Puebla/Documents/MDS/Block 4/DSCI 524/PyXplor (main)
$ pytest tests/ --cov
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --cov
inifile: None
rootdir: C:\Users\Marcony1\OneDrive - Fundacion Universidad de las Americas Puebla\Documents\MDS\Block 4\DSCI 524\PyXplor
pip install pytest-cov
---------- coverage: platform win32, python 3.11.7-final-0 -----------
Name Stmts Miss Cover
----------------------------------------------------
tests\test_plot_binary.py 94 92 2%
tests\test_plot_categorical.py 70 69 1%
tests\test_plot_numeric.py 80 79 1%
tests\test_plot_time_series.py 66 64 3%
tests\test_pyxplor.py 1 0 100%
----------------------------------------------------
TOTAL 311 304 2%
=============================================== short test summary info ===============================================
ERROR tests/test_plot_binary.py
ERROR tests/test_plot_categorical.py
ERROR tests/test_plot_numeric.py
ERROR tests/test_plot_time_series.py
ERROR tests/test_pyxplor.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 5 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================== 5 errors in 0.37s ==================================================
That way, I could run the tests, but I got some errors.
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-1-xxxxxxxxxxxx> in <module>
----> 1 import pyxplor
2 from pyxplor.plot_binary import plot_binary
3 from pyxplor.plot_categorical import plot_categorical
ModuleNotFoundError: No module named 'pyxplor'
So, apparently, pyxplor was still not available in my computer after following the instructions. I ended up using pip install pyxplor. After that, I could run the code without any issues. Also, I re-ran the tests and now all of them passed:
================================================== warnings summary ===================================================
tests\test_plot_binary.py:2
C:\Users\Marcony1\OneDrive - Fundacion Universidad de las Americas Puebla\Documents\MDS\Block 4\DSCI 524\PyXplor\tests\test_plot_binary.py:2: DeprecationWarning:
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
import pandas as pd
tests/test_plot_time_series.py::test_valid_input
C:\Users\Marcony1\miniconda3\envs\pyxplor\Lib\site-packages\pyxplor\plot_time_series.py:104: FutureWarning: 'M' is deprecated and will be removed in a future version, please use 'ME' instead.
ts_data = input_df.set_index(date_column).resample(freq)[column].mean()
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===========================================
61 passed, 2 warnings in 21.40s ===========================================
In general, I really liked your package and I do find it pretty handy. As a more advanced user, I could definitely make things work. However, I think that if you'ld want rookie programmers to use your package, it would be helpful if you made the instructions more baby-level as well as making sure that it works in different OS's so that they won't have any problem trying to use it.
Good job overall!
Submitting Author: Name (@rbouwer) All current maintainers: (@arturoboquin, @iris0614, @phchen5) Package Name: PyXplor One-Line Description of Package: A package for simplifying the EDA of different data types! Repository Link: https://github.com/UBC-MDS/PyXplor Version submitted: 2.0.0 Editor: @ttimbers
Reviewer 1: @Marcony1 - Marco Polo Bravo Montiel Reviewer 2: @salva-u Salva Umar
Reviewer 3: @joeywwwu - Joey Wu Reviewer 4: @YimengXia - Yimeng Xia
Archive: TBD JOSS DOI: TBD Version accepted: TBD Date accepted (month/day/year): TBD
Code of Conduct & Commitment to Maintain Package
Description
pyxplor
is a comprehensive Python package designed to automate and streamline the Exploratory Data Analysis (EDA) process. Tailored for various data types including numeric, categorical, binary, and time series data,pyxplor
aims to enhance data interpretation through a suite of specialized plotting functions. This package seeks to reduce the complexity and time invested in initial data analysis, making it an essential tool for data scientists and analysts at all levels.Scope
Please indicate which category or categories. Check out our package scope page to learn more about our scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
Domain Specific & Community Partnerships
Community Partnerships
If your package is associated with an existing community please check below:
For all submissions, explain how the and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
The target audience for this package is all data scientists starting their data analysis with EDA. This package enhances data visualization for better interpretation and reduces the complexity and time invested in initial data analysis.
While there are several EDA packages in the Python ecosystem, such as
pandas-profiling
(link) andsweetviz
(link),pyxplor
differentiates itself by offering specialized functions for different data types. This targeted approach enables more nuanced and relevant insights, particularly for binary and time-series data which are often less catered for in existing tools.pyxplor
complements these existing tools by filling these specific gaps, thus enriching the Python EDA toolkit.@tag
the editor you contacted:Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication Options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.*Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Confirm each of the following by checking the box.
Please fill out our survey
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
The editor template can be found here.
The review template can be found here.