Break the main vignette into smaller ones: a quick-look, basic-usage section, plus a longer, in-depth vignette under a separate tab that covers the lengthy explorations the package makes possible. That said, the current breakdown is definitely helpful. (https://github.com/UBC-MDS/software-review-2024/issues/9#issuecomment-1915907137 - 3)
More context about the dataset would help, as would a more detailed introduction explaining the importance and applications of EDA in data science, so that users understand the relevance of the examples. (https://github.com/UBC-MDS/software-review-2024/issues/9#issuecomment-1916139577 - 2)
If possible, include interactive elements or widgets in the tutorial for a hands-on experience. These interactive elements can make the learning process more engaging and effective, and help users better understand the capabilities and use of the package. (https://github.com/UBC-MDS/software-review-2024/issues/9#issuecomment-1916139577 - 3)
Function Improvements
plot_categorical @rbouwer (Nice to have - OPTIONAL)
It might be beneficial to reconsider the use of colour in the Distribution of Categorical Variables. The current approach assigns colours to bars based on their count ranking, which could potentially confuse users. For example, in your example.ipynb, pickup_borough is displayed in green for Bronx, while dropoff_borough is in red, solely due to count variations for the same variable. Given that each bar already has a clear label, the additional colour coding might not be necessary and could be removed. (https://github.com/UBC-MDS/software-review-2024/issues/9#issuecomment-1916277295 - 3)
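The confusion described above can be sketched in a few lines. This is a hypothetical stand-in, not the package's actual colouring code: the palette, the `colours_by_rank` helper, and the counts are all made up for illustration. It only shows how rank-based colouring gives the same category different colours in two plots, while a single fixed colour does not.

```python
# Hypothetical palette and counts, for illustration only.
PALETTE = ["green", "yellow", "orange", "red"]


def colours_by_rank(counts):
    """Assign a colour to each category based on its count ranking."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {cat: PALETTE[min(i, len(PALETTE) - 1)] for i, cat in enumerate(ranked)}


def single_colour(counts, colour="steelblue"):
    """Alternative: every bar gets the same colour; labels carry the meaning."""
    return {cat: colour for cat in counts}


# Made-up counts for the same set of boroughs in two different columns.
pickup = {"Manhattan": 500, "Queens": 60, "Brooklyn": 40, "Bronx": 10}
dropoff = {"Manhattan": 480, "Brooklyn": 70, "Bronx": 35, "Queens": 25}

pickup_colours = colours_by_rank(pickup)
dropoff_colours = colours_by_rank(dropoff)

# Same category, different colour, purely because its rank changed.
print(pickup_colours["Bronx"], dropoff_colours["Bronx"])

# With a fixed colour the category looks the same in both plots.
print(single_colour(pickup)["Bronx"], single_colour(dropoff)["Bronx"])
```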
plot_numeric @iris0614 (Nice to have - OPTIONAL)
You can enhance the EDA experience by offering scatterplots between the target variable (if numerical) and the numerical explanatory variables, catering to users who want to visualize the relationships before creating models for predictions. (https://github.com/UBC-MDS/software-review-2024/issues/9#issuecomment-1916277295 - 4)
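A rough sketch of what this suggestion could look like, assuming matplotlib is available. The function name `scatter_against_target`, the plain-dict input, and the sample values are assumptions made to keep the example self-contained; the package itself may well take a DataFrame and use a different plotting backend.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def scatter_against_target(data, target):
    """Draw one scatterplot per numeric feature, each against the target.

    `data` maps column names to equal-length lists of numbers
    (a hypothetical stand-in for a DataFrame).
    """
    features = [name for name in data if name != target]
    if not features:
        raise ValueError("need at least one feature besides the target")

    fig, axes = plt.subplots(
        1, len(features), figsize=(4 * len(features), 3), squeeze=False
    )
    for ax, name in zip(axes[0], features):
        ax.scatter(data[name], data[target], s=12)
        ax.set_xlabel(name)
        ax.set_ylabel(target)
    fig.tight_layout()
    return fig


# Made-up values, for illustration only.
trips = {
    "distance": [1.2, 3.4, 0.8, 5.1, 2.2],
    "tip": [1.0, 2.5, 0.0, 4.0, 1.5],
    "fare": [7.0, 14.5, 5.5, 21.0, 10.0],
}
fig = scatter_against_target(trips, target="fare")
```

Returning the figure rather than calling `plt.show()` leaves it to the caller to display or save the plots.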
Ordinal? (For later)
Consider EDA for ordinal variables. I think "passengers" is visualized as an ordinal variable rather than a categorical one: you've kept the natural order of the number of passengers instead of ranking the values by count, as you did with the other categorical variables (see example.ipynb). (https://github.com/UBC-MDS/software-review-2024/issues/9#issuecomment-1916277295 - 5)
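The difference between the two orderings can be sketched as follows. The helper names and the sample values are made up for illustration; this is not the package's code, just the distinction between ranking categories by count and keeping their natural order.

```python
from collections import Counter


def count_ranked(values):
    """Categorical view: order categories by descending count (ties by label)."""
    counts = Counter(values)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def natural_order(values):
    """Ordinal view: keep the categories in their natural (sorted) order."""
    counts = Counter(values)
    return sorted(counts.items())


# Made-up passenger counts per trip, for illustration only.
passengers = [1, 1, 1, 2, 2, 5, 3, 1, 2, 6, 5, 5]

print(count_ranked(passengers))   # [(1, 4), (2, 3), (5, 3), (3, 1), (6, 1)]
print(natural_order(passengers))  # [(1, 4), (2, 3), (3, 1), (5, 3), (6, 1)]
```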
As of today, the installation instructions' title is "Installation (developers)". As a regular user, that made me think that I was looking into the wrong section and I searched for the "Installation (mortals)" section. As I didn't find one, I assumed that you chose this title as the package is still in development. However, I would remove the "(developers)" in the final version or create a regular user section and move the developer's instructions to the ReadTheDocs full documentation website. (https://github.com/UBC-MDS/software-review-2024/issues/9#issuecomment-1917919802 - 5) @phchen5
Tests
Files to Delete
pyxplor.py
(https://github.com/UBC-MDS/software-review-2024/issues/9#issuecomment-1915907137 - 2) (https://github.com/UBC-MDS/software-review-2024/issues/9#issuecomment-1916139577 - 1) (https://github.com/UBC-MDS/software-review-2024/issues/9#issuecomment-1916277295 - 2) (https://github.com/UBC-MDS/software-review-2024/issues/9#issuecomment-1917919802 - 1) @iris0614 (DELETE: pyxplor.py from src AND from tests)
Vignette @rbouwer
Function Improvements
plot_categorical @rbouwer (Nice to have - OPTIONAL)
plot_numeric @iris0614 (Nice to have - OPTIONAL)
Badges
Continuous integration and test coverage, and Python versions supported (https://github.com/UBC-MDS/software-review-2024/issues/9#issuecomment-1915907137 - 5) (https://github.com/UBC-MDS/software-review-2024/issues/9#issuecomment-1916139577 - 4) (https://github.com/UBC-MDS/software-review-2024/issues/9#issuecomment-1916277295 - 1) @arturoboquin
Repo Improvements
README