Open jcairn02 opened 9 months ago
I think step 6 is critical, and step 7 is important and was mentioned by multiple people. Most of the general suggestions are optional; I agree with the quick, easy fixes and would like to do them, i.e. 2, 3, and 8.
Thanks so much for summarising, Jordan. It looks neat!
I agree with most of the feedback and can get started on it tonight.
I'm happy to work on
[x] 6 (maybe we can keep a basic env with poetry in it) +
[x] 10 (badges) - I made the codecov badge but cannot make the pypi badge because I cannot log into those.
[x] 2 (installation instructions) +
[x] 1 (intro),
and more stuff as we go along.
Also, do y'all think we need to work on 7? AFAIK tiff's repo has only one test file as well.
I think we don't need more than one test file, but we need to make sure our tests work and are reproducible. We could also improve test coverage.
This post is a copy of each suggestion. I will reply to it with the commonalities and summarise the suggestions so we can divide the work accordingly and address the issues we think are most important. https://github.com/UBC-MDS/software-review-2024/issues/8
- "A more detailed introduction of the package. The datexplore README currently has a clear and direct introduction saying that it is a package for exploratory data analysis and data cleaning. It would be even better if there were more information on what kinds of EDA and cleaning it is capable of, as the field of EDA and data cleaning is pretty broad. A more detailed introduction could help users know whether the package is the one they need for their specific tasks.
- Installation: It may be helpful to add one more line under installation saying to clone the repo before running the installation code. Some users may not be that experienced with git repos, and if they do not clone the repo first, they may have issues installing this package.
- Great test functions within test_datexplore.py! I also noticed that there is a test.py file within the tests directory, which contains def test_example(): assert 1 == 1. I'm not sure whether it could be moved into test_datexplore.py for a clearer structure; if it is not used anymore, it may be clearer to remove it.
- Within example.ipynb, I saw an error message under "For column name with a space:". It would be better to view the example without any error message.
- Other things look nice. If I had to make one more comment, it may be better to have more examples for users to understand how the package works."
1. "I had trouble installing your environment with 'conda env create -f environment.yml' from the root of the project. I am not sure where the issue comes from, but the installation process got stuck. I solved the issue by creating an empty environment and then installing poetry in that environment.
2. Building on my point from 1, you do not need the environment file when working with poetry. I think you could even remove the environment.yml file from the directory to reduce the number of files in it.
3. When I tried to run your tests, 5 failed, 12 passed, and I got 35 warnings. This has been mentioned by other reviewers before.
4. You have collected all the tests in one file. Unless the functions are very closely related to each other, I prefer having the tests for each function in individual files to make it easier to find the tests for each function.
5. In your functions, you check for the correct input. You could add some tests that check whether the error is raised as expected when the user passes wrong arguments.
6. You have a file tests/test.py that seems like it's not used. You could remove that file."
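To illustrate the suggestion above about testing that errors are raised for wrong arguments, here is a minimal sketch. The `clean_names` function below is a hypothetical stand-in that takes a list of column names; it is not datexplore's actual implementation or signature. The checks use the standard library's `unittest` so the block runs on its own; in a pytest suite the same check is usually written with `with pytest.raises(TypeError):`.

```python
import re
import unittest


def clean_names(columns):
    """Hypothetical stand-in for illustration, NOT datexplore's real clean_names."""
    # Input check: reject anything that is not a list of strings.
    if not isinstance(columns, list) or not all(isinstance(c, str) for c in columns):
        raise TypeError("columns must be a list of strings")
    # Lowercase each name and replace runs of other characters with "_".
    return [re.sub(r"[^0-9a-z]+", "_", c.strip().lower()).strip("_") for c in columns]


class TestCleanNamesInputChecks(unittest.TestCase):
    def test_valid_input_is_cleaned(self):
        self.assertEqual(clean_names(["First Name", "AGE"]), ["first_name", "age"])

    def test_non_list_raises_type_error(self):
        # A bare string is the wrong argument type and should be rejected.
        with self.assertRaises(TypeError):
            clean_names("First Name")

    def test_non_string_entry_raises_type_error(self):
        # A list containing a non-string entry should be rejected too.
        with self.assertRaises(TypeError):
            clean_names(["ok", 42])
```

Each wrong-input case gets its own test, so a failure points directly at the check that broke.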
"The datexplore package is designed for exploratory data analysis and data cleaning in the early stages. It smartly complements broader libraries like Pandas and Scikit-learn by focusing on simplicity and user efficiency. The purpose of the package is clear, and the need for it is well-articulated. The package is well-structured and user-friendly. The documentation is informative, providing a solid overview of the key functions: clean_names, visualise, and detect_outliers. The examples are practical and effectively demonstrate the package's capabilities. Some suggestions for improvement:
"Below are the comments that I have regarding the project:
- When running "pytest tests/", it shows that 4 tests failed. The ci-cd also reflected this issue.
- The docstring for the function "visualise()" does not specify the example input dataframe df. It might be clearer to add the exact structure of df to the docstring.
- It would be great to include badges for other categories (i.e. ci-cd and test coverage).
- Might want to be more detailed in the outline section.
- Might want to add the "git clone" and "cd" steps for clarity.
- Other things look good!"
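As a rough sketch of the docstring comment above, a docstring can spell out the exact structure of the example input. The function below is a hypothetical stand-in, not datexplore's actual `visualise()`: it assumes a plain dict of columns in place of a pandas DataFrame so that it runs without extra dependencies, and it returns a text summary instead of drawing plots.

```python
def visualise(df):
    """Classify each column of a small table (hypothetical stand-in).

    Parameters
    ----------
    df : dict[str, list]
        Maps each column name to its list of values. Assumed structure,
        used here in place of a pandas DataFrame:

            {"age": [23, 35, 41], "city": ["Vancouver", "Toronto", "Calgary"]}

    Returns
    -------
    dict[str, str]
        "numeric" or "categorical" for each column name.

    Examples
    --------
    >>> df = {"age": [23, 35, 41], "city": ["Vancouver", "Toronto", "Calgary"]}
    >>> visualise(df)
    {'age': 'numeric', 'city': 'categorical'}
    """
    kinds = {}
    for name, values in df.items():
        # Treat a column as numeric only if every value is an int or float
        # (bool is excluded because it is a subclass of int).
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        kinds[name] = "numeric" if is_numeric else "categorical"
    return kinds
```

Writing the example as a doctest means the documented input and output can be checked automatically, for instance with `python -m doctest`.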