Open chenzhao299 opened 3 years ago
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 2.5 hours
Hey noaastn team,
Great job on the package! A really well thought out package in terms of design and portability, I can really see myself using it all the time if I am to perform a weather analysis project. Overall the functionalities are great, tests all passed for me, and there really isn't any errors so what I'm suggesting below could just be nit-picking!
get_weather_data()
, I see that you assert station_number
to be all numbers, but from the output of get_stations_info()
, I see that some stations actually start with letter "A", e.g. "A51256-00451". Querying these stations is therefore impossible. Are these stations intentionally avoided? If so, you might want to clarify somewhere. get_stations_info()
and get_weather_data()
take some time to return, which is likely due to the API waits. A useful thing to do would be to print a prompt that says something like "retrieving data...", or even print out the progress "...300 records retrieved" so that users know it's not stuck. get_stations_info()
and a for-loop in get_weather_data
. Not sure how much of a boost list comprehension would give, but I'd give it a try. get_weather_data()
seems to take significantly longer than get_stations_info
, I'm unsure why that is though. get_weather_plot()
where you have asserts throughout the function to handle edge cases. However, it would be nice that such code style is consistent for all functions. For example, in get_stations_info()
if-else statements were used. get_weather_plot()
, I see that you have a lot of repeated codes depending on the input argument, and the only differences between repeated chunks are plot titles, axis labels and the X column names.
This can be easily solved with regular expressions, and your function will be much more readable afterwards!
time_basis
and/or col_name
. I feel like
this is standard practice for a lot of plotting functions. Maybe not the col_name
, though. Overall documentations are nicely done and I love that you included examples in your docstring, which was really helpful. Some minor fixes:
get_weather_plot()
:
time_basis
, "...observations on 'monthly' or 'daily basis'", the single quote at the end should not include 'basis'. obs_df
is obscure, both in the param definition and in the examples. I didn't
know I could just use the output from get_weather_data()
until I try it out. CONTRIBUTORS.md
or put it directly in README.md
.README.md
:
noaastn.get_weather_data("911650-22536", 2020)
> stn datetime air_temp atm_press wind_spd wind_dir
> 0 911650-22536 2020-01-01 00:00:00 26.1 1019.2 7.2 90.0
> 1 911650-22536 2020-01-01 00:53:00 27.2 1018.7 6.7 80.0
> 2 911650-22536 2020-01-01 01:51:00 26.0 NaN 6.7 80.0
> 3 911650-22536 2020-01-01 01:53:00 26.1 1018.6 7.7 80.0
> 4 911650-22536 2020-01-01 02:51:00 26.0 NaN 7.7 90.0
> ... ... ... ... ... ... ...
> 16206 911650-22536 2020-12-31 19:53:00 25.6 1022.1 8.2 50.0
> 16207 911650-22536 2020-12-31 20:53:00 26.1 1021.9 9.3 50.0
> 16208 911650-22536 2020-12-31 21:53:00 26.1 1021.4 8.8 50.0
> 16209 911650-22536 2020-12-31 22:53:00 26.1 1020.4 9.3 50.0
> 16210 911650-22536 2020-12-31 23:53:00 26.7 1019.6 8.2 50.0
Again, congrats on finishing the package and great job!
Marco
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
setup.py
file or elsewhere.Readme requirements The package meets the readme requirements below:
The README should include, from top to bottom:
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 3
Hello team noaastn!
Thanks for creating an easy to follow and succinct package for me to review. The goals of the package were very obvious from the start, all functions were clearly explained, and I had no issues installing from TestPyPI. I also ran the test functions using poetry run pytest
and everything passed as expected. Continuous integration and deployment all seemed to be in use and working as expected so congrats on getting that sorted!
I really enjoyed the use of the get_weather_plot()
. I could see it being tremendously useful if someone just wanted a quick visual glance at the data and they didn't want to fiddle around with the altair specifications.
One thing I was briefly confused about was that the get_stations_info()
function returns station names where as the get_weather_data()
function requires the station code. It didn't take too long to figure out that the station code was a combination of the "usaf" and "wban" columns and I discovered this was actually described in the function documentation. That said however, I think a brief note should be made in the usage section of the README.md file mentioning this point to avoid confusion with future users as the README.md is the front page of the package.
The tests/test_noaastn.py looks to be empty so I believe it can be removed without issue since all of your tests are split into separate files.
There is a minor spelling error in the README.md file. In the intro paragraph, "through a publically" should be "through a publicly".
I'm not sure if this is a must have for pyOpenSci so take this with a grain of salt, but I can't find the creators/maintainers of the package listed anywhere. I think some sort of contact info should be available that the authors are comfortable sharing, whether it be the names of the authors, their GitHub handles, their email addresses, etc. I think this information could fit under the "Contributors" section of the README.md. This would be very useful to have if there was a necessary bug fix, or someone just wanted to communicate with the authors privately. I see that the README.md links to the contributors tab, but I believe this could become overloaded with outside contributors if the package becomes very popular given the MIT License used.
get_weather_data()
returns an error from the ftp site if a year is passed in for which no data is available. As an example: noaastn.get_weather_data("911650-22536", 1970)
> Error generated from NOAA FTP site:
> 550 911650-22536-1970.gz: No such file or directory
> 'FTP Error'
I found this error a little ambiguous. Is there an issue with the FTP site or does data not exist for this year? To improve upon this, I think the function itself should test that the year falls within the available limits and if not, outputs an error message that describes the range of years that are available for the given station. It looks like the get_stations_info()
returns two columns "start" and "end" which outline the range of years available so perhaps this could be used to test the validity of the year argument.
The get_weather_plot()
could be written in a more DRY fashion. If the package was to expand to include more columns of data rather than just "air_temp", "atm_press", "wind_spd", and "wind_dir", then the code would become harder to maintain and start to look quite messy. I have created an issue here with a suggestion on how the code could be updated. The suggestion mainly updates the title variable and the y axis variable for each plot so that they don't have to be hardcoded every time.
When using the get_weather_plot()
with the "daily" argument, the x axis writes out the full name of the month. I think it would be cleaner to have the plot write out the abbreviated version of each month just as the function does when the "monthly" argument is passed in. It is also a little strange that the x axis starts with the year value on the first tic mark rather than the first abbreviated month.
Overall, it was a pleasure to review the package and learn from all the work you've put in! Feel free to address any or all of the suggestions above. Your team covered everything quite well so most of the suggestions above are small things.
Regards, Chad Neald
Thanks very much Chad and Marco for the helpful reviews. We had a lot of fun making this package and your suggestions will help us improve it. We plan on implementing a number of the changes you suggested and we will post here when we do.
Thank you @mmyz88 and @ChadNeald for the helpful reviews! We have addressed the changes based on your suggestions in the last two pull requests. The details are as follows:
Marco:
Chad:
Please feel free to check them out. Let us know if you have any more suggestions or advices. Thank you very much!
Submitting Author:
Package Name: noaastn One-Line Description of Package: Python package that downloads, processes and visualizes weather data from the NOAA website. Repository Link: noaastn Version submitted: 0.2.6 Editor: Tiffany Timbers Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
Version accepted: TBD
Description
The US National Oceanic and Atmospheric Administration (NOAA) collects and provides access to weather data from land-based weather stations within the US and around the world (Land-Based Station Data). One method for accessing these data is through a publically accessible FTP site. This package allows users to easily download data from a given station for a given year, extract several key weather parameters from the raw data files, and visualize the variation in these parameters over time. The weather parameters that are extracted with this package are:
Scope
* Please fill out a pre-submission inquiry before submitting a data visualization package. For more info, see notes on categories of our guidebook.
@tag
the editor you contacted:Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
JOSS Checks
- [ ] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [ ] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [ ] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [ ] The package is deposited in a long-term repository with the DOI: *Note: Do not submit your package separately to JOSS*Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
P.S. *Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
Editor and review templates can be found here