Overall, very thorough but not too overwhelming. I appreciate the learning objectives at the outset and I feel that they align well with the content provided. I also appreciate that the code snippets are provided in several languages.

In the Introduction, the authors use a visual to motivate the section and engage students. I really liked this “engagement trigger” approach. Do you think a similar approach could be used to start off all chapters? For this chapter there are many different options for motivating visuals. One example could be a time series of the volume of weather and climate data available (e.g. from the NASA Earth Science and Data Systems: https://www.earthdata.nasa.gov/s3fs-public/2023-01/product-distribution-volume-discipline-2.jpg?VersionId=Tor97BJIz5dyuZofS5swA7RGwdccByVe ). This is just a suggestion.

Another general note, about variability: when I have worked with students who are unfamiliar with weather and climate data, they are often surprised at how noisy the data is (even though they experience it every day!). In the Hands-On Exercise, Step 1 section, it might be useful to ask students to reflect on the components of variability - is there a trend? a clear seasonal cycle? other low and high-frequency variability? This could be done by asking them to plot a time series of a single grid point. This might also lead nicely into the next chapter, where I believe you do touch on this somewhat.

One other general comment: I suggest a note about satellite data products. I have seen products like land surface temperature from MODIS or LANDSAT or NDVI used in some climate econometrics studies, so I think that there should be some mention of these in the Gridded Data section. Perhaps, just a warning that these can be highly uncertain, served on unconventional grids and that collaboration with a climate scientist is recommended. You could also mention that there are some blended satellite+ground-based observational products (e.g. CHIRPS that you mention).

Below are mostly minor comments:

Section 1: Using Weather and Climate Data

- Overall, this section seems a bit too long. I suggest splitting it in two, maybe with the Gridded Data sub-section starting a whole separate section within the chapter. The first section could be something like “Weather and Climate Data Basics” and then the second could be “Using Weather and Climate Data”.
- broken link to Auffhammer et al. (2013)

Sub-section: The NetCDF Data Format

- Is there an assumption that students use STATA specifically? This reference to STATA kind of came out of nowhere.
- Python(NetCDF4) link doesn't work
- nco -> maybe direct students to use ncdump -h for meta data only rather than ncdump because it is usually way too much of a data dump.
- As an open-source alternative to MATLAB, Python(NetCDF4) + numpy works very well and was typically the way Python users worked before xarray was developed.

Sub-section: NetCDF Contents

- it might be helpful to show a schematic of the netcdf data model: https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html

Sub-section: NetCDF File Organization

- when you list "run" in your terminology bullet list, I don't agree with your statement "Don't worry about this". From a sampling perspective this can be very important - some models have few runs and some have many. Students may over-sample a specific model if they are not aware of what this refers to. Perhaps, you address this later in the tutorial. If so, then I would link to this section, rather than saying "Don't worry about this." Otherwise, I would suggest that this issue of sampling is addressed somewhere in the tutorial.

Sub-section: NetCDF Header

- I realize that you want students to do most of the work themselves and not provide them with data sets as examples, but a picture is often worth a thousand words. It would be nice to show the file header - you can use the xarray sample data so that you don't have to rely on external data sources: https://tutorial.xarray.dev/fundamentals/04.1_basic_plotting.html. You could also use this sample data in the plotting section (which I would also recommend).

Sub-section: Attributes

- no_sleap -> no_leap

Sub-section: Basic Vis...

- the correct title is "An Introduction to Earth and Environmental Data Science"
- Cartopy section link not working

Sub-section: 2-D plotting

- Remember that Earth is a sphere and for most grids you cannot average over all lat/lon points in this way: https://docs.xarray.dev/en/stable/examples/area_weighted_temperature.html

Sub-section: Maps

- Maybe add a note that other projections are available: https://scitools.org.uk/cartopy/docs/v0.15/crs/projections.html

Sub-section: Gridded Data

- General note: it seems that the term “climate data” is being used to refer to “climate model output”. Climate data is not exclusively model output. For example, a 30-year climate normal from a weather station would be considered climate data.

Sub-section: Reanalysis Datasets

- Products also differ by which assimilation scheme is used.

Sub-section: Warning, Station Data

- GHCN link not working

Section 2: How to start working with a Data Product

- Third link, Reanalysis and Observational Datasets and Variables, is not working
- In first paragraph, I would explicitly say “the NCEP2 reanalysis product” rather than just “NCEP2”, to remind students what type of product this is.
- I would also suggest adding a note about file sizes. It’s sometimes easy to download data without being aware of how much space the files might take up on your computer. For big projects, additional/external storage may be required.

Sub-section: Thinking ahead to climate projections

- Auffhammer et al. (2013) link does not seem to be correct
- NASA NEX-GDDP link not working
- GMFD link not working

Section 3: Hands-on Exercise, Step 1

- First paragraph, “(area-weighted, so not usually useful even for studying national-level data)” -> not sure what you mean by this. Maybe it will become clear in the “Weighting Schemes” section. If so, perhaps a “see more here” + link to this section would be useful. Otherwise, please elaborate.
- Step 5: “Place the file … in the data/climate_data folder” -> has this folder been introduced already? If not, I suggest rewording: “Create a folder called data/climate_data and place the file in this folder”. Students tend to have a lot of trouble with file paths, so spelling it out is usually helpful.
- I suggest noting the units of the data (deg C) somewhere as the data is not in the usual units of K. It is probably on the BEST website somewhere, but I couldn’t find it easily. It is listed in the temperature variable attributes, but students might not know how to display this.
- File naming convention: CMIP5 -> CMIP6?
- To finish the exercise, I suggest asking students to plot the data and refer them back to the previous code snippets on plotting. Plotting the data is a good way for them to check that the steps they have taken make sense. Maybe asking them to plot a map of the time mean would be a useful exercise.
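
On the file-size note under Section 2 above, a rough back-of-the-envelope estimate can be sketched in a few lines. This is only an illustration: the function name `estimate_size_bytes`, the 1-degree global grid, and the 40-year daily record are assumptions chosen for the example, not values from the tutorial.

```python
# Rough uncompressed size of one gridded variable:
# (number of time steps) x (lat points) x (lon points) x (bytes per value).

def estimate_size_bytes(n_time, n_lat, n_lon, bytes_per_value=4):
    """Uncompressed size in bytes; 4 bytes per value corresponds to float32."""
    return n_time * n_lat * n_lon * bytes_per_value

# Assumed example: 40 years of daily data (about 10 leap days)
# on a global 1-degree grid (180 x 360 cells).
n_days = 40 * 365 + 10
size_bytes = estimate_size_bytes(n_days, 180, 360)
size_gb = size_bytes / 1e9

print(f"{n_days} daily fields, roughly {size_gb:.2f} GB uncompressed")
```

Files on disk are often smaller than this because NetCDF files can be compressed or stored as packed integers, so the estimate is best read as an upper bound for a single variable.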

Thank you for the helpful comments! Will take a look at this shortly.

The full review is taking me a bit longer than I thought, so I probably won't be finished until next week. Sorry about that!

Thank you for the comments! I have addressed them in merge #85.

Responses to major comments

One example could be a time series of the volume of weather and climate data available

Added.

In the Hands-On Exercise, Step 1 section, it might be useful to ask students to reflect on the components of variability - is there a trend? a clear seasonal cycle? other low and high-frequency variability? Plotting the data is a good way for them to check that the steps they have taken make sense. Maybe asking them to plot a map of the time mean would be a useful exercise.

This was a great idea - I've now added a section asking them to plot and reflect on a sample time series and a sample map, to think about their data (and also to make sure it has been pre-processed correctly).
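
For readers of this thread, the "components of variability" idea can be sketched with a synthetic series. This is a minimal numpy-only illustration under stated assumptions: the series, its trend, and its noise level are made up, and the simple linear-fit-plus-monthly-means decomposition is one possible approach, not necessarily what the tutorial uses.

```python
import numpy as np

# Synthetic 10-year monthly temperature series at one grid point
# (hypothetical values, in deg C).
rng = np.random.default_rng(0)
n_years = 10
t = np.arange(n_years * 12)                      # months since start
series = (
    15.0
    + 0.002 * t                                  # slow warming trend
    + 8.0 * np.sin(2 * np.pi * t / 12)           # annual cycle
    + rng.normal(0.0, 1.0, t.size)               # high-frequency noise
)

# 1. Trend: least-squares straight line through the whole series.
slope, intercept = np.polyfit(t, series, 1)
trend = slope * t + intercept

# 2. Seasonal cycle: mean of the detrended series for each calendar month.
detrended = series - trend
monthly_clim = detrended.reshape(n_years, 12).mean(axis=0)
seasonal = np.tile(monthly_clim, n_years)

# 3. Residual: whatever the trend and the seasonal cycle do not explain.
residual = series - trend - seasonal

print("std of seasonal cycle:", seasonal.std())
print("std of residual:      ", residual.std())
```

Plotting `series`, `trend`, `seasonal`, and `residual` on separate axes is one way to let students see which component dominates at a given grid point.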

I suggest a note about satellite data products. I have seen products like land surface temperature from MODIS or LANDSAT or NDVI used in some climate econometrics studies, so I think that there should be some mention of these in the Gridded Data section. Perhaps, just a warning that these can be highly uncertain, served on unconventional grids and that collaboration with a climate scientist is recommended. You could also mention that there are some blended satellite+ground-based observational products (e.g. CHIRPS that you mention).

Good point. This has been added.

Responses to (major) minor comments

Overall, this section seems a bit too long.

Yeah, I think you're right - I've split up the section into more manageable chunks. I think it flows better now as well.

when you list "run" in your terminology bullet list, I don't agree with your statement "Don't worry about this".

Yes, fully agree, thanks for catching this. The wording has been changed, and it now also refers to the uncertainty ensembles that are sometimes provided with observational data products. If we expand this guide to include a section on future climate data, this will certainly be expanded on as well.
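
The over-sampling concern can be made concrete with a small sketch. The model names, run labels, and warming values below are entirely hypothetical; the point is only that pooling all runs lets a model with many runs dominate, whereas averaging within each model first gives each model one vote.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical projected warming (deg C) for each (model, run) pair.
projections = [
    ("model_a", "r1", 2.0),
    ("model_a", "r2", 2.2),
    ("model_a", "r3", 2.1),
    ("model_a", "r4", 2.1),
    ("model_b", "r1", 3.0),
    ("model_c", "r1", 4.0),
]

# Naive approach: pool every run, so model_a counts four times.
naive_mean = mean([value for _model, _run, value in projections])

# Balanced approach: average the runs within each model first,
# then average across models so each model counts once.
runs_by_model = defaultdict(list)
for model, _run, value in projections:
    runs_by_model[model].append(value)
model_means = {model: mean(values) for model, values in runs_by_model.items()}
balanced_mean = mean(list(model_means.values()))

print(f"pooled over runs:   {naive_mean:.2f}")
print(f"one vote per model: {balanced_mean:.2f}")
```

Which weighting is appropriate depends on the research question; the sketch only shows that the choice changes the answer.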

Further changes

In addition to addressing the comments above, I've also:

split up the section into more and shorter subsections for ease of reading
added more introductory and concluding text to better guide the reader
replaced links to papers with DOIs whenever possible, which will hopefully be more stable in the long run

Detailed responses to minor comments:

Sub-section: The NetCDF Data Format

Is there an assumption that students use STATA specifically? This reference to STATA kind of came out of nowhere.

This has been clarified; STATA is commonly used in economic analysis.

Python(NetCDF4) link doesn't work

Fixed.

nco -> maybe direct students to use ncdump -h for meta data only rather than ncdump because it is usually way too much of a data dump.

Good point, this has been updated.

As an open-source alternative to MATLAB, Python(NetCDF4) + numpy works very well and was typically the way Python users worked before xarray was developed.

Agreed; we're definitely keeping the mention in this section - though I think, at least for the purposes of this tutorial, we'll stick with xarray for simplicity for the rest of the code chunks.

Sub-section: NetCDF Contents

it might be helpful to show a schematic of the netcdf data model: https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html

Added, thanks for the suggestion.

Sub-section: NetCDF Header

I realize that you want students to do most of the work themselves and not provide them with data sets as examples, but a picture is often worth a thousand words. It would be nice to show the file header - you can use the xarray sample data so that you don't have to rely on external data sources: https://tutorial.xarray.dev/fundamentals/04.1_basic_plotting.html. You could also use this sample data in the plotting section (which I would also recommend).

Agreed - showing an image of the header is definitely a good idea, using xarray's built-in sample data.

Sub-section: Attributes

no_sleap -> no_leap

Fixed.

Sub-section: Basic Vis...

the correct title is "An Introduction to Earth and Environmental Data Science"

Fixed.

Cartopy section link not working

Fixed.

Sub-section: 2-D plotting

Remember that Earth is a sphere and for most grids you cannot average over all lat/lon points in this way: https://docs.xarray.dev/en/stable/examples/area_weighted_temperature.html

Definitely worth bringing up - I've changed the example to note this.
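
As a minimal illustration of why the unweighted average is misleading, here is a numpy-only sketch on a regular latitude-longitude grid. The 5-degree grid and the idealized temperature field are assumptions made up for the example. On such a grid, cell area is proportional to the cosine of latitude, which is also the weighting used in the linked xarray example.

```python
import numpy as np

# Cell-center coordinates of an assumed regular 5-degree global grid.
lat = np.arange(-87.5, 90.0, 5.0)    # 36 latitudes
lon = np.arange(2.5, 360.0, 5.0)     # 72 longitudes

# Hypothetical field: warm at the equator, cold at the poles,
# identical at every longitude.
field = 30.0 * np.cos(np.deg2rad(lat))[:, None] * np.ones((lat.size, lon.size))

# Unweighted mean treats every grid cell as equally large,
# which over-counts the small cells near the poles.
unweighted = field.mean()

# Area weights for a regular lat-lon grid: proportional to cos(latitude).
weights = np.cos(np.deg2rad(lat))
zonal_mean = field.mean(axis=1)                   # average over longitude
weighted = np.average(zonal_mean, weights=weights)

print(f"unweighted mean:    {unweighted:.2f}")
print(f"area-weighted mean: {weighted:.2f}")
```

Because the cold polar cells are over-counted, the unweighted mean comes out lower than the area-weighted one for this field. The cosine weighting only applies to regular latitude-longitude grids; other grids need their actual cell areas.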

Sub-section: Maps

Maybe add a note that other projections are available: https://scitools.org.uk/cartopy/docs/v0.15/crs/projections.html

I've added the following text:

Note that which map projection you use will influence how you read the map. In the code examples below, we will use an equal-area projection, in which every grid cell in the gridded data is shown with its accurate relative area, to avoid visually overemphasizing data in regions with smaller geographic extent. To see which other projections are available, see the relevant parts of the documentation (here for cartopy/python, and here for Matlab).

Sub-section: Gridded Data

General note: it seems that the term “climate data” is being used to refer to “climate model output”. Climate data is not exclusively model output. For example, a 30-year climate normal from a weather station would be considered climate data.

I've changed references to 'data' to make clear what they refer to (specifying climate model output, or historical "observational" output).

Sub-section: Reanalysis Datasets

Products also differ by which assimilation scheme is used.

I've clarified:

Historical data products differ by how they “assimilate” (join observational with model data) or combine data, and how much “additional” information is added beyond (pre-processed) station data.

Sub-section: Warning, Station Data

GHCN link not working

Fixed.

@ks905383

I appreciate these updates. I particularly like the updates to the Hands-On Exercise.

Lots of python issues:

In the Hands-On Exercise, these lines of code need to be edited as follows:

geo_lims = {'lat':slice(23,51),'lon':slice(-126,-65)}

geo_lims = {'latitude':slice(23,51),'longitude':slice(-126,-65)}

and

ds.tas.sel(lon=-118.2,lat=34.1,method='nearest').plot()

ds.tas.sel(longitude=-118.2,latitude=34.1,method='nearest').plot()

and

# Plot the day-of-year average
(ds.tas.sel(lon=-118.2,lat=34.1,method='nearest').groupby('time.dayofyear').mean()).plot()

# Plot the day-of-year average
(ds.tas.sel(longitude=-118.2,latitude=34.1,method='nearest').groupby('time.dayofyear').mean()).plot()

For the cartopy plotting section, there are a few issues:

First, the line that is supposed to compute the summer average is not:

# Get average summer temperatures
ds_summer = ds.isel(time=(ds.time.dt.season=='JJA'))

should be something like this, correct?

# Get average summer temperatures
ds_summer = ds.isel(time=(ds.time.dt.season=='JJA')).mean(dim='time')
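
The difference between the two lines can be checked on a small synthetic dataset. This is a sketch under stated assumptions: the dataset below is made up (it is not the BEST data from the exercise), with `tas` set equal to the calendar month so the expected summer mean is easy to verify by hand.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of monthly time steps on a tiny, made-up grid.
time = pd.date_range("2001-01-01", periods=24, freq="MS")
lat = np.array([30.0, 35.0, 40.0])
lon = np.array([-120.0, -115.0, -110.0, -105.0])

# Hypothetical "temperature": equal to the calendar month everywhere.
month = time.month.values.astype(float)
data = month[:, None, None] * np.ones((time.size, lat.size, lon.size))

ds = xr.Dataset(
    {"tas": (("time", "latitude", "longitude"), data)},
    coords={"time": time, "latitude": lat, "longitude": lon},
)

# Selecting JJA alone keeps the time dimension (6 summer months here)...
ds_selected = ds.isel(time=(ds.time.dt.season == "JJA"))

# ...while adding .mean(dim='time') collapses it to one map.
ds_summer = ds_selected.mean(dim="time")

print(dict(ds_selected.sizes))
print(dict(ds_summer.sizes))
```

With the time dimension still present, a map-plotting call receives a 3-D array instead of a single 2-D field, which is why the averaging step matters before plotting.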

Also, when plotting, you define a projection, ax, but then you don't reference it when you plot. I could only get a map to show up with this code:

ax.contourf(ds_summer.longitude,ds_summer.latitude,ds_summer,transform=ccrs.PlateCarree(),levels=21)

Finally, when you save the data in the python implementation, you save it to a directory called "sources". This should be called "data".

@kls2177 Thanks for identifying these. We meant to convert the variable names, but have now decided to leave the conversion to the end. In particular,

We have made the first three specific changes you recommend, changing lat/lon to latitude/longitude.
In the cartopy section, we have added the .mean('time'), as you point out.
In our contourf line, we were referring to ds rather than ds_summer, which might have caused trouble for you. But concerning the ax variable, in our test we do not need to refer to it explicitly, because matplotlib draws the subsequent contourf call on the pre-defined axes.
We have switched a couple of lingering references to a sources directory to data.

The relevant commits are 28ad4a9edd63775f8fe7cdc7808aadb3e26d6007 and 7815a99294b405a3c4983bf6f2be4897bcd230f4.

atrisovic / weather-panel.github.io

JOSE Review - comments on Weather and Climate Data Chapter #77
