kls2177 opened this issue 1 year ago
Thank you for the helpful comments! Will take a look at this shortly.
The full review is taking me a bit longer than I thought, so I probably won't be finished until next week. Sorry about that!
Thank you for the comments! I have addressed them in merge #85.
Responses to major comments
One example could be a time series of the volume of weather and climate data available
Added.
In the Hands-On Exercise, Step 1 section, it might be useful to ask students to reflect on the components of variability - is there a trend? a clear seasonal cycle? other low- and high-frequency variability? Plotting the data is a good way for them to check that the steps they have taken make sense. Maybe asking them to plot a map of the time mean would be a useful exercise.
This was a great idea - I've now added a section asking them to plot and reflect on a sample time series and a sample map, to think about their data (and also to make sure it has been pre-processed correctly).
I suggest a note about satellite data products. I have seen products like land surface temperature from MODIS or LANDSAT or NDVI used in some climate econometrics studies, so I think that there should be some mention of these in the Gridded Data section. Perhaps, just a warning that these can be highly uncertain, served on unconventional grids and that collaboration with a climate scientist is recommended. You could also mention that there are some blended satellite+ground-based observational products (e.g. CHIRPS that you mention).
Good point. This has been added.
Responses to (major) minor comments
Overall, this section seems a bit too long.
Yeah, I think you're right - I've split up the section into more manageable chunks. I think it flows better now as well.
When you list "run" in your terminology bullet list, I don't agree with your statement "Don't worry about this".
Yes, fully agree, thanks for catching. The wording has been changed, and I've also added a reference to the uncertainty ensembles that are sometimes provided with observational data products. If we expand this guide to include a section on future climate data, this will certainly be expanded on as well.
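Here is a hedged illustration of what such an uncertainty ensemble allows. Run the same calculation on every member, then report the spread of the results alongside the best estimate. The sketch uses synthetic data. The dimension name `member` and all values are hypothetical, because real products store their members in different ways, so check the file header.

```python
import numpy as np
import xarray as xr

# Hypothetical observational ensemble: 10 equally plausible realizations
# ("members") of an annual-mean temperature series. The dimension name
# "member" is an assumption; real products name it differently.
rng = np.random.default_rng(1)
years = np.arange(1991, 2021)
truth = 0.02 * (years - years[0])
members = truth[None, :] + rng.normal(scale=0.1, size=(10, years.size))
ens = xr.DataArray(
    members,
    coords={"member": np.arange(1, 11), "year": years},
    dims=("member", "year"),
)

# Best estimate and observational uncertainty across members
ens_mean = ens.mean("member")
ens_spread = ens.std("member")

# Run the same analysis (a linear trend) on every member,
# then look at the spread of the resulting slopes
slopes = ens.polyfit(dim="year", deg=1).polyfit_coefficients.sel(degree=1)
print(float(slopes.mean()), float(slopes.std()))
```

Reporting only the ensemble mean would hide the observational uncertainty. The spread of `slopes` is what shows how robust a result is to the choice of realization.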
Further changes
In addition to the comments made above, I've also:
Detailed responses to minor comments:
Sub-section: The NetCDF Data Format
Is there an assumption that students use STATA specifically? This reference to STATA kind of came out of nowhere.
This has been clarified; STATA is commonly used in economic analysis.
- Python(NetCDF4) link doesn't work
Fixed.
nco -> maybe direct students to use ncdump -h for metadata only, rather than plain ncdump, because the full output is usually way too much of a data dump.
Good point, this has been updated.
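For Python users, `Dataset.info()` in xarray is a rough equivalent of `ncdump -h`. It prints only the dimensions, the variables with their attributes, and the global attributes. The sketch builds a small synthetic dataset so it runs without a download. With a real file you would open it using `xr.open_dataset`, which reads the metadata lazily without loading the data values.

```python
import io

import numpy as np
import pandas as pd
import xarray as xr

# Tiny synthetic stand-in for a downloaded file (hypothetical values);
# with a real file: ds = xr.open_dataset("path/to/file.nc")
time = pd.date_range("2000-01-01", periods=4, freq="D")
ds = xr.Dataset(
    data_vars={
        "tas": (
            ("time", "lat", "lon"),
            np.zeros((4, 2, 3), dtype="float32"),
            {"units": "K", "long_name": "near-surface air temperature"},
        )
    },
    coords={"time": time, "lat": [30.0, 40.0], "lon": [-120.0, -110.0, -100.0]},
    attrs={"source": "synthetic example"},
)

# Header-only summary in the style of `ncdump -h`
buf = io.StringIO()
ds.info(buf=buf)
header = buf.getvalue()
print(header)
```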
As an open-source alternative to MATLAB, Python(NetCDF4) + numpy works very well and was typically the way Python users worked before xarray was developed.
Agreed; we're definitely keeping the mention in this section, though I think, at least for the purposes of this tutorial, we'll stick with xarray for simplicity for the rest of the code chunks.
Sub-section: NetCDF Contents
It might be helpful to show a schematic of the netCDF data model: https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html
Added, thanks for the suggestion.
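Here is a minimal sketch of the same data model in xarray terms. It shows dimensions, coordinate variables (1-D variables named after their dimension), data variables, and variable and global attributes. All names and values are hypothetical.

```python
import numpy as np
import xarray as xr

# Hypothetical grid, for illustration only
lat = np.array([-45.0, 0.0, 45.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])

ds = xr.Dataset(
    # Data variable: an n-dimensional array defined over named dimensions,
    # carrying its own attributes (here, units)
    data_vars={"pr": (("lat", "lon"), np.ones((lat.size, lon.size)), {"units": "mm/day"})},
    # Coordinate variables: 1-D variables with the same name as their dimension
    coords={
        "lat": ("lat", lat, {"units": "degrees_north"}),
        "lon": ("lon", lon, {"units": "degrees_east"}),
    },
    # Global attributes: metadata describing the file as a whole
    attrs={"title": "toy example of the netCDF data model"},
)

print(dict(ds.sizes))      # dimensions and their lengths
print(list(ds.coords))     # coordinate variables
print(list(ds.data_vars))  # data variables
print(ds["pr"].attrs)      # variable attributes
print(ds.attrs)            # global attributes
```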
Sub-section: NetCDF Header
I realize that you want students to do most of the work themselves and not provide them with data sets as examples, but a picture is often worth a thousand words. It would be nice to show the file header - you can use the xarray sample data so that you don't have to rely on external data sources: https://tutorial.xarray.dev/fundamentals/04.1_basic_plotting.html. You could also use this sample data in the plotting section (which I would also recommend).
Agreed - showing an image of the header is definitely a good idea, using xarray's built-in sample data.
Sub-section: Attributes
no_sleap -> no_leap
Fixed.
Sub-section: Basic Vis...
the correct title is "An Introduction to Earth and Environmental Data Science"
Fixed.
Cartopy section link not working
Fixed
Sub-section: 2-D plotting
- Remember that Earth is a sphere and for most grids you cannot average over all lat/lon points in this way: https://docs.xarray.dev/en/stable/examples/area_weighted_temperature.html
Definitely worth bringing up - I've changed the example to note this.
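Here is a minimal sketch of the difference. It assumes a regular latitude/longitude grid, where grid-cell area is proportional to the cosine of latitude, as in the linked xarray example. The field is synthetic: 1 poleward of 60 degrees and 0 elsewhere. The naive mean (1/3, the share of grid rows) therefore badly overstates the true area fraction, which is 1 - sin 60°, or about 0.13. For non-regular grids, a cell-area variable supplied by the data provider, where one exists, would be the better weight.

```python
import numpy as np
import xarray as xr

# Synthetic field on a regular 1-degree grid (hypothetical values):
# 1 everywhere poleward of 60 degrees, 0 elsewhere
lat = np.arange(-89.5, 90, 1.0)
lon = np.arange(0.5, 360, 1.0)
polar = (np.abs(lat) > 60).astype(float)
field = xr.DataArray(
    np.tile(polar[:, None], (1, lon.size)),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
)

# Naive mean: treats every grid cell as if it had the same area
naive = field.mean(("lat", "lon"))

# Area-weighted mean: on a regular lat/lon grid, cell area ~ cos(latitude)
weights = np.cos(np.deg2rad(field.lat))
weighted = field.weighted(weights).mean(("lat", "lon"))

print(float(naive), float(weighted))
```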
Sub-section: Maps
- Maybe add a note that other projections are available: https://scitools.org.uk/cartopy/docs/v0.15/crs/projections.html
I've added the following text:
Note that the map projection you use will influence how you read the map. In the code examples below, we will use an equal-area projection, in which every grid cell in the gridded data is shown with its accurate relative area, to avoid visually overemphasizing data in regions with smaller geographic extent (e.g., at high latitudes). To see which other projections are available, see the relevant parts of the documentation (here for cartopy/Python, and here for MATLAB).
Sub-section: Gridded Data
- General note: it seems that the term “climate data” is being used to refer to “climate model output”. Climate data is not exclusively model output. For example, a 30-year climate normal from a weather station would be considered climate data.
I've changed references to 'data' to make clear what they refer to (specifying climate model output, or historical "observational" output).
Sub-section: Reanalysis Datasets
- Products also differ by which assimilation scheme is used.
I've clarified:
Historical data products differ by how they “assimilate” (join observational with model data) or combine data, and how much “additional” information is added beyond (pre-processed) station data.
Sub-section: Warning, Station Data
- GHCN link not working
Fixed.
@ks905383
I appreciate these updates. I particularly like the updates to the Hands-On Exercise.
Lots of Python issues:
geo_lims = {'lat':slice(23,51),'lon':slice(-126,-65)}
to
geo_lims = {'latitude':slice(23,51),'longitude':slice(-126,-65)}
and
ds.tas.sel(lon=-118.2,lat=34.1,method='nearest').plot()
to
ds.tas.sel(longitude=-118.2,latitude=34.1,method='nearest').plot()
and
# Plot the day-of-year average
(ds.tas.sel(lon=-118.2,lat=34.1,method='nearest').groupby('time.dayofyear').mean()).plot()
to
# Plot the day-of-year average
(ds.tas.sel(longitude=-118.2,latitude=34.1,method='nearest').groupby('time.dayofyear').mean()).plot()
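The renamed lines above can be sketched end to end on synthetic data. There is one caveat: the slices only select data if the coordinates follow the conventions they assume, namely latitude ordered south to north and longitude on -180 to 180. This sketch assumes a product that stores latitude north to south and longitude on 0 to 360, as some reanalysis products do. Both the grid and the values are hypothetical. Note also that grouping by `time.dayofyear` makes day 60 equal 29 February in leap years but 1 March otherwise.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily data on a grid with descending latitude and 0-360 longitude
# (hypothetical values; check your own file's coordinates)
time = pd.date_range("2001-01-01", "2004-12-31", freq="D")
lat = np.arange(60.0, 9.9, -2.5)    # 60N down to 10N
lon = np.arange(200.0, 320.1, 2.5)  # 200E to 320E, i.e. 160W to 40W
doy = time.dayofyear.values
seasonal = 15 + 8 * np.sin(2 * np.pi * (doy - 110) / 365.25)
tas = np.broadcast_to(seasonal[:, None, None], (time.size, lat.size, lon.size)).copy()
ds = xr.Dataset(
    {"tas": (("time", "latitude", "longitude"), tas)},
    coords={"time": time, "latitude": lat, "longitude": lon},
)

# Convert longitudes to [-180, 180) and sort both axes ascending,
# so that the slices below actually select data
ds = ds.assign_coords(longitude=((ds.longitude + 180) % 360) - 180)
ds = ds.sortby("latitude").sortby("longitude")

# Keys must match the dataset's own dimension names
geo_lims = {"latitude": slice(23, 51), "longitude": slice(-126, -65)}
ds_us = ds.sel(geo_lims)

# Nearest grid cell to Los Angeles (34.1N, 118.2W)
la = ds_us.tas.sel(longitude=-118.2, latitude=34.1, method="nearest")

# Day-of-year climatology: average each calendar day across all years
clim = la.groupby("time.dayofyear").mean()
# la.plot() and clim.plot() would draw the two figures (requires matplotlib)
```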
For the cartopy plotting section, there are a few issues:
First, the line that is supposed to compute the summer average is not:
# Get average summer temperatures
ds_summer = ds.isel(time=(ds.time.dt.season=='JJA'))
should be something like this, correct?
# Get average summer temperatures
ds_summer = ds.isel(time=(ds.time.dt.season=='JJA')).mean(dim='time')
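This is a quick check of the corrected line on synthetic data. The values are hypothetical: temperature is set equal to the month number, so the June-August mean is known in advance. Weighting by the number of days in each month, it is 645/92, about 7.01. Note that `ds_summer` is still a Dataset, so the 2-D field to plot is `ds_summer.tas`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily data where the temperature equals the month number
time = pd.date_range("2001-01-01", "2002-12-31", freq="D")
month = time.month.values.astype(float)
ds = xr.Dataset(
    {"tas": (("time", "latitude", "longitude"),
             np.broadcast_to(month[:, None, None], (time.size, 2, 3)).copy())},
    coords={"time": time, "latitude": [30.0, 40.0], "longitude": [-120.0, -110.0, -100.0]},
)

# Select June-August days, then average over time to get a 2-D summer map
ds_summer = ds.isel(time=(ds.time.dt.season == "JJA")).mean(dim="time")
print(ds_summer.tas.dims, float(ds_summer.tas[0, 0]))
```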
Also, when plotting, you define a projection, ax, but then you don't reference it when you plot. I could only get a map to show up with this code:
ax.contourf(ds_summer.longitude,ds_summer.latitude,ds_summer,transform=ccrs.PlateCarree(),levels=21)
Finally, when you save the data in the Python implementation, you save it to a directory called "sources". This should be called "data".
@kls2177 Thanks for identifying these. We meant to convert the variable names, but have now decided to leave the conversion to the end. In particular,
- lat/lon has been changed to latitude/longitude.
- For the cartopy section, we have added the .mean('time'), as you point out.
- In the contourf line, we were referring to df rather than df_summer, which might have caused trouble for you. But concerning the ax variable, in our test, we do not need to explicitly refer to it because matplotlib overlays the subsequent contourf command onto the pre-defined axes.
- The sources directory has been renamed to data.
The relevant commits are 28ad4a9edd63775f8fe7cdc7808aadb3e26d6007 and 7815a99294b405a3c4983bf6f2be4897bcd230f4.
Overall, very thorough but not too overwhelming. I appreciate the learning objectives at the outset and I feel that they align well with the content provided. I also appreciate that the code snippets are provided in several languages.
In the Introduction, the authors use a visual to motivate the section and engage students. I really liked this “engagement trigger” approach. Do you think a similar approach could be used to start off all chapters? For this chapter there are many different options for motivating visuals. One example could be a time series of the volume of weather and climate data available (e.g. from the NASA Earth Science and Data Systems: https://www.earthdata.nasa.gov/s3fs-public/2023-01/product-distribution-volume-discipline-2.jpg?VersionId=Tor97BJIz5dyuZofS5swA7RGwdccByVe )? This is just a suggestion.
Another general note about variability. When I have worked with students who are unfamiliar with weather and climate data, they are often surprised at how noisy the data is (even though they experience it every day!). In the Hands-On Exercise, Step 1 section, it might be useful to ask students to reflect on the components of variability - is there a trend? a clear seasonal cycle? other low- and high-frequency variability? This could be done by asking them to plot a time series of a single grid point. This might also lead nicely into the next chapter where I believe you do touch on this somewhat.
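Here is a sketch of how those components could be separated at a single grid point. It uses a synthetic monthly series built from a known trend, a seasonal cycle peaking in July, and noise; all values are hypothetical. The monthly climatology captures the seasonal cycle. Subtracting it leaves anomalies, and a least-squares line through the anomalies estimates the trend.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series at one grid point: trend + seasonal cycle + noise
time = pd.date_range("1980-01-01", periods=40 * 12, freq="MS")
years = np.arange(time.size) / 12.0
rng = np.random.default_rng(42)
values = (
    0.03 * years                                              # trend: 0.03 units per year
    + 10 * np.cos(2 * np.pi * (time.month.values - 7) / 12)   # seasonal cycle peaking in July
    + rng.normal(scale=1.0, size=time.size)                   # high-frequency noise
)
tas = xr.DataArray(values, coords={"time": time}, dims="time")

# 1) Seasonal cycle: the average for each calendar month
seasonal_cycle = tas.groupby("time.month").mean()
# 2) Anomalies: subtract each month's climatological mean
anom = tas.groupby("time.month") - seasonal_cycle
# 3) Trend: least-squares slope of the anomalies, in units per year
slope_per_year = np.polyfit(years, anom.values, 1)[0]
print(seasonal_cycle.values.round(1), round(slope_per_year, 3))
```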
One other general comment: I suggest a note about satellite data products. I have seen products like land surface temperature from MODIS or LANDSAT or NDVI used in some climate econometrics studies, so I think that there should be some mention of these in the Gridded Data section. Perhaps, just a warning that these can be highly uncertain, served on unconventional grids and that collaboration with a climate scientist is recommended. You could also mention that there are some blended satellite+ground-based observational products (e.g. CHIRPS that you mention).
Below are mostly minor comments:
Section 1: Using Weather and Climate Data
Cartopy section link not working
Sub-section: 2-D plotting
- Remember that Earth is a sphere and for most grids you cannot average over all lat/lon points in this way: https://docs.xarray.dev/en/stable/examples/area_weighted_temperature.html
Sub-section: Maps
- Maybe add a note that other projections are available: https://scitools.org.uk/cartopy/docs/v0.15/crs/projections.html
Sub-section: Gridded Data
- General note: it seems that the term “climate data” is being used to refer to “climate model output”. Climate data is not exclusively model output. For example, a 30-year climate normal from a weather station would be considered climate data.
Sub-section: Reanalysis Datasets
- Products also differ by which assimilation scheme is used.
Sub-section: Warning, Station Data
- GHCN link not working
Section 2: How to start working with a Data Product
- Third link, Reanalysis and Observational Datasets and Variables, is not working
- In the first paragraph, I would explicitly say “the NCEP2 reanalysis product” rather than just “NCEP2”, to remind students what type of product this is.
- I would also suggest adding a note about file sizes. It’s sometimes easy to download data without being aware of how much space the files might take up on your computer. For big projects, additional/external storage may be required.
Sub-section: Thinking ahead to climate projections