JOSE Review - comment on Bringing it all together

kls2177 commented 1 year ago

This Chapter nicely wraps up the tutorial. Just a few comments below.

Section: Producing Results

This section is out of my area, so I only have a few comments.

could you potentially create a more descriptive title. "Producing Results" is quite generic
I don't quite understand the learning outcome -> who is "offering"? The verb should refer to what the student will be able to do by the end of the chapter.
is there a python implementation for the error term? I found something that might be relevant on stackoverflow: https://stackoverflow.com/questions/30553838/getting-statsmodels-to-use-heteroskedasticity-corrected-standard-errors-in-coeff. Perhaps the statsmodels or linearmodels packages might have relevant functionality.
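
For context, the heteroskedasticity-robust (HC1, or White) standard errors discussed in the linked question can be computed by hand as in the sketch below. The data are synthetic and the function name `ols_with_hc1` is only illustrative. This is not the Conley HAC estimator, which additionally accounts for spatial correlation. In statsmodels, the analogous option is `fit(cov_type='HC1')`.

```python
import numpy as np

def ols_with_hc1(X, y):
    """Return OLS coefficients with classical and HC1 (White) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Classical covariance assumes one common error variance.
    sigma2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))
    # Sandwich covariance lets each observation have its own error variance.
    meat = X.T @ (X * resid[:, None] ** 2)
    cov_hc1 = XtX_inv @ meat @ XtX_inv * n / (n - k)
    se_hc1 = np.sqrt(np.diag(cov_hc1))
    return beta, se_classical, se_hc1

# Synthetic data (assumed for illustration): the noise grows with x,
# so the errors are heteroskedastic by construction.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 + 0.5 * x, n)
X = np.column_stack([np.ones(n), x])

beta, se_classical, se_hc1 = ols_with_hc1(X, y)
print("coefficients:", beta)
print("classical SE:", se_classical)
print("HC1 SE:      ", se_hc1)
```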

Section: Hands-On Exercise, Step 4

note that I wasn't able to test the code in this section because I had an error in Step 3.
Again, I would appreciate a python implementation of this section. Perhaps, using pandas and then one of the linearmodels packages would be the most straightforward.
in the "Running the regression" sub-section, what do you mean by "state trends"? Why do you use state trends versus county trends? Why have county fixed effects but not state fixed effects? It would be helpful if you explained your choices. Perhaps, you could build in some sensitivity tests for students to do on their own to justify the choices - for example, what happens to the results if they don't include FEs or use different ones? What about the linear and quadratic temperature terms? Does having both improve the results? How do they know which model is more robust (not sure if you use an adjusted R^2 for this type of thing)?
after the figure, it might be useful to describe what is shown. How do you want students to interpret the figure? (Don't forget to revise C to deg C in the figure x-axis as well).

Section: Suggestions for work organization

second note: I don't think you introduced the term "vector data" in the previous GIS section.

jrising commented 1 year ago

Thank you for these helpful comments! I respond to each one below.

Section: Producing Results

could you potentially create a more descriptive title. "Producing Results" is quite generic

We have now called this section "Some pointers when performing regressions", which is much more descriptive.

I don't quite understand the learning outcome -> who is "offering"? The verb should refer to what the student will be able to do by the end of the chapter.

Good points. The learning outcome is now rewritten as "Avoid common pitfalls when running regressions and plotting results."

is there a python implementation for the error term?

Yes, there is an R implementation of the Conley HAC standard errors. I have now added a tab with this link.

Section: Hands-On Exercise, Step 4

note that I wasn't able to test the code in this section because I had an error in Step 3.

Thank you for looking at this. In addition to fixes in the underlying library, which hopefully would let you run step 3, we have also added an R implementation of step 3 using a newly developed library.

Again, I would appreciate a python implementation of this section.

Thank you for pushing us to do this. We have now added this. It did require some hand-built code to do the state-specific trends and to calculate the confidence intervals of the dose-response function, but hopefully this will be helpful for future students.

in the "Running the regression" sub-section, what do you mean by "state trends"? Why do you use state trends versus county trends? Why have county fixed effects but not state fixed effects? It would be helpful if you explained your choices. Perhaps, you could build in some sensitivity tests for students to do on their own to justify the choices - for example, what happens to the results if they don't include FEs or use different ones? What about the linear and quadratic temperature terms? Does having both improve the results? How do they know which model is more robust (not sure if you use an adjusted R^2 for this type of thing)?

These are good questions.

We have clarified that "state trends" meant "state-specific trends" and added a written-out specification.
It would certainly be reasonable to consider county trends. We have now added a discussion to help students decide on the right level and structure of trends.
State fixed effects are actually not necessary here. They are collinear with the county fixed effects and so would be dropped.
We have now included a discussion of the kinds of sensitivity tests, and comments about using R^2 in combination with looking at the resulting fit.
The inclusion of linear and quadratic temperature terms makes sense given what we know about the effects of cold and hot temperatures on mortality. This is explained now in the text.
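
The collinearity point can be checked numerically. The sketch below uses a hypothetical toy panel (4 counties nested in 2 states) and shows that adding state dummies to county dummies does not raise the rank of the design matrix, because each state dummy is the sum of its counties' dummies.

```python
import numpy as np

# Hypothetical toy panel: 4 counties nested in 2 states, 3 years each.
county = np.repeat(np.arange(4), 3)
state = county // 2  # counties 0-1 -> state 0, counties 2-3 -> state 1

# One dummy column per county and per state.
county_dummies = (county[:, None] == np.arange(4)).astype(float)
state_dummies = (state[:, None] == np.arange(2)).astype(float)

rank_county = np.linalg.matrix_rank(county_dummies)
rank_both = np.linalg.matrix_rank(np.hstack([county_dummies, state_dummies]))

# The state dummies add two columns but no new information.
print(rank_county, rank_both)  # prints: 4 4
```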

Here is the much-extended discussion in the tutorial now based on your comments:

Let's run our central regression, relating death rate to temperature. In keeping with the literature, we will assume that weather-related mortality is a u-shaped function. That is, both cold and hot temperatures cause increased mortality. We also assume that these effects are convex, so that more extreme cold and heat will produce a more-than-linear increase in mortality. The simplest relationship that has these features is a quadratic. Refer to the functional forms section for further considerations.

For fixed effects, we use county fixed effects, to account for unobserved constant heterogeneity, and state-specific trends, to account for gradual changes at a larger scale. While the high-resolution fixed effects are necessary and standard, the choice of trends can be more subtle. Here are a few points:

 - The higher-resolution and more flexible the trends, the better, as far as identification is concerned. For example, we might consider county-specific trends, higher-order polynomials, or cubic splines. These better isolate weather shocks, which are the core of the identification strategy.
 - At the same time, flexible trends absorb the variation needed for statistical analysis. For example, a state-by-year fixed effect would leave very little variation at the county level, because counties' weather is correlated within each state.
 - The right balance between these depends upon the other drivers affecting the dependent variable. It is useful to graph the timeseries for individual units and for groups of units (e.g., every county within a state). Trends should be imposed wherever there is non-weather-related drift in the dependent variable, and their resolution should be fine enough that the values for all the contained regional units are drifting in the same way.
 - Where researchers can reasonably disagree about the right trend specification, generate tables that show several specifications. Typical sensitivity tests include more flexible fixed effects and functional forms and the inclusion of other weather variables. The R^2 values will help you understand how much of the variation you are capturing, but it is important to look at the resulting fit to decide if the regression is doing what you expect.

In our case, we see gradual shifts which are captured fairly well by state-level trends. We also only have 10 years of data, but with more data, more saturated fixed effects should be explored.

Here is our specification:

$$M_{sit} = \beta_1 T_{it} + \beta_2 T_{it}^2 + \gamma_i + \delta_s t$$

for mortality $M_{sit}$ in state $s$, county $i$, and year $t$, where $\gamma_i$ is the county fixed effect and $\delta_s t$ is the state-specific trend.

Note that since we do not control for precipitation, temperature in this case includes correlated effects of rainfall. As a result, it is not ideal for future projections.
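
To make the specification concrete, here is a minimal sketch that estimates it by ordinary least squares on synthetic data. It is not the tutorial's code: the panel dimensions, the "true" coefficients, and all variable names are assumptions chosen for illustration. County fixed effects enter as dummy columns, and state-specific trends as state dummies multiplied by the year index.

```python
import numpy as np

rng = np.random.default_rng(42)
n_states, counties_per_state, n_years = 3, 5, 10
n_counties = n_states * counties_per_state

# Long-format index arrays: one row per county-year.
county = np.repeat(np.arange(n_counties), n_years)
year = np.tile(np.arange(n_years), n_counties)
state = county // counties_per_state

# Synthetic "truth" (assumed values, for illustration only).
beta1, beta2 = -0.4, 0.02
gamma = rng.normal(10, 2, n_counties)        # county fixed effects
delta = rng.normal(0, 0.1, n_states)         # state-specific trend slopes
base_temp = rng.uniform(0, 25, n_counties)   # county climate
temp = base_temp[county] + rng.normal(0, 3, county.size)
mortality = (beta1 * temp + beta2 * temp**2 + gamma[county]
             + delta[state] * year + rng.normal(0, 0.1, county.size))

# Design matrix: temperature terms, county dummies, state-by-year trends.
# No separate intercept: the county dummies already span it.
county_fe = (county[:, None] == np.arange(n_counties)).astype(float)
state_trend = (state[:, None] == np.arange(n_states)) * year[:, None]
X = np.column_stack([temp, temp**2, county_fe, state_trend])

coef, *_ = np.linalg.lstsq(X, mortality, rcond=None)
beta1_hat, beta2_hat = coef[:2]
print("beta1:", beta1_hat, "beta2:", beta2_hat)
```

Swapping `state_trend` for county-level trends, or dropping `county_fe`, is one way to run the kind of sensitivity test discussed above.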

after the figure, it might be useful to describe what is shown. How do you want students to interpret the figure? (Don't forget to revise C to deg C in the figure x-axis as well).

Great point. We have now added this paragraph:

The estimated dose-response function suggests that counties with an average temperature higher than about 2 deg C observe increases in mortality when the temperature rises and decreases in mortality when it falls. Counties with very cold average temperatures see the opposite: for them, warmer temperatures reduce mortality.

Since baseline mortality rates are different in each county, the dose-response function can only tell us about the effect of changes. So, for example, relative to the mortality expected on a 20 deg C day, a 30 deg C day is projected to result in 0.3 additional deaths per 100,000, equivalent to deaths attributable to all natural disasters in the US.
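
The arithmetic behind this reading can be sketched as follows. The coefficients here are hypothetical: they are backed out so that the quadratic has its minimum at 2 deg C and gives 0.3 additional deaths per 100,000 between 20 and 30 deg C, matching the numbers quoted above. They are not the tutorial's estimates.

```python
def dose_response(temp_c, beta1, beta2):
    """Quadratic dose-response, up to a county-specific constant."""
    return beta1 * temp_c + beta2 * temp_c**2

def relative_effect(temp_c, reference_c, beta1, beta2):
    """Change in mortality at temp_c relative to reference_c."""
    return (dose_response(temp_c, beta1, beta2)
            - dose_response(reference_c, beta1, beta2))

def turning_point(beta1, beta2):
    """Temperature at which the quadratic reaches its minimum (beta2 > 0)."""
    return -beta1 / (2 * beta2)

# Hypothetical coefficients, chosen only to reproduce the quoted numbers:
# minimum at 2 deg C  ->  beta1 = -4 * beta2
# f(30) - f(20) = 0.3 ->  460 * beta2 = 0.3
beta2 = 0.3 / 460
beta1 = -4 * beta2

print(turning_point(beta1, beta2))
print(relative_effect(30, 20, beta1, beta2))
```

Only differences between two temperatures are meaningful here, because the county fixed effects absorb each county's baseline level.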

Section: Suggestions for work organization

second note: I don't think you introduced the term "vector data" in the previous GIS section.

We have changed this to read "geospatial polygon data" to be clearer.

kls2177 commented 8 months ago

@jrising

Thanks for the updates.

After the bulleted list in the "Running the Regression" section, there is an equation that is not rendering correctly. Please update.

Major Issue:

I am having some issues with the code that likely relate to my new issue in Step 3 of the Hands-On Exercise:

When I try to run the regression using python, I get the following error:

ValueError: exog does not have full column rank. If you wish to proceed with model estimation irrespective of the numerical accuracy of coefficient estimates, you can set check_rank=False.

If I add, "check_rank=False", I get the following error:

AbsorbingEffectError: The model cannot be estimated. The included effects have fully absorbed one or more of the variables. This occurs when one or more of the dependent variable is perfectly explained using the effects included in the model.

The following variables or variable combinations have been fully absorbed or have become perfectly collinear after effects are removed:

      tas_adj
      tas_sq

Set drop_absorbed=True to automatically drop absorbed variables.

If I add "drop_absorbed=True", this still gives me the same error. I think this is related to the RunTimeError I was getting in Step 3. Please revise.
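
One way to narrow down an error like this is to check whether the temperature variable varies within each county at all. If the aggregation in Step 3 produced a single repeated or missing value per county, the county fixed effects would absorb it. The sketch below assumes a long-format table held as a list of dicts with a hypothetical `county` key alongside `tas_adj`. It is a diagnostic idea only, not part of the tutorial code.

```python
import math
from collections import defaultdict

def distinct_values_per_group(rows, group_key, value_key):
    """Count distinct non-missing values of value_key within each group."""
    values = defaultdict(set)
    for row in rows:
        group_values = values[row[group_key]]  # registers the group even if empty
        v = row[value_key]
        if v is not None and not math.isnan(v):
            group_values.add(v)
    return {group: len(vals) for group, vals in values.items()}

def is_absorbed_by_group_effects(rows, group_key, value_key):
    """True if the variable never varies within any group."""
    counts = distinct_values_per_group(rows, group_key, value_key)
    return all(n <= 1 for n in counts.values())

# Hypothetical rows: the second table mimics a failed aggregation step.
good = [
    {"county": "A", "tas_adj": 11.2}, {"county": "A", "tas_adj": 12.5},
    {"county": "B", "tas_adj": 20.1}, {"county": "B", "tas_adj": 19.4},
]
bad = [
    {"county": "A", "tas_adj": float("nan")}, {"county": "A", "tas_adj": float("nan")},
    {"county": "B", "tas_adj": 20.1}, {"county": "B", "tas_adj": 20.1},
]

print(is_absorbed_by_group_effects(good, "county", "tas_adj"))
print(is_absorbed_by_group_effects(bad, "county", "tas_adj"))
```

With state-specific trends also in the model, a variable can be absorbed in subtler ways, so a clean result here does not rule out every cause.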

Also, in the plotting section (and throughout), please revise C to deg C using Markdown as follows:

plt.xlabel(r'Daily temperature ($^{\circ}$C)')

kls2177 commented 8 months ago

Here is some additional information:

@ks905383 The population data has a lot of zeros and the temperature data has masked values over ocean.

Also, note that when I plot the map of summer temperature (Hands-On Exercise, Step 1), there are a few spots over the US that are also missing data (figure attached).

kls2177 commented 7 months ago

Hello! I noticed that xagg has been updated and I gave it a try. It now works, but the figure that I get looks different from the one in the jupyterbook.

ks905383 commented 7 months ago

Also, note that when I plot the map of summer temperature (Hands-On Exercise, Step 1), there are a few spots over the US that are also missing data (figure attached)

Those are in the original dataset - BEST is land-only, so water-only pixels are masked; in this case, pixels that are entirely within the Great Lakes have no data, though that might be worth pointing out if it's unclear.
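
As a rough illustration of why masked pixels need care when aggregating to regions, here is a sketch of a weighted mean that skips missing pixels and renormalizes the remaining weights. This is not xagg's implementation. The function name and the example values are assumptions.

```python
import numpy as np

def masked_weighted_mean(values, weights):
    """Weighted mean over pixels, ignoring pixels whose value is NaN."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    valid = ~np.isnan(values)
    total_weight = weights[valid].sum()
    # A region made up entirely of masked pixels has no defined mean.
    if total_weight == 0:
        return float("nan")
    return float(np.sum(values[valid] * weights[valid]) / total_weight)

# Hypothetical county overlapping three pixels, one of them a water-only
# (masked) pixel, with weights equal to the fractional overlap area.
pixel_temps = [20.0, np.nan, 24.0]
overlap_weights = [0.5, 0.3, 0.2]

county_mean = masked_weighted_mean(pixel_temps, overlap_weights)
all_masked = masked_weighted_mean([np.nan, np.nan], [0.6, 0.4])
print(county_mean, all_masked)
```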

Hello! I noticed that xagg has been updated and I gave it a try. It now works, but the figure that I get looks different from the one in the jupyterbook.

Thanks for checking that. This is odd, though - I expected there to be a difference (one of the xagg fixes was an issue with floating point differences in grids between weights and the underlying raster), but we seem to be getting different figures when I run the Hands-On exercises:

(Also, there are a few minor things in the Hands-On exercises (.nc4 --> .nc, removing warnings, etc.) that I'll PR shortly, after which the whole workflow should work as copied from the website.)

kls2177 commented 7 months ago

Ok, thanks. I will go through the code again.

kls2177 commented 7 months ago

hi @ks905383,

I went through all the code and redownloaded all the datasets again and now I get the right figure - yay! I can't pinpoint what the issue was, but all seems to be working well now.

jrising commented 7 months ago

@kls2177 Thank you for going through the whole thing again. I really appreciate all the time you have spent testing this.

I have also fixed the rendering of that equation.

atrisovic / weather-panel.github.io

JOSE Review - comment on Bringing it all together #80