On Problem 1: Computing PAF with diverging integrals 2.2.2 Gamma case

INSP-RH / pifpaf

Estimation of the Population Impact Fraction and Potential Impact Fraction

GNU General Public License v3.0


On Problem 1: Computing PAF with diverging integrals 2.2.2 Gamma case #5

Closed: fcudhea closed this issue 7 years ago.

fcudhea commented 7 years ago

I think that by using a beta distribution for the true exposure distribution, you weaken your argument, since this is no longer strictly a diverging-integrals situation; it is also a distribution-choosing situation (from part 3). The example also seems a bit contrived in two ways:

If the mean is negative, one would not use a gamma distribution to describe the exposure in the first place (especially for this true distribution, where most observations would be negative). By making the true mean negative, you've manufactured a scenario in which we fail the theta < mu/sigma^2 requirement for convergence.
I still don't quite understand whether a negative PAF makes sense. PAF stands for population attributable fraction, and "fraction" implies that it is bounded between 0 and 1. The value is interpreted as the proportion of the outcomes in the population that can be attributed to the exposure; on that level, I don't understand what a negative PAF means. I'm also wary of this scenario where intake can be negative. While in practice I've seen people (including us) integrate over the negative real line, I believe the literature consistently writes out the PAF as an integral from 0 to m (0 being the minimum possible value and m the maximum possible value). So allowing for negative intakes is actually a huge paradigm shift, in my opinion. However, the negative true PAF is probably driven by the combination of a positive theta (weight gain is unhealthy) and the fact that most people in the true population are losing weight: the exposed population is overall healthier than a hypothetical unexposed population. Unless you assume that theta = 0 (as we do) whenever the exposure is better than the "ideal" scenario (in this case, when the exposure is negative), it may not be possible, or make sense, to estimate the proportion of outcomes attributable to the exposure. It may help to think of the PAF as a comparison between an exposed distribution and an ideal distribution. Given the RR function and distribution for this scenario, the ideal scenario would not be everyone at 0 BMI change, so it might not make sense to calculate a PIF here.

However, while I have problems with the example, the point is taken that certain scenarios (high exposure effect, low mean, high variance) will lead to a theoretical PAF of 1 (not good!). Edit 1: This is not necessarily a purely theoretical scenario either; nuts in the US may actually hit all three points. Edit 2: On second thought, since we make some assumptions ourselves (the TMRED is normally distributed, and the RRs do not change beyond the TMRED), we may have avoided this situation for nuts as well. That would explain why we didn't encounter PAFs near 1 for nuts. (P.S. Sorry for all these edits.)
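To make the "theoretical PAF of 1" point concrete, here is a minimal Python sketch. It assumes a Gamma exposure X with mean mu and variance sigma^2 (shape k = mu^2/sigma^2, scale s = sigma^2/mu), RR(theta; X) = exp(theta*X), and a counterfactual in which everyone has RR = 1, so PAF = 1 - 1/E[RR]. It relies on the standard Gamma moment generating function E[exp(theta*X)] = (1 - theta*s)^(-k), which is finite only for theta < 1/s = mu/sigma^2. The function names are hypothetical and are not part of the pifpaf package.

```python
import math


def gamma_mean_rr(theta: float, mu: float, sigma2: float) -> float:
    """E[exp(theta * X)] for X ~ Gamma with mean mu > 0 and variance sigma2 > 0."""
    k = mu**2 / sigma2  # shape
    s = sigma2 / mu     # scale
    if theta >= 1.0 / s:  # i.e. theta >= mu / sigma2
        return math.inf   # the integral diverges
    return (1.0 - theta * s) ** (-k)


def gamma_paf(theta: float, mu: float, sigma2: float) -> float:
    """PAF = 1 - 1/E[RR] against a counterfactual with RR = 1 for everyone."""
    # 1 / inf == 0.0 in Python, so a divergent integral yields PAF = 1 exactly.
    return 1.0 - 1.0 / gamma_mean_rr(theta, mu, sigma2)


if __name__ == "__main__":
    mu, sigma2 = 1.0, 4.0  # hypothetical low mean, high variance: threshold 0.25
    for theta in (0.1, 0.2, 0.24, 0.25, 0.3):
        print(f"theta={theta:<5} PAF={gamma_paf(theta, mu, sigma2):.4f}")
```

As theta approaches mu/sigma^2 the PAF climbs towards 1, and at or beyond the threshold the expectation is infinite and the PAF is exactly 1.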

fcudhea commented 7 years ago

The nuts distribution in the US, based on NHANES data, is highly skewed (very low mean, very high variance), and I believe (I don't actually remember, so don't quote me on this) that nuts have a relatively large effect on CVD outcomes (a high reduction in CVD outcomes per unit increase in nuts; in other words, "theta" is large). Basically, I thought this could be a real-world example where divergence is an issue.

However, for this particular case, we are not strictly using RR(theta; X) = exp(theta*X); we are using RR(theta; X) = exp(theta*(X - y)) if X < y, and 1 if X >= y. This may be why we avoided the divergence issue and did not observe PAFs near 1 in our analyses.
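A minimal sketch of why the thresholded RR avoids divergence, assuming the exposure satisfies X >= 0 (intake cannot be negative) and RR(theta; X) = exp(theta*(X - y)) for X < y and 1 otherwise. On [0, y) the exponent theta*(X - y) lies between 0 and -theta*y, so RR always stays between 1 and exp(-theta*y). Its expectation is therefore finite for any exposure distribution on [0, infinity), and PAF = 1 - 1/E[RR] cannot reach 1. The Gamma parameters, theta and y below are hypothetical, chosen to be very skewed (low mean, high variance), and the function names are illustrative.

```python
import math
import random


def threshold_rr(x: float, theta: float, y: float) -> float:
    """RR = exp(theta*(x - y)) below the ideal level y, and 1 at or above it."""
    return math.exp(theta * (x - y)) if x < y else 1.0


def mc_mean_rr(theta: float, y: float, shape: float, scale: float,
               n: int = 50_000, seed: int = 1) -> float:
    """Seeded Monte Carlo estimate of E[RR] for X ~ Gamma(shape, scale), X >= 0."""
    rng = random.Random(seed)
    return sum(threshold_rr(rng.gammavariate(shape, scale), theta, y)
               for _ in range(n)) / n


if __name__ == "__main__":
    theta, y = -0.5, 3.0      # hypothetical protective effect and ideal level
    shape, scale = 0.25, 4.0  # hypothetical: mean 1, variance 4 (very skewed)
    mean_rr = mc_mean_rr(theta, y, shape, scale)
    bound = max(1.0, math.exp(-theta * y))  # RR never exceeds this when X >= 0
    print(f"E[RR] ~ {mean_rr:.4f} (bound {bound:.4f})")
    print(f"PAF   ~ {1 - 1 / mean_rr:.4f} (at most {1 - 1 / bound:.4f})")
```

With theta < 0 every RR is at least 1, so this form also keeps the PAF non-negative, which matches the "theta = 0 beyond the ideal" assumption described earlier in the thread.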

Let me know if things are not clear!

RodrigoZepeda commented 7 years ago

Just a small comment on the second point. Define Z = X - y; then RR(theta; Z) = exp(theta*Z), which is the case we are analyzing. Z is just a displacement of the X distribution. In particular:

E[RR(theta; Z)] = E[exp(theta*Z)] = E[exp(theta*(X-y))] = E[exp(theta*X)*exp(-theta*y)] = exp(-theta*y) * E[exp(theta*X)]

and so a divergence on X implies a divergence on Z. This definition of Z also explains why we consider negative exposures. Say X represents BMI and y = 22 (for example, this article, which I think you were using, has RR(theta; Z) = exp(theta*(X - 22))). Then, for someone with a BMI of 21, their specific z = -1 is negative. That is why we allow for negative exposures. And, as our Gamma example shows, a Gamma distribution with negative exposure levels wreaks havoc on the PAF.
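A quick numerical check of the displacement identity, as a sketch: it assumes X ~ Gamma(shape k, scale s) with hypothetical parameters and compares a seeded Monte Carlo estimate of E[exp(theta*Z)], for Z = X - y, against exp(-theta*y) times the Gamma MGF (1 - theta*s)^(-k). It also reports how often Z is negative, that is, how many exposures fall below the reference level y. The helper names are illustrative.

```python
import math
import random


def shifted_gamma_mgf(theta: float, k: float, s: float, y: float) -> float:
    """E[exp(theta*(X - y))] = exp(-theta*y) * E[exp(theta*X)] for X ~ Gamma(k, s)."""
    if theta >= 1.0 / s:
        return math.inf  # diverges for Z exactly when it diverges for X
    return math.exp(-theta * y) * (1.0 - theta * s) ** (-k)


def mc_shifted(theta: float, k: float, s: float, y: float,
               n: int = 100_000, seed: int = 7) -> tuple[float, float]:
    """Monte Carlo E[exp(theta*Z)] and the share of negative Z values."""
    rng = random.Random(seed)
    zs = [rng.gammavariate(k, s) - y for _ in range(n)]
    mean_rr = sum(math.exp(theta * z) for z in zs) / n
    frac_negative = sum(z < 0 for z in zs) / n
    return mean_rr, frac_negative


if __name__ == "__main__":
    theta, k, s, y = 0.1, 4.0, 1.0, 3.0  # hypothetical values
    mc, neg = mc_shifted(theta, k, s, y)
    print(f"Monte Carlo E[exp(theta*Z)] ~ {mc:.4f}")
    print(f"closed form                 = {shifted_gamma_mgf(theta, k, s, y):.4f}")
    print(f"share of negative Z         ~ {neg:.3f}")
```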

By the definition of Z, we are using a displaced Gamma distribution; thus, the scenario where people would use a displaced Gamma to describe negative values is, I think, realistic (although it is hidden in the calculations).

This would also be the case for nuts, which I think would be an awesome example!

fcudhea commented 7 years ago

Good point. I never looked at it from this angle. I had always thought the TMRED should be worked into the RR function, but you are right: you could also work it into the distribution.

However, in this case, does the PAF formula derived for the gamma distribution in this section still apply to a displaced gamma distribution? Is the inequality that breaks the gamma distribution still theta > mu/sigma^2, or (just a guess) does it change to something like (mu + y)/sigma^2?

I see that a negative intake can make sense, but I still think a negative PAF is not useful, or even nonsensical. If the exposure distribution is at lower risk (healthier) than the "ideal" distribution, does it really make sense to calculate a PAF?

RodrigoZepeda commented 7 years ago

Following the example above, let µ be the mean of X and s^2 its variance. Then the PAF will have a divergent integral if theta >= µ/s^2. Translating this to Z, the PAF will have a divergent value if theta >= (mean(Z) + y)/s^2, because µ = mean(Z) + y (and Z has the same variance s^2 as X, since a displacement does not change the variance).
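The translated condition can be checked with a small sketch, assuming (as above) that X is Gamma with mean µ = mean(Z) + y > 0 and variance s^2, and that Z = X - y. The helper names and the numbers are hypothetical. The point is that the naive bound mean(Z)/s^2 would be wrong (here it is even negative), while (mean(Z) + y)/s^2 recovers µ/s^2.

```python
import math


def divergence_threshold(mean_z: float, var_z: float, y: float) -> float:
    """Smallest theta at which E[exp(theta*Z)] diverges, for Z = X - y, X ~ Gamma."""
    mu = mean_z + y    # mean of the undisplaced Gamma X (must be > 0)
    return mu / var_z  # Var(Z) = Var(X): shifting does not change the variance


def mean_rr_z(theta: float, mean_z: float, var_z: float, y: float) -> float:
    """E[exp(theta*Z)] for the displaced Gamma, via the Gamma MGF of X."""
    mu = mean_z + y
    k, s = mu**2 / var_z, var_z / mu  # shape and scale of X
    if theta >= 1.0 / s:
        return math.inf
    return math.exp(-theta * y) * (1.0 - theta * s) ** (-k)


if __name__ == "__main__":
    mean_z, var_z, y = -1.0, 4.0, 2.0  # hypothetical: Z is mostly negative
    t = divergence_threshold(mean_z, var_z, y)
    print(f"naive mean(Z)/s^2 = {mean_z / var_z}, correct threshold = {t}")
    for theta in (0.9 * t, t):
        print(f"theta = {theta:.3f}: E[RR] = {mean_rr_z(theta, mean_z, var_z, y)}")
```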

For the negative PAF, consider the study mentioned in my previous comment. For women, the Relative Risk for oesophageal squamous cancer is 0·57 (0·47–0·69) (page 572, Figure 2). If you estimate the PAF using the Relative Risk function RR(X; theta) = exp(theta*(X - 22)), where X is BMI and theta = ln(0.57) = -0.5621189, then the PAF will be negative.

My way of seeing it is as follows: RR < 1 implies that exposure to BMI actually prevents that kind of cancer; RR < 1 (more precisely, an average relative risk E[RR] < 1) implies PAF < 0; and thus PAF < 0 means that the exposure was actually good for that outcome.
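The sign argument can be made precise with PAF = 1 - 1/E[RR] (counterfactual RR = 1): the PAF is negative exactly when the average RR is below 1. The sketch below applies the RR function from the comment above, with theta = ln(0.57), to two small, purely hypothetical samples of BMI values; they are not data from the study. Because exp is convex, a single BMI well below 22 can pull E[RR] above 1, so the sign depends on the whole exposure distribution, not only on theta.

```python
import math

THETA = math.log(0.57)  # about -0.5621 per unit of (BMI - 22), as in the comment above


def rr(bmi: float) -> float:
    """RR(X; theta) = exp(theta * (X - 22))."""
    return math.exp(THETA * (bmi - 22.0))


def paf_from_rr(rrs: list[float]) -> float:
    """PAF = 1 - 1/E[RR], with everyone at RR = 1 in the counterfactual."""
    mean_rr = sum(rrs) / len(rrs)
    return 1.0 - 1.0 / mean_rr


if __name__ == "__main__":
    bmis = [21.0, 23.0, 24.0, 26.0, 28.0]   # hypothetical sample, not study data
    wider = [16.0, 23.0, 24.0, 26.0, 28.0]  # hypothetical: one very low BMI
    print(f"PAF = {paf_from_rr([rr(b) for b in bmis]):.3f}")   # E[RR] < 1, PAF < 0
    print(f"PAF = {paf_from_rr([rr(b) for b in wider]):.3f}")  # E[RR] > 1, PAF > 0
```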

Most exposures have this dual effect: a medicine, say, might help cure a disease but also cause vomiting or nausea (side effects); giving nuts to everyone might help nutrition while at the same time increasing the cases of anaphylactic shock (among those with an allergy), etc.

It might not be useful to estimate a PAF for an outcome for which the exposure is a good thing. But I think that is a question for the ethics of science. Should one only estimate PAFs for things that are bad? (That sounds to me like cherry-picking, but at the same time it might help in implementing a policy that is "for the greater good".) Mathematically speaking, at least, a negative PAF makes sense and has a plausible real-life explanation: namely, that the exposure was actually good, and reducing exposure "reduces" the number of cases by PAF × 100%, a negative reduction, that is, an increase.

fcudhea commented 7 years ago

Extreme apologies for very late response.

We also calculate PAFs for healthy exposures (healthy foods like fruits, etc.), but in such a way that we don't end up with negative PAFs. I'll try to explain what we're doing; let me know if it makes sense to you.

Let's take fruit as an example. Say the exposure is 100 g/day and the ideal intake is 300 g/day. We define RR(x) as exp(beta*delta), where beta is the increase in log relative risk per unit increase in exposure and delta is the distance between the current intake (x) and the ideal intake (y); that is, RR(x) = exp(beta*(x - y)). Since fruit is a healthy food, beta is negative. But x - y is negative as well, since x < y, so RR(x) is still > 1, even for healthy foods. To get RR(x) < 1 for a healthy food, the current intake would have to be greater than the "ideal" intake.
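The fruit example can be sketched numerically. The slope BETA = -0.002 per g/day is a made-up value chosen only to have the right (negative) sign; the 100 and 300 g/day figures come from the comment above.

```python
import math

BETA = -0.002  # hypothetical log-RR per g/day (negative: fruit is protective)
IDEAL = 300.0  # ideal intake y, in g/day


def rr(intake: float) -> float:
    """RR(x) = exp(beta * (x - y)); equals 1 at the ideal intake."""
    return math.exp(BETA * (intake - IDEAL))


if __name__ == "__main__":
    for x in (100.0, 300.0, 400.0):
        print(f"intake {x:>5.0f} g/day -> RR = {rr(x):.3f}")
```

Only an intake above the ideal (here 400 g/day) gives RR < 1, which is the situation in which a negative PAF could appear.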

So, at least in my way of thinking about it, it's not so much that the PAF of a good exposure would be negative. It's strictly when the current exposure is healthier than the "ideal" exposure (where RR(x) is 0) that we should see negative PAFs. But by definition the ideal exposure should be the healthiest, which makes negative PAFs a kind of oxymoron.

Also, I'm not sure about your interpretation of a negative PAF. Negative PAFs aren't necessarily bounded below by -1, are they? If you get a PAF of -3.5, what is the interpretation of that?

Again, sorry for the late response. Will try to respond to other threads before the new year!

RodrigoZepeda commented 7 years ago

Sorry for the terribly late response.

I agree that most real-life scenarios have a positive PAF, because that is what most researchers are interested in (at least in public policy). I agree with you that

It's strictly when the current exposure is healthier than the "ideal" exposure (where RR(x) is 1, not 0)

And most scenarios won't have a negative PAF. However, caution is advised. In your example, beta might have been estimated with a CI of [-0.12, 2] (so it is not "statistically significant" that the effect of fruit is beneficial). In such cases you can get a negative PAF just by considering beta's CI:

1 - 1/exp(-0.12) = -0.12749685157937574
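The arithmetic can be reproduced in a few lines. As in the comment, RR is taken as exp(beta) at each end of the hypothetical CI [-0.12, 2], and PAF = 1 - 1/RR; the helper name is illustrative.

```python
import math


def paf(rr: float) -> float:
    """PAF = 1 - 1/RR for a single relative risk."""
    return 1.0 - 1.0 / rr


if __name__ == "__main__":
    for beta in (-0.12, 2.0):  # ends of the hypothetical CI for beta
        print(f"beta = {beta:>5}: PAF = {paf(math.exp(beta)):.4f}")
```

The resulting range of PAFs, roughly -0.13 to 0.86, straddles zero, so the sign of the PAF is not settled by the data in this case.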

On interpreting negative PAFs: they are sort of like how we interpret Relative Risks. For a "protector" we have 0 < RR < 1, but for a "damager"(?) RR is between 1 and infinity, so it is not the same scale. A PAF of 0.42 states that by reducing exposure, 42% of current cases would be eliminated. A PAF of -0.55 states that by reducing exposure one could cause an additional 55% of cases. A PAF of -3.2 states that one could cause an additional 320% of cases. Note that cases can always increase by any percentage (you are not bounded by 100%), but at most all cases (100%) can be eliminated. Does that make any sense?
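This interpretation corresponds to reading the PAF as (observed cases - counterfactual cases) / observed cases, so counterfactual cases = observed × (1 - PAF). A minimal sketch with a hypothetical 100 observed cases shows the asymmetry raised in the question about -3.5: with PAF = 1 - 1/E[RR], the PAF stays below 1 for any finite E[RR] > 0 but has no lower bound. The function names are illustrative.

```python
def counterfactual_cases(observed: float, paf: float) -> float:
    """Cases expected after removing the exposure, given PAF = (O - C) / O."""
    return observed * (1.0 - paf)


def paf_from_mean_rr(mean_rr: float) -> float:
    """PAF = 1 - 1/E[RR]: below 1 for any finite E[RR] > 0, unbounded below."""
    return 1.0 - 1.0 / mean_rr


if __name__ == "__main__":
    for p in (0.42, -0.55, -3.2):
        print(f"PAF {p:>5}: 100 observed -> {counterfactual_cases(100, p):.0f} cases")
    for m in (100.0, 1.0, 0.1, 0.01):
        print(f"E[RR] = {m:>6}: PAF = {paf_from_mean_rr(m):.2f}")
```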