Suggest removing Chi Square corrected P value

karafecho commented 2 months ago

This issue is to suggest that we remove chi_squared_p_corrected from ICEES output. The correction is set to 1, meaning no correction is applied, so the chi_squared_p_corrected always equals chi_squared_p. I don't think a correction is helpful anyway, as ICEES is intended to support exploratory analysis. Moreover, if we applied a Bonferroni correction, for example, to certain endpoints like the "Association to All Features" endpoint would overcorrect, leaving most/all associations as not significant.

For example, this query:

curl -X 'POST' \
  'https://icees-pcd.renci.org/patient/cohort/COHORT%3A1/associations_to_all_features' \
  -H 'accept: text/tabular' \
  -H 'Content-Type: application/json' \
  -d '{
  "feature": {
    "TotalEDInpatientVisits": {
      "operator": "=",
      "value": "0"
    }
  },
  "maximum_p_value": 1,
  "correction": {
    "method": "bonferroni"
  }
}'

Yields associations such as:

karafecho commented 2 months ago

I'll add that the Bonferroni correction available at the "Association to All Features" and "Association to All Features2" endpoints behaves as a filter, not an actual correction. This issue and the above one are known, and I believe there may be an open ticket, but we probably should resolve the issues.

For example, this query:


curl -X 'POST' \
  'https://icees-pcd.renci.org/patient/cohort/COHORT%3A6/associations_to_all_features' \
  -H 'accept: text/tabular' \
  -H 'Content-Type: application/json' \
  -d '{
  "feature": {
    "TotalEDInpatientVisits": {
      "operator": "=",
      "value": "Female"
    }
  },
  "maximum_p_value": 0.5,
  "correction": {
    "method": "bonferroni"
  }
}

Yields only associations with P values <0.5, but chi_squared_p_corrected equals chi_squared_p in the output, for example:

hyi commented 2 months ago

I took a look at the code and it only apply Bonferroni correction if it is specified in the request. So if no correction is desired, the correction method just needs to be removed from the request. I think that should work or I have missed something?

karafecho commented 2 months ago

I don't think the Bonferroni correction is being applied correctly. In the example I posted above, the uncorrected and corrected Chi Square P values are the same, even though a correction is in the request. I think the code is applying the correction as a filter, not a correction per se.

hyi commented 2 months ago

@karafecho I looked into the code and believe the main issue with applying the Bonferroni correction is the default alpha or significance level input. This alpha value is set to the default 1 if not specified in input, which leads to what you are seeing here since alpha input is not specified. It seems to be common practice to set alpha input to 0.05, so perhaps we should update code to set the default value to be 0.05 rather than the current 1 so corrected values make more sense?

karafecho commented 2 months ago

Okay, so in the query below, max_p_value simply acts as a filter for what associations are returned, and correction is set to 1 as default. As I now recall, Hao and I were planning to let the user set the correction (and also choose the type of correction). Is that possible, sticking with Bonferroni only for now? (I think the code supports multiple correction factors, but I don't think we necessarily need to support all of them.)

curl -X 'POST' \
  'https://icees-pcd.renci.org/patient/cohort/COHORT%3A4/associations_to_all_features' \
  -H 'accept: text/tabular' \
  -H 'Content-Type: application/json' \
  -d '{
  "feature": {
    "Sex2": {
      "operator": "=",
      "value": "Female"
    }
  },
  "maximum_p_value": 0.05,
  "correction": {
    "method": "bonferroni"
  }
}'

hyi commented 2 months ago

Yes, it supports multiple correction methods. It uses multpletests() from statsmodels library as shown here and it supports the following methods:

bonferroni : one-step correction

sidak : one-step correction

holm-sidak : step down method using Sidak adjustments

holm : step-down method using Bonferroni adjustments

simes-hochberg : step-up method (independent)

hommel : closed method based on Simes tests (non-negative)

fdr_bh : Benjamini/Hochberg (non-negative)

fdr_by : Benjamini/Yekutieli (negative)

fdr_tsbh : two stage fdr correction (non-negative)

fdr_tsbky : two stage fdr correction (non-negative)

You can update your query above to replace "correction" with the following:

"correction": {
    "method": "bonferroni",
    "alpha": 0.05
  }

which I think should return results that make much more sense.

karafecho commented 2 months ago

Well, I changed correction as suggested ran the query with and without maximum_p_value, but no luck: chi_squared_p and chi_squared_p_corrected are the same.

hyi commented 2 months ago

@karafecho You are correct. I looked into code in multitest() in statsmodels library and found the p-value correction using bonferroni method is not affected by alpha input, which is also verified by the comment in the multitest() code: Except for 'fdr_twostage', the p-value correction is independent of the alpha specified as argument. So my previous comment about alpha input is wrong. In looking into the code, I believe the reason we get corrected p-value being the same as input p-value is that we only pass in one p-value indicating it is only one test, and the p-value correction is only computed for multiple tests. If we only has one p-value, it looks like there is no point in correcting p-value.

karafecho commented 2 months ago

Oh, that makes total sense, although I'm not sure why we would bother supporting any corrections if we only pass in one p-value indicating one association test. As you note, then there's really nothing to correct. Seems weird, but anyway, it sounds like my original suggestion is the best solution for now, which is to simply remove correction. I'm guessing that you can just update the API example, so that it's not misleading.

hyi commented 2 months ago

@karafecho Sounds good. Will do.

karafecho commented 2 months ago

I checked the PCD endpoint and confirmed that the Bonferroni correction has been removed. As such, I'm closing this ticket.

ExposuresProvider / icees-api

Suggest removing Chi Square corrected P value #319