OHDSI / FeatureExtraction

An R package for generating features (covariates) for a cohort using data in the Common Data Model.
http://ohdsi.github.io/FeatureExtraction/

Preventing calculation of covariates on index date #8

Closed ChristopheLambert closed 7 years ago

ChristopheLambert commented 8 years ago

I used createCovariateSettings() within Atlas to generate a cohort study. The desired behavior is to not compute covariates for propensity scoring on the index date. I assumed setting useCovariateDrugEraOverlap = FALSE would do that. However, when I run the propensity scoring code, it gives errors about a high correlation between covariate(s) and treatment, listing my treatment and comparator drugs as the culprits. I specified a 365-day pre-treatment observation period, limiting events to the earliest event per person. My assumption is that I would have 365 days of no treatment, followed by treatment on the index date (perhaps this is wrong?). It seems to me that the treatment on the index date is what is being picked up as the high correlation.

If I list my treatment and comparator drugs in the excluded concepts, the high correlations go away, but I'm concerned other variables such as Conditions and Observations on the index date are also getting into the propensity model.

Here is the documentation for useCovariateDrugEraOverlap:

useCovariateDrugEraOverlap: A boolean value (TRUE/FALSE) to determine if covariates will be created and used in models that look for presence/absence of drug era that overlaps the cohort index date. Only applicable if useCovariateDrugEra = TRUE.

If this is in error, I would question if useCovariateConditionEraOverlap works. Also, for symmetry, it does not appear that there is a way to exclude Observations, Procedures, or Measurements via analogous "Overlap" parameters for those covariates.

Below is the JSON code defining one of my cohorts:

{
  "ConceptSets": [
    {
      "id": 0,
      "name": "Bipolar Disorder",
      "expression": {
        "items": [
          {
            "concept": {
              "CONCEPT_ID": 436665,
              "CONCEPT_NAME": "Bipolar disorder",
              "STANDARD_CONCEPT": "S",
              "INVALID_REASON": "V",
              "CONCEPT_CODE": "13746004",
              "DOMAIN_ID": "Condition",
              "VOCABULARY_ID": "SNOMED",
              "CONCEPT_CLASS_ID": "Clinical Finding",
              "STANDARD_CONCEPT_CAPTION": "Standard",
              "INVALID_REASON_CAPTION": "Valid"
            },
            "includeDescendants": true
          }
        ]
      }
    },
    {
      "id": 1,
      "name": "Lithium",
      "expression": {
        "items": [
          {
            "concept": {
              "CONCEPT_ID": 751246,
              "CONCEPT_NAME": "Lithium Carbonate",
              "STANDARD_CONCEPT": "S",
              "INVALID_REASON": "V",
              "CONCEPT_CODE": "42351",
              "DOMAIN_ID": "Drug",
              "VOCABULARY_ID": "RxNorm",
              "CONCEPT_CLASS_ID": "Ingredient",
              "STANDARD_CONCEPT_CAPTION": "Standard",
              "INVALID_REASON_CAPTION": "Valid"
            },
            "includeDescendants": true
          }
        ]
      }
    },
    {
      "id": 2,
      "name": "Drug Lithium",
      "expression": {
        "items": [
          {
            "concept": {
              "CONCEPT_ID": 751246,
              "CONCEPT_NAME": "Lithium Carbonate",
              "STANDARD_CONCEPT": "S",
              "INVALID_REASON": "V",
              "CONCEPT_CODE": "42351",
              "DOMAIN_ID": "Drug",
              "VOCABULARY_ID": "RxNorm",
              "CONCEPT_CLASS_ID": "Ingredient",
              "STANDARD_CONCEPT_CAPTION": "Standard",
              "INVALID_REASON_CAPTION": "Valid"
            },
            "includeDescendants": true
          }
        ]
      }
    }
  ],
  "PrimaryCriteria": {
    "CriteriaList": [
      {
        "DrugEra": {
          "CodesetId": 2
        }
      }
    ],
    "ObservationWindow": {
      "PriorDays": "365",
      "PostDays": 0
    },
    "PrimaryCriteriaLimit": {
      "Type": "First"
    }
  },
  "AdditionalCriteria": {
    "Type": "ANY",
    "CriteriaList": [
      {
        "Criteria": {
          "ConditionOccurrence": {
            "CodesetId": 0
          }
        },
        "StartWindow": {
          "Start": {
            "Coeff": -1
          },
          "End": {
            "Coeff": 1
          }
        },
        "Occurrence": {
          "Type": 2,
          "Count": 2
        }
      }
    ],
    "DemographicCriteriaList": [],
    "Groups": []
  },
  "QualifiedLimit": {
    "Type": "All"
  },
  "ExpressionLimit": {
    "Type": "All"
  },
  "InclusionRules": [],
  "EndStrategy": {
    "CustomEra": {
      "DrugCodesetId": 2,
      "GapDays": "30",
      "Offset": 0
    }
  }
}
pbr6cornell commented 8 years ago

Correct, the features include the index date as part of their time at risk, and you do need to exclude the covariates for drugs that you use to define your target/comparator group.

A potential future feature to add to the package would be to have the ability to further refine the 'time at risk' for covariates, so that users could choose whether they want to include the index date or not.
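
To illustrate the point about the index date, here is a minimal Python sketch. It is not FeatureExtraction's actual code: the function name, the day-offset convention, and the toy data are all assumptions made for illustration. It treats a lookback window as inclusive day offsets relative to the index date and shows that an exposure starting on day 0 is captured by a window ending at day 0 but not by one ending at day -1.

```python
from datetime import date


def has_event_in_window(index_date, event_dates, start_day, end_day):
    """Return True if any event falls within [start_day, end_day] days
    of the index date (both bounds inclusive; negative = before index)."""
    return any(start_day <= (d - index_date).days <= end_day for d in event_dates)


# Hypothetical person whose first lithium era starts on the index date itself.
index_date = date(2016, 1, 1)
lithium_era_starts = [date(2016, 1, 1)]

# Window that includes the index date (day 0): the treatment drug is picked up.
includes_index = has_event_in_window(index_date, lithium_era_starts, -365, 0)

# Window that stops the day before the index date: the treatment drug is not picked up.
excludes_index = has_event_in_window(index_date, lithium_era_starts, -365, -1)

print(includes_index, excludes_index)
```

With the inclusive window, every person in the lithium cohort would get the lithium covariate, which is consistent with the high-correlation error reported above. Any condition or observation recorded on the index date would be captured the same way.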

ChristopheLambert commented 8 years ago

I thought that the parameters useCovariateDrugEraOverlap and useCovariateConditionEraOverlap were used for just this purpose -- if TRUE, use the index date, if FALSE, don't use the index date. Do these parameters do something else?

ChristopheLambert commented 8 years ago

Or is my assumption wrong that I should get 365 days of no treatment? If I am wrong, is there a way to specify that the treatment itself must not happen before the index date?

pbr6cornell commented 8 years ago

These parameters do not control whether you include index date.

Instead, they are options for creating an additional set of features: useCovariateDrugEraOverlap creates a binary variable for each drug ingredient indicating whether that drug is observed to be concomitant, meaning its era straddles the index date.
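
As a rough illustration of what such an overlap feature checks, here is a minimal Python sketch. It is not the package's implementation: the function name and the example dates are hypothetical, and treating both era boundaries as inclusive is an assumption.

```python
from datetime import date


def era_overlaps_index(era_start, era_end, index_date):
    """Return True if the era spans the index date.
    Assumption: both era boundaries count as part of the era."""
    return era_start <= index_date <= era_end


index_date = date(2016, 1, 1)

# Hypothetical concomitant drug: the era began before and ends after the index date.
concomitant = era_overlaps_index(date(2015, 12, 15), date(2016, 1, 20), index_date)

# Hypothetical past drug: the era ended a month before the index date.
past_only = era_overlaps_index(date(2015, 10, 1), date(2015, 11, 30), index_date)

print(concomitant, past_only)
```

Turning this kind of feature off only drops the concomitant-use indicators. It does not change whether the other covariate windows include the index date.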

schuemie commented 7 years ago

Solved in commits 119ca34ade7316635b723680ee5c3f70235271bc and 17740ffec44b22aae59322079cc2fa84b671d12a