OHDSI / CohortIncidence

Contains the Java and R assets to perform Incidence calculations on a CDM
https://ohdsi.github.io/CohortIncidence/

calculating monthly incidence #16

Open chandryou opened 2 years ago

chandryou commented 2 years ago

@chrisknoll Is it possible to calculate monthly incidence, rather than annual incidence, by using CohortIncidence? If it is not, how can I modify this package to calculate monthly incidence?

chrisknoll commented 2 years ago

Can you define 'monthly incidence': do you mean: starting time at risk at the first day of a calendar month, use the given month as time at risk and calculate incidence based on number of people with observation during the month who have the outcome of interest?

louisahsmith commented 1 year ago

I am also interested in computing monthly incidence, by calendar time (to look at changes in incidence over time). Is this possible?

louisahsmith commented 1 year ago

Additional question as I dig into this: the annual incidence is not currently based on when the events occur but based on the year in which a person starts being at risk -- am I interpreting it correctly? Is there a way to compute incidence over time that reflects when the events are occurring?

chrisknoll commented 1 year ago

> I am also interested in computing monthly incidence, by calendar time (to look at changes in incidence over time). Is this possible?

We can create a 'monthly calendar time cohort' by making people enter on the first of month and follow them until the end of the month, but the current results data model only stores the aggregated rates/proportions by age/gender/year. We'd need an additional column of 'month' in order for you to be able to see that level of detail. It's something we can think about, but that's the limitation right now.
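A minimal sketch of what such monthly episodes could look like, in plain Python rather than through CohortIncidence or a cohort definition. The function names are hypothetical, and the rule that a person must be observed on the first of the month is an assumption carried over from the annual example later in this thread.

```python
from datetime import date, timedelta

def month_starts(first: date, last: date):
    """Yield the first day of every calendar month from first to last, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

def month_end(start: date) -> date:
    """Return the last day of the calendar month that begins on start."""
    if start.month == 12:
        next_month = date(start.year + 1, 1, 1)
    else:
        next_month = date(start.year, start.month + 1, 1)
    return next_month - timedelta(days=1)

def monthly_episodes(obs_start: date, obs_end: date, first: date, last: date):
    """Build (start, end) episodes for a single observation period.

    Assumption: a person enters on the first of a month only if they are
    observed on that day, and is followed to the end of the month or the
    end of observation, whichever comes first.
    """
    episodes = []
    for start in month_starts(first, last):
        if obs_start <= start <= obs_end:
            episodes.append((start, min(month_end(start), obs_end)))
    return episodes

# Hypothetical person observed from mid-November 2019 to 10 February 2020.
episodes = monthly_episodes(
    date(2019, 11, 15), date(2020, 2, 10), date(2019, 10, 1), date(2020, 3, 1)
)
for start, end in episodes:
    print(start, end)
```

Each episode carries its own start date, so a month label can be derived from it even though the current results model has no column to store it.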

> Additional question as I dig into this: the annual incidence is not currently based on when the events occur but based on the year in which a person starts being at risk -- am I interpreting it correctly? Is there a way to compute incidence over time that reflects when the events are occurring?

No, that's not correct: the incidence is calculated based on when the events occur. If someone has an outcome in April 2017 and another person has an outcome in July 2017, we calculate the rate based on the time at risk (TAR) up to the outcome, not the year of the outcome. In fact, the 'year' you see in the result is not the year the outcome occurred, but the year the TAR started. In the above example, the person who had the outcome in April could have their TAR start in 2016 (say December 2016) while the other person started TAR in 2017. The result would capture the TAR and outcome for the person who started in December 2016 in the 2016 bucket, and the person who started in 2017 in the 2017 bucket. We group by TAR start instead of by outcome because the TAR start typically represents an intervention, and you would want to group people by the year they began their intervention and not the year they got the outcome.
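To make the bucketing concrete, here is a small sketch of the two-person example above in plain Python. It is not the CohortIncidence implementation: the records, the function name, the inclusive day counting, and the per-person-year scaling are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (person, TAR start, TAR end, outcome date or None).
records = [
    ("A", date(2016, 12, 1), date(2017, 11, 30), date(2017, 4, 10)),
    ("B", date(2017, 2, 1), date(2018, 1, 31), date(2017, 7, 20)),
]

def bucket_by_tar_start_year(rows):
    """Sum cases and person-days per year, keyed by the year the TAR started."""
    buckets = defaultdict(lambda: {"cases": 0, "person_days": 0})
    for _person, tar_start, tar_end, outcome in rows:
        has_outcome = outcome is not None and tar_start <= outcome <= tar_end
        # Time at risk stops at the outcome when there is one.
        last_day = outcome if has_outcome else tar_end
        bucket = buckets[tar_start.year]
        bucket["cases"] += int(has_outcome)
        # Assumption: both the first and the last day count as time at risk.
        bucket["person_days"] += (last_day - tar_start).days + 1
    return dict(buckets)

by_year = bucket_by_tar_start_year(records)
for year, b in sorted(by_year.items()):
    # Rate per person-year, using 365.25 days per year as a convention.
    rate = b["cases"] / b["person_days"] * 365.25
    print(year, b["cases"], b["person_days"], round(rate, 2))
```

Person A's April 2017 outcome lands in the 2016 bucket because that is where A's TAR began, which is the behaviour described above.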

You can design your cohorts so that the TAR year and outcome year are always the same: define them so that the TAR is completely contained in a single calendar year.

On annual vs. monthly incidence (aside from the fact that we can't capture monthly statistics because we don't have a 'by month' column): the annual version can be done by creating a cohort based on observation periods, where you specify the start and end dates to be the first of each calendar year you are interested in, and then persist the person in the cohort for up to 365 days.

Here is the example:

Cohort Entry Events
People enter the cohort when observing any of the following:

observation periods, a user defined start date of January 1, 2015 and end date of January 1, 2015.

observation periods, a user defined start date of January 1, 2016 and end date of January 1, 2016.

observation periods, a user defined start date of January 1, 2017 and end date of January 1, 2017.

observation periods, a user defined start date of January 1, 2018 and end date of January 1, 2018.

observation periods, a user defined start date of January 1, 2019 and end date of January 1, 2019.

observation periods, a user defined start date of January 1, 2020 and end date of January 1, 2020.

Cohort Exit
The cohort end date will be offset from index event's start date plus 365 days.

Some notes on this: when you use 'user specified dates' in observation period criteria, the dates you specify override the observation period's own dates as the cohort entry event dates. The additional constraint is that the person will only get into the cohort if their observation period spans the date you specify. I.e., a person who is in the data from 2010-2014 will not get into this cohort because they are not in the data on Jan 1 of any year from 2015 to 2020.

This example only creates spans of time starting on Jan 1 for the years 2015-2020. If you want to add more years, duplicate the settings for one of the existing years and change the dates.

How this will work is: a cohort will be created where the cohort episodes all start on Jan 1 of the years 2015-2020, but only if the person's observation period was 'in the data' on Jan 1. If the person is in the data on Jan 1 of the given year, then the cohort episode for the person will extend up to 365 days. I say 'up to' because someone could be in the data on Jan 1 but then leave the data in August of the same year, and therefore the TAR should only go from Jan 1 to August.
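The entry and exit logic described here can be sketched in plain Python as follows. This is only an illustration of the rules as stated, not how the cohort definition is actually executed; the function name, the tuple layout, and the sample observation periods are hypothetical.

```python
from datetime import date, timedelta

def calendar_cohort(obs_periods, years, max_days=365):
    """Build (person_id, start, end) episodes from observation periods.

    obs_periods is a list of (person_id, obs_start, obs_end) tuples.
    A person gets an episode for a year only if the observation period
    spans Jan 1 of that year; the episode ends max_days later or at the
    end of observation, whichever comes first.
    """
    episodes = []
    for person_id, obs_start, obs_end in obs_periods:
        for year in years:
            entry = date(year, 1, 1)
            if obs_start <= entry <= obs_end:
                exit_date = min(entry + timedelta(days=max_days), obs_end)
                episodes.append((person_id, entry, exit_date))
    return episodes

# Hypothetical observation periods.
obs = [
    (1, date(2010, 3, 1), date(2014, 12, 31)),  # not in the data on any Jan 1 from 2015 to 2020
    (2, date(2014, 6, 1), date(2016, 8, 15)),   # leaves the data in August 2016
]
cohort = calendar_cohort(obs, range(2015, 2021))
for row in cohort:
    print(row)
```

Person 1 contributes nothing, and person 2's 2016 episode is cut short in August. Note that with a 365-day offset, an episode starting in a non-leap year ends on Jan 1 of the following year.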

This cohort is what I'm calling the 'Calendar Cohort'.

Once you have the Calendar Cohort, you can apply CohortIncidence where your Target is the Calendar Cohort and your Outcome is your outcome of interest (let's say GI bleeding). CI will then calculate rates overall, by age, by gender, by year (and any combination of those) to give you your 'calendar rate', i.e. the rate calculated where the TAR starts on Jan 1 for all people in a given year.

Admittedly, this is a lot of setup to get your calendar rate, and you are effectively making a copy of your OBSERVATION_PERIOD table. I'm now thinking that it might be a nice feature to have special functions in CohortIncidence, such as calculateAnnualIncidence and calculateMonthlyIncidence, that would create the TARs directly off the OBSERVATION_PERIOD table instead of making you build the cohort yourself.