Line up IRB exemption for family history work

Question (below) sent to Brenda Belcher on 7/18. On 7/20, Brenda replied that she is working on a reply.

Dear Brenda,

My name is Melissa Cline, and I work for David Haussler in the UC Santa Cruz Genomics Institute. I got your name from Isabel Bjork. I'd like to talk to you regarding a project for which we're seeking IRB approval / exemption / blessing of some appropriate form.

Here is the background on my project. I'm responsible for the UC Santa Cruz arm of the BRCA Exchange (http://brcaexchange.org/), which is one of the "driver projects" for the Global Alliance for Genomics and Health. The goal of this project is to characterize and share information on genetic variation in the BRCA1 and BRCA2 genes, collectively called "BRCA", which are home to some of the most important genetic variants in medicine. Women with pathogenic (unlucky) variants in their BRCA genes face a greatly elevated risk of breast and ovarian cancer, with an estimated 80% chance of getting cancer by the age of 70. When Angelina Jolie chose to have a double mastectomy after a genetic test confirmed that she was predisposed to breast cancer, that was a BRCA test. BRCA is implicated in many other types of cancer as well - for instance, BRCA variation can put men at greater risk of developing prostate cancer. And on the positive side, if cancer patients have pathogenic BRCA variants, there's a class of anti-cancer drugs called PARP inhibitors that turn out to be very effective against their cancers. But sadly, most medical researchers and genetic counselors are working with incomplete BRCA information, for a couple of reasons. First, our knowledge of BRCA variation has been scattered across different silos around the world, for a variety of historical and technical reasons. The BRCA Exchange and Global Alliance have been taking steps to fix that, and now we host the world's largest public repository of BRCA variation data. Second, many BRCA variants are not yet classified. This means that most women who get tested will get the (agonizing) result that they have one or more "Variant of Unknown Significance" (VUS). The BRCA Exchange is working to fix that by providing our colleagues who curate genetic variants with the information that they need to curate more BRCA variants, to reduce the number of VUSs in those test reports.
That information includes family history data, which relates patients' genetic variants to the number of their relatives who've faced cancer. That brings me to my immediate project.

The BRCA Exchange (a.k.a. BRCA Challenge) is really a collaboration of organizations around the world, all of which are engaged in better characterizing BRCA genetic variation. This includes a number of genetic testing companies. They each have an abundance of family history data, from patients who've undergone their genetic tests (they have patients' family history data, as well as genetic testing results, because the family history data is generally required by insurance companies). They're interested in sharing their family history data, because they've recognized that sharing this information benefits everyone: larger amounts of data lead to better classifications. They've asked the BRCA Exchange to facilitate this sharing. The ENIGMA consortium, a variant classification consortium which is itself a member of the BRCA Exchange, has graciously offered to host and manage the data. The exact details, including the exact data to be shared, are described in the document I've attached. At UC Santa Cruz, we're driving this effort, together with some other parties. And in particular, we're looking to run this sharing of family history information past IRB.

Here's our first question: is this data even PHI? We suspect it's not, for the following reasons:

all names and personal identifiers are stripped out.

the pedigree information is restricted to numbers of family members who've had cancer, and doesn't include total number of family members. This would not be sufficient to identify individuals in a U.S. Census database, for example.

the genetic information is limited to the variants observed in four genes (the two BRCA genes, plus two other closely-related genes). This wouldn't be sufficient to identify people without a massive false positive rate.

the personal information is coarse-grained, with ethnicity lumped into five very general categories (European/Caucasian; African American; Asian; Hispanic; Jewish) and age limited to year of birth. Our variant curation colleagues say that if this raises any eyebrows, decade of birth is probably sufficient for their needs.

breast and ovarian cancer are both common diseases. Sadly.

Where do you think this all leaves us, and what should our next steps be?

Thanks in advance for your thoughts!

Reply on July 25, 2016:

Brenda Belcher replied: Hi Melissa, Thank you for your question and thoughtful consideration of the issues. I have some information and some questions in order to determine whether any IRB review is required.

First, will the genetic data be coded? For the IRB’s purposes, coded means that:

identifying information (such as name or social security number) has been replaced with a number, letter, symbol, or combination of these; AND
a key to decipher the code exists, enabling linkage of the identifying information to the private information.

No Codes: If data being provided does NOT contain codes – just de-identified data that was collected for other purposes (e.g. genetic testing) – and there would be no way to re-link the data with the subjects, then IRB review would not be necessary.

Note that HIPAA considers the following to be identifiable:

Names
Geographic subdivisions smaller than a state (except the first three digits of a zip code if the geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people and the initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000)
All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, and date of death and all ages over 89 and all elements of dates (including year) indicative of such age (except that such ages and elements may be aggregated into a single category of age 90 or older)
Telephone numbers
Fax numbers
Electronic mail addresses
Social security numbers
Medical record numbers
Health plan beneficiary numbers
Account numbers
Certificate/license numbers
Vehicle identifiers and serial numbers, including license plate numbers
Device identifiers and serial numbers
Web Universal Resource Locators (URLs)
Internet Protocol (IP) address numbers
Biometric identifiers, including finger and voice prints
Full face photographic images and any comparable images
Any other unique identifying number, characteristic, or code (excluding a random identifier code for the subject that is not related to or derived from any existing identifier).

Based on the above, year of birth would not be considered an identifier, so you wouldn’t need to limit your data to decade of birth.
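A minimal sketch of how the date and age rules quoted above might be applied to a record before it is shared. The field names (`birth_date`, `age`, `birth_year`, `age_category`) are hypothetical and chosen only for illustration; this is not an official HIPAA or ORCA procedure.

```python
from datetime import date


def generalize_dates(record: dict) -> dict:
    """Return a copy of the record with dates reduced per the rules above.

    Hypothetical field names: 'birth_date' (datetime.date) and 'age' (int).
    """
    out = dict(record)
    birth = out.pop("birth_date", None)
    age = out.pop("age", None)
    if age is not None and age > 89:
        # Ages over 89 are aggregated into one category, and the birth year
        # is dropped because it would be indicative of such an age.
        out["age_category"] = "90+"
    else:
        if birth is not None:
            # Keep only the year; month and day are date elements to remove.
            out["birth_year"] = birth.year
        if age is not None:
            out["age_category"] = str(age)
    return out
```

For example, a record with a full birth date and an age of 56 would keep only the birth year and the age, while a record with an age of 92 would keep only the category "90+".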

Coded Data: Alternatively, if the data is coded, research involving only coded private information or specimens is not considered to involve human subjects as defined under 45 CFR 46.102(f) if the following conditions are BOTH met:

the private information or specimens were not collected specifically for the currently proposed research project through an interaction or intervention with living individuals; AND
the investigator(s) cannot readily ascertain the identity of the individual(s) to whom the coded private information or specimens pertain because, for example:

a. the investigators and the holder of the key enter into an agreement prohibiting the release of the key to the investigators under any circumstances, until the individuals are deceased (note that the HHS regulations do not require the IRB to review and approve this agreement);

b. there are IRB-approved written policies and operating procedures for a repository or data management center that prohibit the release of the key to the investigators under any circumstances, until the individuals are deceased; or

c. there are other legal requirements prohibiting the release of the key to the investigators, until the individuals are deceased.

It looks like the data you described does not have any information that HIPAA or the IRB would consider identifiable. Please let us know whether the data will be coded and we can go from there.

Best regards,

Brenda Belcher

Analyst Office of Research Compliance Administration (ORCA) University of California Santa Cruz (831) 459-1473

JUL 20, 2016 | 02:51PM PDT Brenda Belcher replied: Hi Melissa, Thank you for your email. I am working on some answers for you and I will get back to you shortly.

Best regards,

Brenda Belcher

Analyst Office of Research Compliance Administration (ORCA) University of California Santa Cruz (831) 459-1473

Reply to Brenda on Aug 2, 2016:

I've checked into whether or not the data would be coded. Here is the response I received from the person who will be managing the data submissions.

"My thought was that contributing centers would use an assigned code for each tested individual (this could be their internal ID number for example, or a randomly generated code) that would correspond to each data record sent. I would not have any way of knowing to which individual the code belonged, although if a question arose, I could ask the center/lab for a clarification on ID xxxxxxx and they could then locate that individual’s data record in their own database. However, it would be specified that the information relating this code to individual patient identifiers would never leave the laboratories’ secure data facility. The combined data set that would be created at the data coordinating center and would be distributed back to each of the contributing centers, would have even these codes removed."

So yes, the incoming data will be coded, although only the contributing center will have access to the code. The bottom line is that there is a path back from the incoming data to the individual, even if that path can only be taken by those at the contributing centers. The outgoing data (the data shared with the members of the consortium) will not be coded. And from what I understand from you, it will also not be identifiable, so long as we restrict ourselves to data on individuals who are not yet 89 years old.

Given that the data appears to be coded, it appears to me that we can claim that it does not involve human subjects under 45 CFR 46.102(f).

the private information or specimens were not collected specifically for the currently proposed research project through an interaction or intervention with living individuals; AND

Yes, this is true. The information and specimens were collected for other purposes.

the investigator(s) cannot readily ascertain the identity of the individual(s) to whom the coded private information or specimens pertain because, for example:

If we establish a formal, IRB-approved written policy stating that the contributing centers will not release their keys under any circumstances (at least until the individuals are deceased), then this will also be true.

Given those points, what are our next steps?

Update as of 8/3:

Dear Brenda,

Unfortunately, in this context, we are the prime recipients of a federal grant. We are doing this work as part of a Big Data To Knowledge grant from the National Human Genome Research Institute (NHGRI), which is part of the NIH.

Given that, what are our next steps now?

Thank you,

Melissa

On Wed, Aug 3, 2016 at 2:22 PM, Support support@orca-ucsc.desk-mail.com wrote:

Subject: BRCA Exchange project

Brenda Belcher replied: Hi Melissa, Thank you so much for that explanation. I have just one more question – are you the prime recipient of a federal grant? That is one circumstance in which review might be required even if you are not engaged in human subjects research.

Please let me know. And if you are NOT receiving a federal grant, then the activity you describe is not human subjects research and no IRB review is required.

Best regards,

Brenda

Update on 8/5/16

Dear Brenda,

No one is doing human subject research for this project per se. We're not involved in human subject research ourselves, and all of our work on this project (as well as on the larger grant that funds this project) is in silico. Our collaborators in this effort include commercial genetic testing labs, such as Invitae, Counsyl and Color Genomics. Their business model is to perform genetic tests on individuals (with the tests generally ordered through doctors' offices), although this work involves doing research to refine their testing practices. So they are doing human subject research as part of their data collection, but these data are not collected for this project per se. We also have collaborators who are curating genetic variation data, but they perform their work in silico and do not work on human subjects.

Where does that leave us?

Thanks!

Melissa

On Fri, Aug 5, 2016 at 8:38 AM, Support support@orca-ucsc.desk-mail.com wrote:

Brenda Belcher replied: Hi Melissa, In that case, who is conducting human subjects research for this project? We have established that your team is only handling de-identified data that does not qualify, but is there someone among the collaborators who, in the course of conducting research, obtains either a) data through intervention or interaction with the individual, OR b) identifiable private information?

I am looking into the options. Knowing the above will help.

Best regards,

Brenda

Analyst Office of Research Compliance Administration (ORCA) University of California Santa Cruz (831) 459-1473

Aug 15 Update:

I'm also learning about this process as we go - thank you for bearing with me!

Our project manager confirmed that our NIH grant requires us to share data via dbGaP or other repository. I believe it's a standard requirement these days.

Here's an outline of the process. The affiliations are a bit confusing, because there are some different, overlapping consortia involved. We're part of the BRCA Challenge consortium of the GA4GH (Global Alliance for Genomics and Health), which is also known as the BRCA Exchange. Our goal is to gather together and share information on BRCA genetic variation, and how that variation relates to risk of cancer. Our consortium includes a number of genetic testing companies, most of which are big proponents of public data sharing. Our consortium also includes members of the ENIGMA consortium, the goal of which is to curate genetic variants, i.e. to determine which variants are pathogenic, meaning that they lead to increased risk of cancer. The ENIGMA consortium has spearheaded this effort to aggregate and share this family history data, which they use as part of their process for assessing genetic variants. Besides the ENIGMA group, many members of BRCA Exchange are also interested in this data. In particular, the genetic testing companies are interested in sharing their respective family history data, because the result would be each company working with a larger (shared) dataset, which would benefit everyone.

Here's the way the workflow would look. A patient would get a genetic test, typically ordered by a doctor in conjunction with other treatment. The doctor's office would order the test and would ship the patient's sample to a genetic testing company. The test order would include some clinical data and some background data, including a history of cancer in the patient's family (that's a standard insurance requirement). The patient would sign a consent, and while the consent varies by company, they all include some provision for the company to use the patient's data for research. I'm not sure where exactly the sample and data become de-identified, but it happens by the time the data reaches the arm of the genetic testing company that performs the test and records the clinical and family history data. The patient's test results are returned to the doctor, and the genetic testing company retains a copy of the test result and the patient's de-identified data. The code to re-identify the data is stored separately, typically in another arm of the company.

When the company submits family history data to this project, a representative of the company will share with the project a batch of data for several patients. The contents of the data shared are what we've discussed before: the patient's genetic variants in the BRCA1 and BRCA2 genes, age (in year), ethnicity (with a coarse breakdown), cancer history, number of family members diagnosed with cancer, etc. The data for each patient will be labeled with a coded identifier, where someone in the genetic testing company has the code to indicate what person corresponds to what identifier, but those codes are not shared with the BRCA Exchange. In most cases, even the company representative who shares the data will not have access to the codes. The company representative will share the data specifically with the BRCA Exchange data coordination center (DCC). An individual from the ENIGMA project has volunteered to serve as the DCC. He will aggregate the data, and share with the project per-variant data. This per-variant data will report the number of individuals for whom data was available, and the family history data for those individuals. Note that this will not include any patient IDs.
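The aggregation step described above could be sketched roughly as follows. This is only an illustration under assumed names: the record fields (`coded_id`, `variants`, `birth_year`, `ethnicity`, `relatives_with_cancer`) and the function name are hypothetical, not the DCC's actual format or code.

```python
from collections import defaultdict


def aggregate_by_variant(records):
    """Group per-patient records by variant, leaving out the coded patient ID.

    Each record is assumed (hypothetically) to be a dict with keys
    'coded_id', 'variants', 'birth_year', 'ethnicity' and
    'relatives_with_cancer'.
    """
    per_variant = defaultdict(
        lambda: {"num_individuals": 0, "family_histories": []}
    )
    for rec in records:
        for variant in rec["variants"]:
            entry = per_variant[variant]
            entry["num_individuals"] += 1
            # Copy only the shared fields; 'coded_id' is deliberately omitted
            # so the outgoing per-variant data carries no patient IDs.
            entry["family_histories"].append({
                "birth_year": rec["birth_year"],
                "ethnicity": rec["ethnicity"],
                "relatives_with_cancer": rec["relatives_with_cancer"],
            })
    return dict(per_variant)
```

The design choice here is to build the outgoing data from an explicit list of allowed fields rather than deleting the ID from a copy, so that any extra field a contributing center happens to include is not passed along by accident.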

Does this answer your question?

Best,

Melissa

On Fri, Aug 5, 2016 at 11:58 AM, Support support@orca-ucsc.desk-mail.com wrote:

Brenda Belcher replied: Hi Melissa, I am learning about this process as we go, so thank you for your patience. Our next question is whether your NIH grant is covered by NIH’s genetic data sharing policy, requiring you to share genetic data in dbGaP or elsewhere. If so, there is a form that needs to be completed called an Institutional Certification. I can provide that form.

Otherwise, if the grant does not include a data sharing requirement, and as you said, the data is being collected for other purposes independent of this research, no IRB review or special forms would be needed. We just need confirmation from you that this is the case.

Finally, it would be helpful to make sure we understand the process correctly. Do I understand correctly that the data is being collected for clinical purposes and then uploaded to the BRCA Exchange by the genetic testing companies? And your team would like access to the de-identified data from the BRCA Exchange (is it already de-identified in BRCA database or de-identified on the way out?) for analysis? Please correct whatever parts I have wrong.

(Update 9/2/16) Dear Brenda,

Before I say anything else, I do want to thank you for all the time you've been putting into this case! I notice and appreciate the attention you've been giving to the details and nuances. Thank you for your continued effort!

With all of that said, I'm worried. I'm hearing about lots of safeguards that I'm not sure we can meet, and I want to verify that we're looking at the proper safeguards for this question. In particular, I'm concerned that I might not have explained the situation clearly, and that we might be getting confounded by the IRB making a sensible decision based on an inaccurate picture, given my poor explanation. So let me try to explain this one more time. And please bear with me :-).

We're seeking some form of IRB approval or exemption for a data-sharing consortium, in which the data to be shared is a list of genetic variants (for four genes) observed in de-identified individuals, together with "family history data" that describes the number of individuals from their family with cancer, and non-PHI clinical data (age by year of birth, gender, ethnicity in one of five broad categories, cancer history by year of diagnosis). It sounds like we all agree that the data is non-PHI:

the patients are de-identified (coded, specifically), and the consortium does not have access to the keys that would indicate their identity from the sample IDs
while the data is coded (someone, somewhere knows who those patients actually are), the consortium will never see the decoding information
the clinical data itself is non-PHI
the family history data, which reports numbers of relatives with cancer but not total number of relatives, is non-PHI. This is following the logic that one could not use this information to infer the identity of these patients from a census database.
while an individual could be identified from his or her complete genetic information, an individual cannot be identified uniquely by the list of genetic variants observed in four genes.
the diseases that we're looking at are breast cancer, ovarian cancer, or cancer of any sort. None of these are rare diseases (sadly).

In short, there is zero chance that anyone in the consortium could identify any of the patients from the data.

The data comes from genetic testing companies including Invitae, Counsyl and Color Genomics, and consists of test results and associated data for patients who've undergone genetic testing with the company's test panel. A few key facts are:

these data were collected in the past, for reasons separate from the work we're proposing to do.
each of the patients formally consented to testing
here is something I didn't know before: each genetic testing company has its own IRB approval to share these data with the consortium.

Finally, the purpose of this work is research. Specifically, the consortium is seeking to research whether genetic variants of unknown significance might put someone at risk of cancer, given family histories of cancer in individuals with and without those variants.

Here's what exactly we're looking for. We're looking for IRB approval or waiver of approval to work together as a consortium on this research question: inferring which genetic variants lead to increased risk of cancer, given data which is non-PHI, consented, and IRB-approved for sharing with the consortium.

As I mentioned earlier, we're concerned because we cannot provide everything the IRB is asking for. Specifically, they asked to see the consent forms "to be used at each site" (these consent forms were actually signed in the past). The genetic testing companies tell us that there are cases where the patient signs their own consent form, and there are cases where the patient signs the consent form for the hospital or other medical institution. So, they cannot produce all the consent forms that the patients have signed. But, each patient has signed a consent form, and each testing company has IRB approval to share the de-identified data with the consortium.

I've also attached the research proposal, for further information.

Given all of that, have we been asking the appropriate questions?

I'll be out of town for the next few weeks and largely away from my email, but some of the others CCed could help move this conversation along while I'm away.

Many thanks, once again!

Melissa

On Tue, Aug 30, 2016 at 1:30 PM, Support support@orca-ucsc.desk-mail.com wrote: Hi Melissa, Yes, as a matter of fact, I was just getting ready to contact you. Since your project is covered by NIH’s Genomic Data Sharing policy, the UCSC IRB will need to verify that the consent provided by subjects is appropriate for sharing the data broadly as described in the policy. Therefore, please provide a copy of the consent form to be used at each of the sites where data is collected.

Please let us know if you intend to use any data that was already collected, and if so, whether it was collected prior to 1/25/15 or after. The NIH expectations are different for consent forms implemented before the data sharing policy went into effect.

Please also provide a copy of your grant proposal. The IRB will use this to determine that your data storage procedures are consistent with NIH’s requirements for the purposes of completing the necessary Certification.

Once the IRB has reviewed the consent documents and your proposal, and resolved any questions that may come up, UCSC will prepare and submit an Institutional Certification to NIH.

We have confirmed that the above procedure is necessary regardless of whether the research activity constitutes “human subjects research” for the IRB’s purposes.

Thank you for your continued effort on this.

Best regards, Brenda

Analyst Office of Research Compliance Administration (ORCA) University of California Santa Cruz (831) 459-1473
