The-Sequence-Ontology / SO-Ontologies

Collection of SO Ontologies
Creative Commons Attribution 4.0 International

terms related small variant and copy number variation #353

Open rbalakri opened 8 years ago

rbalakri commented 8 years ago

Hi,

We would like to have the following new terms:

1) copy neutral loss of heterozygosity (synonym: CNLOH)
2) amplification (synonym: AMP)
3) amplification with loss of heterozygosity (synonym: amplification_LOH)
4) Delins- deletion of reference sequence plus insertion of other sequence (synonym: deletion/insertion, DELINS)

Please let me know if you need more information.

Rama

thefferon commented 8 years ago

dbVar would like some of these for structural variation as well - specifically #1 and #4.

@rbalakri, we should ensure the definitions match our combined requirements. Did you have any suggested definitions? If not, I will propose them for #1 and #4; I am not clear on the specifics of the others.

trutane commented 8 years ago

Hi @thefferon, #2 and #3 derive from COSMIC. Here's a proposal:

For CNLOH, it should be flagged with "is_a: UPD" and this xref. Here's a proposed definition:

For DELINS, here's an example from dbSNP. Proposed definition:

It occurs to me that DELINS is actually a generalization of SNV. So an SNV is a class of DELINS that restricts the lengths of the deletion and the insertion to 1. Does that make sense?

Steve

keilbeck commented 8 years ago

Hi all, I think we need to figure out the definitions and parents of these terms.

1) copy neutral loss of heterozygosity (synonym CNLOH). How does this relate to UPD? I agree with Steve's general definition here, "a sequence alteration in which there is a loss of heterozygosity of a given region with no change in the copy number of that region," but I think it is a bit vague. Is the term a parent of UPD or a synonym?

2) amplification (synonym: AMP). Are you in agreement with Steve? 5 seems a bit arbitrary. We have the term feature_amplification, which seems to be a general term that fits your need. It has the definition: "A sequence variant, caused by an alteration of the genomic sequence, where the structural change, an amplification of sequence, is greater than the extent of the underlying genomic features." This was proposed by the EBI team. Do you think your term could be a synonym, or do you think you need to have a more specific child of this term?

3) amplification with loss of heterozygosity (synonym: amplification_LOH). This seems to be a subtype of feature_amplification. We could add a new term, but it needs a definition, for example: "A sequence variant, caused by an alteration of the genomic sequence, where the structural change, an amplification of sequence, is greater than the extent of the underlying genomic features and all sequence polymorphisms in the region are homozygous or hemizygous."

4) Delins- deletion of reference sequence plus insertion of other sequence (synonym: deletion/insertion, DELINS) I am confused about how this is different from an indel - which already exists. I do not think that the definition Steve put forward addresses the issue where the length is different - otherwise it would be a substitution or MNP. I suggest here that we add a synonym. Please correct me if I am missing something.

--Karen

thefferon commented 8 years ago

delins

@keilbeck, you are right, this is the same as the existing 'indel'. However, the broader community frequently uses "indel" as a generic catch-all for small insertions and deletions (i.e. any insertion or deletion event, typically in the 1bp-10bp range), and I don't think we are going to change their collective mind. I recommend changing existing 'indel' to (or obsoleting and replacing it with) 'delins' or 'deletion-insertion'. I suggest the following edit to Steve's definition: "a sequence alteration in which one or more contiguous nucleotides have been excised and replaced with a different sequence of nucleotides." Leaving the lengths of the replaced and replacing sequences unspecified in the definition achieves a needed ambiguity / agnosticism.

– Tim

thefferon commented 8 years ago

1) copy-neutral loss of heterozygosity (synonym: CNLOH)

Discussion

Proposed definition

"a somatic change in which a region of a chromosome is deleted and replaced with homologous sequence from the sister chromatid, resulting in homozygosity throughout the region but no net change in copy number"

Proposed placement

Suggest adding this as either of the following:

  1. a child of "indel" (which I hope will be changed to "deletion-insertion" for reasons described above); Reason: it is the deletion of a region followed by an insertion (of sorts) – the homologous region from the sister chromatid. OR:
  2. a child of "MNV" ("a multiple nucleotide variant (substitution) in which the inserted sequence is the same length as the replaced sequence"). Personally, I don't think SO terms should be defined based on the length of the resulting allele compared to the original – but that is another issue.

2) amplification (synonym: AMP)

Discussion

I like Karen's proposed solution (making this term a synonym of feature_amplification), if Steve agrees.

No new definition

3) amplification with loss of heterozygosity (synonym: amplification_LOH)

Discussion

Proposed definition

"an amplification (SO:000...) [or feature_amplification] in which all sequence variants in the region are homozygous" [I left out "hemizygous" – because doesn't that imply there is only one copy, period?]

4) Delins- deletion of reference sequence plus insertion of other sequence (synonym: deletion/insertion, DELINS)

Discussion

Proposed definition

"a sequence alteration in which one or more contiguous nucleotides are excised and replaced with a different sequence of nucleotides"

Proposed placement

Replace existing indel with deletion-insertion (or delins)

trutane commented 8 years ago

@thefferon: I like your proposed definitions. Here are some further thoughts to continue the discussion:

Regarding DELINS, I also agree that it is agnostic about sequence length. My point was that an SNV is equivalent to a DELINS with the additional constraint that the deleted and inserted sequences both have length 1.

CNLOH:

Good point about CNLOH being a somatic event vs UPD=inherited. The existing definition of UPD in SO is clearly germline-oriented ("a sequence_alteration where a diploid individual receives two copies for all or part of a chromosome from one parent and no copies of the same chromosome or region from the other parent.").

I'm fine keeping "somatic" in the definition of CNLOH. However, another possibility worth considering is to have a generic CNLOH term that does not specify somatic or germline. Such a term could be useful for cases where the somatic or germline status of the event is not known. The generic term could also serve as parent term to both UPD and CNLOH. Then we would add a term for "somatic_CNLOH" to explicitly indicate its somatic origin. This arrangement would link CNLOH and UPD which are mechanistically distinct events, but have structural & functional similarities.

Amplification:

Adding "amplification" as a synonym for "feature_amplification" works for me, but I think the definition needs some work, as noted below. I'd also like to go on record as being uncomfortable with specifying any specific # of copies required to qualify for an amplification.

At the risk of opening Pandora's box ;), here are my issues regarding feature_amplification:

1) "amplification" appears in the definition of the term: "A sequence variant, caused by an alteration of the genomic sequence, where the structural change, an amplification of sequence, is greater than the extent of the underlying genomic features." Ideally, we should avoid using "amplification" in its definition. As it stands, I think one could reasonably mis-interpret it to mean a homopolymer expansion -- another useful concept to add to SO, btw (as a term or synonym).

Perhaps we could substitute "a copy number expansion" for "an amplification of sequence." Also, I think ploidy (or chromosome count) relative to a reference sequence would be useful here as well. Here's a proposal for the definition of feature_amplification:

"A sequence variant, caused by an alteration of the genomic sequence, where the structural change, an increase in the number of copies of the sequence, is greater than what is expected based on the normal chromosome complement and a reference genome sequence."
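The comparison in this proposed definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `expected` argument (the copy count implied by the normal chromosome complement, e.g. 2 for an autosome or 1 for chrX in an XY individual) and the optional `min_copies` cutoff are hypothetical and do not come from SO or COSMIC. The cutoff is included only to show where a fixed number, such as the 5 copies discussed in this thread, would plug in.

```python
from typing import Optional


def is_copy_number_increase(observed: float,
                            expected: float = 2.0,
                            min_copies: Optional[float] = None) -> bool:
    """Return True if the observed copy number exceeds the expected one.

    `expected` stands for the copy count implied by the normal chromosome
    complement (hypothetical parameter). `min_copies` is an optional fixed
    cutoff (hypothetical parameter); it is ignored when left as None.
    """
    increased = observed > expected
    if min_copies is None:
        return increased
    return increased and observed >= min_copies


# Three copies of a diploid region count as an increase...
print(is_copy_number_increase(3))                  # True
# ...but not if a fixed 5-copy cutoff is also required.
print(is_copy_number_increase(3, min_copies=5))    # False
# Two copies of a region expected to be single-copy is also an increase.
print(is_copy_number_increase(2, expected=1))      # True
```

With no cutoff, the call depends only on the expected copy number, which matches the idea of not tying the definition to a specific magnitude.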

2) Amplification should have some relationship to duplication and copy_number_gain. Currently, they are unrelated in SO. The relationships between these terms are:

There's a somatic vs germline connection here as well. Somatic CNVs are usually called "CNAs" (copy number alterations); "CNV" is used for heritable variants, but there's probably some mixing and matching of this terminology in the cancer vs mendelian communities.

I'm not sure of the ideal resolution here. One proposal is to have feature_amplification as the root term with duplication and copy_number_gain as child terms. Then, they would both derive from sequence_variant, which seems right, IMO. Copy_number_amplification should be added as a synonym of copy_number_gain, with a possible note about the somatic connection.

Insertion should not appear in this family of copy number-related terms (many insertions actually derive from mobile element inserts that haven't been sequenced). Given the suggested changes, the definition of duplication would need some tweaking as well.

Opinions welcome! -- Steve

keilbeck commented 8 years ago

Not sure about your definition of delins/indel. An SNV is a special type of substitution that is of extent 1. Delins/indel suggests a difference in length between what was deleted and what was inserted. When the length is the same, it is a substitution or MNV, right?
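A minimal Python sketch of this length-based reading, assuming `ref` and `alt` have already been trimmed of any shared flanking bases (so no VCF-style anchor base is present). The function name and the `no_change` label are hypothetical; the other labels are the terms used in this thread.

```python
def classify_alteration(ref: str, alt: str) -> str:
    """Name an allele change from the lengths of the deleted (ref)
    and inserted (alt) sequences. Assumes shared flanks are trimmed."""
    if ref == alt:
        return "no_change"      # hypothetical label, not an SO term
    if len(ref) == 0:
        return "insertion"      # nothing deleted, something inserted
    if len(alt) == 0:
        return "deletion"       # something deleted, nothing inserted
    if len(ref) == len(alt):
        # Equal lengths: a substitution, of extent 1 (SNV) or more (MNV).
        return "SNV" if len(ref) == 1 else "MNV"
    # Both non-empty and of different lengths.
    return "delins"


for ref, alt in [("A", "G"), ("AC", "GT"), ("ACG", "T"), ("", "T"), ("AC", "")]:
    print(repr(ref), "->", repr(alt), ":", classify_alteration(ref, alt))
```

Under the length-agnostic reading proposed elsewhere in the thread, every case with a non-empty `ref` and `alt` would instead be a delins, with SNV and MNV as more specific subclasses.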

I need time to digest the rest of the comments. (I'm back from trip to UK.)

--K

keilbeck commented 8 years ago

Looking at the literature, CNLOH is defined as UPD; see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2854422/ for a review that talks about somatic v acquired UPD. How about a parent term, as Steve suggested, homozygosity_alteration, with two child terms, CNLOH and UPD? This way we have the same sort of thing, derived by two different mechanisms.

UPD definition tweak: Uniparental disomy is a germline sequence_alteration where a diploid individual receives two copies for all or part of a chromosome from one parent and no copies of the same chromosome or region from the other parent.

CNLOH definition: A somatic heterozygosity_alteration where a deleted region is replaced by the homologous region of the sister chromatid, thus maintaining the original copy number (or dosage) while losing prior heterozygosity.

homozygosity_alteration definition: a sequence alteration in which there is a loss of heterozygosity of a given region.
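To make the copy-number side of these definitions concrete, here is a minimal Python sketch that labels a segment from allele-specific copy numbers. It assumes a diploid autosomal region by default, and the `Segment` class, the function name and the `no_LOH_call` and `copy_loss_LOH` labels are hypothetical. Note that counts alone cannot separate somatic CNLOH from germline UPD; that distinction needs extra information such as a matched normal sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    total_copies: int   # total copy number observed for the region
    minor_copies: int   # copies of the less abundant parental allele


def label_segment(seg: Segment, expected_copies: int = 2) -> str:
    """Label a segment by loss of heterozygosity and copy number."""
    # Heterozygosity is lost only if one parental allele is absent
    # while at least one copy of the region remains.
    if seg.minor_copies != 0 or seg.total_copies == 0:
        return "no_LOH_call"            # hypothetical label
    if seg.total_copies == expected_copies:
        return "CNLOH"                  # copy number maintained
    if seg.total_copies > expected_copies:
        return "amplification_LOH"      # copy gain plus LOH
    return "copy_loss_LOH"              # hypothetical label


print(label_segment(Segment(total_copies=2, minor_copies=0)))  # CNLOH
print(label_segment(Segment(total_copies=4, minor_copies=0)))  # amplification_LOH
print(label_segment(Segment(total_copies=2, minor_copies=1)))  # no_LOH_call
```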

I would give the paper above as a reference.

I know this is only a part of the initial request, but how do you all feel about this as a resolution?

keilbeck commented 8 years ago

I am thinking that amplification with LOH is both a homozygosity_alteration and a copy_number_gain.

Again I am a little confused by the arbitrariness of 5 copies.

--K

trutane commented 8 years ago

@keilbeck: Your definitions look good, but I would recommend calling the parent term "heterozygosity_alteration" since it's the heterozygous state that's the normal state of affairs being perturbed by CNLOH and UPD.

Regarding: amplification with LOH = heterozygosity_alteration + copy_number_gain -- ok with me.
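The arrangement agreed on here, where one term has two parents, can be sketched as a small is_a graph in Python. This is an illustration only: the dictionary holds just the terms discussed in this thread, the parents of copy_number_gain are deliberately omitted, and the `ancestors` helper is a hypothetical name rather than part of any SO tooling.

```python
# Child term -> list of direct is_a parents (only terms from this thread).
IS_A = {
    "heterozygosity_alteration": ["sequence_alteration"],
    "CNLOH": ["heterozygosity_alteration"],
    "UPD": ["heterozygosity_alteration"],
    "copy_number_gain": [],  # parents omitted in this sketch
    "amplification_LOH": ["heterozygosity_alteration", "copy_number_gain"],
}


def ancestors(term: str, is_a: dict = IS_A) -> set:
    """Collect every term reachable by following is_a links upward."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


print(sorted(ancestors("amplification_LOH")))
print(sorted(ancestors("CNLOH")))
```

A query for all heterozygosity_alteration annotations would then pick up CNLOH, UPD and amplification_LOH alike, which is the practical benefit of linking them under one parent.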

The 5 copy aspect for amplification comes from COSMIC (see link above), and I don't know why they settled on that number but it does seem arbitrary. It would be best to NOT tie the definition to a specific number. Amplification simply means more than the normal number of copies without specifying the magnitude.

What do you think of my proposal above to have feature_amplification as parent of both duplication and copy_number_gain?

keilbeck commented 8 years ago

Agreed - let's call it hetero...

For the amplification, I will add a comment about the COSMIC threshold being 5.

Regarding feature amplification, the strategy has been to separate out the sequence alteration from the effect of the variation. The sequence alteration is what is different between two sequences, one of which may be the reference, where you determine what has changed - SNV, deletion etc. The terms that fall under sequence_variant are the effects of an alteration. They rely on a sequence annotation or functional data to infer what the alteration does. There is extra information used in the inference that can change the interpretation. An SNV will be an SNV regardless of the feature set used to annotate with, but it may cause a missense, UTR_variant or intron variant depending on which transcript you are looking at. Does that make the distinction more clear?
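A minimal Python sketch of this distinction, using made-up coordinates: the alteration (an SNV at one position) stays the same, while the inferred effect changes with the transcript used for annotation. The `Transcript` class, the two example transcripts and the `outside_transcript` label are hypothetical, and the consequence logic is far simpler than what an annotation tool does (for instance, it does not try to tell missense from synonymous).

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


@dataclass
class Transcript:
    name: str
    start: int
    end: int
    cds: List[Interval]   # coding exon segments
    utr: List[Interval]   # untranslated exon segments


def consequence(pos: int, tx: Transcript) -> str:
    """Infer the effect of a single-base alteration on one transcript."""
    def inside(intervals: List[Interval]) -> bool:
        return any(s <= pos <= e for s, e in intervals)

    if not (tx.start <= pos <= tx.end):
        return "outside_transcript"          # hypothetical label
    if inside(tx.cds):
        return "coding_sequence_variant"
    if inside(tx.utr):
        return "UTR_variant"
    return "intron_variant"                  # in the transcript, not in an exon


# Two hypothetical transcripts of the same gene with different exon structure.
tx_a = Transcript("tx_a", 100, 500, cds=[(150, 200), (300, 400)],
                  utr=[(100, 149), (401, 500)])
tx_b = Transcript("tx_b", 100, 500, cds=[(300, 400)],
                  utr=[(100, 120), (401, 500)])

alteration = "SNV"  # what changed in the sequence; independent of annotation
for tx in (tx_a, tx_b):
    print(alteration, "at 180 on", tx.name, "->", consequence(180, tx))
```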

trutane commented 8 years ago

Thanks for the clarification about feature amplification. I think the issue here is that one can describe amplifications independently of the features they affect. In the cancer sequencing world, it's common to observe, "this region of the genome has been amplified relative to the reference genome sequence," without regard to which genes or other features have been impacted.

To capture this, perhaps we need an "amplification" term that is a sequence alteration?

ahwagner commented 8 years ago

@keilbeck, I'm having trouble discerning the difference between copy_number_increase and feature_amplification. When would I use one and not the other?

keilbeck commented 8 years ago

Hi Alex, I think you bring up a good point. I think the initial intent was for the copy number increase to be less specific about the kind of feature that is increased: a region is amplified rather than a transcript, for example. We will work to rearrange these terms and tidy the definitions. If you need to use a term right now, I would use the amplification term, as that is in common usage by annotation tools such as VEP. --K

Karen Eilbeck Associate Professor Biomedical Informatics University of Utah

Sent by email on Friday, August 26, 2016 at 2:01 PM. Subject: Re: [The-Sequence-Ontology/SO-Ontologies] terms related small variant and copy number variation (#353)
