AmpliconSuite / AmpliconClassifier

Classify output of AmpliconArchitect to detect types of focal amplifications present
BSD 2-Clause "Simplified" License

interpreting the results of AmpliconClassifier #18

Open eesiribloom opened 7 months ago

eesiribloom commented 7 months ago

I am struggling to fully understand the differences between the outputs of AA and AC, and how to translate them into a presentation of the results of the analysis. If I have an individual sample that appears to have ecDNA, how should I interpret results such as:

Several amplicons from AA, some of which might contain "ecDNA" according to AC, along with multiple ecDNA-like cycles.

I understand that AC merges overlapping cycles/paths to estimate what might be an ecDNA. First, what is an "amplicon" compared with the ecDNA (anywhere from 1 to 5 or so) that AC reports within it? Within each of these there are several "ecDNA-like" cycles/paths, which I am a bit confused about. Are the ecDNA reported by AC considered distinct? If I have a sample with, e.g., two ecDNA-containing amplicons, one with 5 ecDNAs and one with 1 ecDNA, should I interpret this single sample as containing 6 distinct ecDNA species? Essentially, at what level of resolution can I say "this sample has this particular ecDNA, made up of this particular genomic region/intervals": the amplicon, the ecDNA, or the cycle?

jluebeck commented 7 months ago

Hi,

Thanks for reaching out with this great question. I think this is a very good point to clarify.

First, there is a bit of confusing terminology:

The distinction here is that an AA "amplicon" can contain multiple focal amplifications. When we report an ID for each feature, we use the format [sample]_[AA amplicon number]_[feature type]_[index of that feature in the AA amplicon]. For instance, sample_amplicon1_ecDNA_5 is the 5th ecDNA reported in the sample's AA amplicon1.
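As an illustration of the ID format described above, the sketch below splits a feature ID into its parts. It is a hypothetical helper, not part of AmpliconClassifier; it assumes the amplicon token is literally "amplicon<N>" (as in the example) and that the feature type itself contains no underscore. Splitting from the right keeps sample names that contain underscores intact.

```python
from typing import NamedTuple


class FeatureID(NamedTuple):
    """Components of a feature ID such as 'sample_amplicon1_ecDNA_5'."""
    sample: str
    amplicon: int
    feature_type: str
    index: int


def parse_feature_id(feature_id: str) -> FeatureID:
    # Hypothetical helper. Split from the right so sample names containing
    # underscores (e.g. 'patient_01') survive; assumes the feature type
    # has no underscore of its own.
    sample, amplicon_tok, feature_type, index_tok = feature_id.rsplit("_", 3)
    if not amplicon_tok.startswith("amplicon"):
        raise ValueError(f"unexpected amplicon token in {feature_id!r}")
    return FeatureID(
        sample=sample,
        amplicon=int(amplicon_tok[len("amplicon"):]),
        feature_type=feature_type,
        index=int(index_tok),
    )


print(parse_feature_id("sample_amplicon1_ecDNA_5"))
# FeatureID(sample='sample', amplicon=1, feature_type='ecDNA', index=5)
```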

The multiple features reported in a single AA amplicon (e.g. ecDNA_1 and ecDNA_2) are indeed considered distinct by AC, as they are unconnected by SVs and/or are separated by >500 kbp. Features of different classes (e.g. ecDNA and BFB) can overlap genomically and still be considered distinct, because the patterns of SVs and CNs that produce each classification are separate.
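To make the distinctness rule concrete, here is a simplified sketch that groups amplified intervals into features using the two criteria mentioned: SV connectivity and a 500 kbp distance threshold. This is not AC's actual implementation. The function name group_features, the (chrom, start, end) interval and SV index-pair input format, and treating "within 500 kbp" as an inclusive threshold are all assumptions made for illustration.

```python
from collections import defaultdict

# Distance threshold from the rule above; the inclusive comparison is an assumption.
MAX_GAP_BP = 500_000


class DisjointSet:
    """Minimal union-find used to merge connected intervals."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        self.parent[self.find(i)] = self.find(j)


def group_features(intervals, sv_links, max_gap=MAX_GAP_BP):
    """Hypothetical helper: group (chrom, start, end) intervals into features.

    Two intervals join the same feature if an SV links them (sv_links holds
    index pairs) or if they lie on the same chromosome within max_gap bp.
    """
    ds = DisjointSet(len(intervals))
    for i, j in sv_links:
        ds.union(i, j)
    for i, (chrom_i, start_i, end_i) in enumerate(intervals):
        for j in range(i + 1, len(intervals)):
            chrom_j, start_j, end_j = intervals[j]
            # Gap between the two intervals; negative if they overlap.
            gap = max(start_i, start_j) - min(end_i, end_j)
            if chrom_i == chrom_j and gap <= max_gap:
                ds.union(i, j)
    groups = defaultdict(list)
    for idx in range(len(intervals)):
        groups[ds.find(idx)].append(idx)
    return sorted(groups.values())


# Toy intervals, not real AA output.
intervals = [
    ("chr8", 10_000_000, 11_000_000),  # 0
    ("chr8", 11_300_000, 11_600_000),  # 1: 300 kbp from interval 0
    ("chr8", 30_000_000, 30_500_000),  # 2: far from 0 and 1
    ("chr12", 5_000_000, 6_000_000),   # 3
]
print(group_features(intervals, sv_links=[(2, 3)]))
# [[0, 1], [2, 3]]
```

This only models the spatial grouping; deciding whether each group is an ecDNA, BFB, or another class from its SV and CN pattern is a separate step that the sketch does not attempt.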

In your example of one AA amplicon with 5 ecDNAs and another with 1 ecDNA, then yes, we should count 6 distinct ecDNAs predicted by AC in total. We consider events distinct at the "feature" level (ecDNA_1, ecDNA_2, etc.) and count them across all amplicons in a sample. We don't count individual cycles as distinct events, because multiple cycles can be detected from the same feature, particularly if the structure is complex and AA only extracts a collection of substructures of the larger ecDNA.
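The counting rule above (count features, not cycles, summed across all amplicons in a sample) can be sketched as follows. The input here is a hypothetical list of (feature ID, cycle ID) pairs invented for illustration; real AC output files are laid out differently, and the function name count_ecdna_per_sample is likewise an assumption. The toy records reproduce the scenario from the question, with one feature deliberately given two cycles.

```python
from collections import Counter


def count_ecdna_per_sample(records):
    """Hypothetical helper: count distinct ecDNA features per sample.

    records: iterable of (feature_id, cycle_id) pairs, where feature_id has
    the form [sample]_amplicon[N]_[feature type]_[index]. Several cycles can
    map to the same feature, so duplicates are collapsed before counting.
    """
    features = set()
    for feature_id, _cycle_id in records:
        sample, amplicon, feature_type, index = feature_id.rsplit("_", 3)
        if feature_type == "ecDNA":
            features.add((sample, amplicon, index))
    return Counter(sample for sample, _amplicon, _index in features)


# Toy records mirroring the question: amplicon1 has 5 ecDNAs, amplicon2 has 1.
records = [
    ("S1_amplicon1_ecDNA_1", "cycle1"),
    ("S1_amplicon1_ecDNA_1", "cycle4"),  # second cycle of the same feature
    ("S1_amplicon1_ecDNA_2", "cycle2"),
    ("S1_amplicon1_ecDNA_3", "cycle3"),
    ("S1_amplicon1_ecDNA_4", "cycle5"),
    ("S1_amplicon1_ecDNA_5", "cycle6"),
    ("S1_amplicon2_ecDNA_1", "cycle1"),
    ("S1_amplicon2_BFB_1", "cycle2"),    # not an ecDNA, so not counted
]
print(count_ecdna_per_sample(records))
# Counter({'S1': 6})
```

Counting raw records would give 7 ecDNA entries here; collapsing to features gives the 6 distinct ecDNAs described above.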

Please let me know if you have any further questions; I'd be happy to answer them.

Thanks, Jens