AmpliconSuite / AmpliconClassifier

Classify output of AmpliconArchitect to detect types of focal amplifications present
BSD 2-Clause "Simplified" License

Which method is used for classifying the amplicon segments? #8

Closed panxiaoguang closed 2 years ago

panxiaoguang commented 2 years ago

I would like to know which method the new version of AmpliconClassifier uses to classify the AA segments. Is it the same as in the paper "Extrachromosomal DNA is associated with oncogene amplification and poor outcome across multiple cancers"?

I noticed that some AA cycles have a copy count greater than 2 and are cyclic paths, but were annotated as Invalid, while other cycles have a copy count lower than 1 but were annotated as ecDNA-like. Why?

I also want to know how a cycle is classified as BFB-like.

thank you

List of cycle segments
Segment 1       chr11   68918739        69586074
Segment 2       chr11   69016056        69586074
Segment 3       chr11   69019178        69586074
Segment 4       chr11   69019178        69917215
Segment 5       chr11   69020515        69021526
Segment 6       chr11   69021300        69897899
Segment 7       chr11   69044907        69045421
Segment 8       chr11   69095904        69096349
Segment 9       chr11   69166747        69167214
Segment 10      chr11   69173211        69173686
Segment 11      chr11   69182945        69185657
Segment 12      chr11   69260428        69260986
Segment 13      chr11   69292417        69292879
Segment 14      chr11   69320250        69320831
Segment 15      chr11   69402744        69403236
Segment 16      chr11   69466263        69466786
Segment 17      chr11   69547329        69547862
Segment 18      chr11   69582727        69583176
Segment 19      chr11   69584126        69584556
Segment 20      chr11   69586635        69671999
Segment 21      chr11   69586635        69886318
Segment 22      chr11   69586635        70955658
Segment 23      chr11   69588588        69589125
Segment 24      chr11   69591418        69592124
Segment 25      chr11   69628208        69629041
Segment 26      chr11   70247337        70247807
Segment 27      chr11   70337624        70338135
Segment 28      chr11   70868780        70955658
Segment 29      chr11   70878300        70955658
Segment 30      chr11   70900035        70955658
Segment 31      chr11   70932764        70933197
Cycle=1;Copy_count=2.61518949544;Length=2591301;IsCyclicPath=False;CycleClass=Rearranged;Segments=0+,22-,2-,3+,20+,0+
Cycle=2;Copy_count=2.20046574451;Length=467;IsCyclicPath=True;CycleClass=Invalid;Segments=9+
Cycle=3;Copy_count=1.97095766512;Length=2036358;IsCyclicPath=False;CycleClass=Linear;Segments=0+,22-,1-,0+
Cycle=4;Copy_count=1.71901607684;Length=462;IsCyclicPath=True;CycleClass=Invalid;Segments=13+
Cycle=5;Copy_count=1.71393536211;Length=514;IsCyclicPath=True;CycleClass=Invalid;Segments=7+
...
Cycle=13;Copy_count=1.14691388505;Length=1767738;IsCyclicPath=True;CycleClass=ecDNA-like;Segments=4+,21-,2-

jluebeck commented 2 years ago

Hi,

We have made significant updates to the method for classifying ecDNA compared with the method used in the Kim et al. Nature Genetics 2020 study. The code that performs the "legacy" classifications is still available in the "legacy_natgen_2020" folder for reproducing that study, but we recommend that users use the latest version of AmpliconClassifier instead.

The "Copy_count=" entry reflects the amount of copy number flow decomposed into that path or cycle by AA during its optimization of balanced flow in the breakpoint graph, and is not the copy number of the element in the genome.

Cycle 2 in this output, for instance, has a length of 467 bases, and we deem it too small to be considered ecDNA. The same applies to cycles 4 and 5.
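
To make the two points above concrete, here is a minimal sketch that parses the `Cycle=...` lines shown in the question and flags cyclic paths that are too short. The function names and the `MIN_ECDNA_LENGTH` value are hypothetical and chosen only for illustration; the actual cutoff used by AC is not stated in this thread. Note that `Copy_count` is read as decomposition flow and is deliberately not used to decide anything here.

```python
# Hypothetical cutoff for illustration only; the real value used by AC
# is not given in this thread.
MIN_ECDNA_LENGTH = 10_000


def parse_cycle_line(line):
    """Parse one 'Cycle=...;Copy_count=...;...' line into a dict."""
    fields = dict(item.split("=", 1) for item in line.strip().split(";"))
    return {
        "cycle": int(fields["Cycle"]),
        # Flow assigned to this path by the decomposition,
        # not the genomic copy number of the element.
        "copy_count": float(fields["Copy_count"]),
        "length": int(fields["Length"]),
        "is_cyclic": fields["IsCyclicPath"] == "True",
        "cycle_class": fields["CycleClass"],
        "segments": fields["Segments"].split(","),
    }


def too_short_for_ecdna(cycle, min_length=MIN_ECDNA_LENGTH):
    """True for a cyclic path whose length is below the assumed cutoff."""
    return cycle["is_cyclic"] and cycle["length"] < min_length


# Two lines copied from the output in the question.
example_lines = [
    "Cycle=2;Copy_count=2.20046574451;Length=467;IsCyclicPath=True;"
    "CycleClass=Invalid;Segments=9+",
    "Cycle=13;Copy_count=1.14691388505;Length=1767738;IsCyclicPath=True;"
    "CycleClass=ecDNA-like;Segments=4+,21-,2-",
]
cycles = [parse_cycle_line(line) for line in example_lines]

for c in cycles:
    print(c["cycle"], c["length"], too_short_for_ecdna(c))
```

Under this assumed cutoff, cycle 2 is flagged as too short despite its higher `Copy_count`, while cycle 13 is not, which matches the annotations in the question.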

BFBs are classified by examining the fraction of breakpoints in the amplicon that are "foldbacks" (inversions whose two ends lie less than 25 kbp apart). AC also examines whether a significant fraction of the paths and cycles (weighted by decomposition copy count) are BFB substrings. If both the fraction of foldbacks and the fraction of BFB substrings in the paths and cycles are high, a BFB classification is assigned. BFB-like paths and genomic regions are marked, and AC then checks for the presence of ecDNA outside of the BFB.
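
The two-part test described above can be sketched roughly as follows. This is not AC's code: the breakpoint representation (same strand at both ends taken to mean an inversion-type junction), the function names, and both thresholds (`min_foldback`, `min_bfb`) are assumptions made for illustration. Only the 25 kbp foldback distance comes from this reply. Whether a path is a BFB substring is taken as a given input rather than computed.

```python
FOLDBACK_MAX_DIST = 25_000  # from the reply: foldbacks span less than 25 kbp


def is_foldback(bp):
    """bp = (chrom1, pos1, strand1, chrom2, pos2, strand2).

    Assumption: an inversion-type junction has the same strand at both ends.
    """
    c1, p1, s1, c2, p2, s2 = bp
    return c1 == c2 and s1 == s2 and abs(p1 - p2) < FOLDBACK_MAX_DIST


def foldback_fraction(breakpoints):
    """Fraction of breakpoints that are foldbacks."""
    if not breakpoints:
        return 0.0
    return sum(is_foldback(bp) for bp in breakpoints) / len(breakpoints)


def bfb_weighted_fraction(paths):
    """paths = [(copy_count, is_bfb_substring), ...].

    Returns the copy-count-weighted fraction of paths marked as BFB substrings.
    """
    total = sum(cc for cc, _ in paths)
    if total == 0:
        return 0.0
    return sum(cc for cc, is_bfb in paths if is_bfb) / total


def looks_bfb_like(breakpoints, paths, min_foldback=0.25, min_bfb=0.6):
    """Both fractions must be high; the thresholds here are hypothetical."""
    return (foldback_fraction(breakpoints) >= min_foldback
            and bfb_weighted_fraction(paths) >= min_bfb)


# Made-up inputs, not taken from real data.
breakpoints = [
    ("chr11", 69586074, "+", "chr11", 69586635, "+"),  # short inversion
    ("chr11", 68918739, "-", "chr11", 70955658, "+"),  # not a foldback
]
paths = [(2.0, True), (1.0, False)]

print(foldback_fraction(breakpoints), bfb_weighted_fraction(paths))
print(looks_bfb_like(breakpoints, paths))
```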

I hope this is helpful in answering your questions, Jens

panxiaoguang commented 2 years ago

Thank you very much, I got it!

panxiaoguang commented 2 years ago

Sorry, I have another question. How can I get the ecDNA's copy number? By adding up the copy counts of all its segments?

jluebeck commented 2 years ago

Hi, AC does not currently report the copy number of individual ecDNAs explicitly. Segments of the genome inside an ecDNA may be duplicated, which makes it difficult to determine the precise copy number of a single ecDNA species without solving its exact structure. It is possible, however, to get the copy numbers of specific genes on the ecDNA by looking at the AA graph file. You could also consider reporting the median genomic copy number of the ecDNA's genomic segments as a proxy. The CAMPER.py script from PrepareAA may also help you determine the ecDNA copy number, and I encourage you to try it on one of your examples that has a single ecDNA in the amplicon.
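
As a rough illustration of the median-copy-number proxy, the sketch below takes the median over graph segments that overlap the ecDNA intervals. It assumes that segment lines in the AA graph file begin with `sequence`, followed by start and end positions such as `chr11:69019178-` and `chr11:69586074+`, and then the predicted copy count; please check this against your own graph files. The function names and the example values are hypothetical.

```python
import statistics


def parse_sequence_edges(graph_text):
    """Collect (chrom, start, end, copy_number) from 'sequence' lines.

    Assumed layout: sequence <chrom:start-> <chrom:end+> <copy_number> ...
    """
    edges = []
    for line in graph_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "sequence":
            chrom, start = parts[1].rstrip("+-").rsplit(":", 1)
            end = parts[2].rstrip("+-").rsplit(":", 1)[1]
            edges.append((chrom, int(start), int(end), float(parts[3])))
    return edges


def median_copy_number(edges, intervals):
    """Median copy number of segments overlapping any (chrom, start, end)."""
    cns = [
        cn
        for chrom, start, end, cn in edges
        if any(chrom == c and start <= i_end and end >= i_start
               for c, i_start, i_end in intervals)
    ]
    return statistics.median(cns) if cns else None


# Made-up graph lines; the copy numbers are not real results.
graph_text = """\
sequence chr11:69019178- chr11:69586074+ 12.0 300.0 566897 100000
sequence chr11:69586635- chr11:69886318+ 14.0 350.0 299684 60000
sequence chr11:70000000- chr11:70100000+ 2.0 50.0 100001 5000
"""
ecdna_intervals = [("chr11", 69019178, 69886318)]

edges = parse_sequence_edges(graph_text)
print(median_copy_number(edges, ecdna_intervals))
```

An unweighted median treats short and long segments equally; weighting by segment length is an alternative worth considering if the ecDNA contains many small segments.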

panxiaoguang commented 2 years ago

Thank you for your quick reply! I learned a lot from your paper and scripts.