panxiaoguang closed this issue 2 years ago
Hi,
We have made significant updates to the method for classifying ecDNA compared with the method used in the Kim et al. Nature Genetics 2020 study. The code that performs the "legacy" classifications is still available in the "legacy_natgen_2020" folder for reproducing that study, but we recommend that users use the latest version of AmpliconClassifier instead.
The "Copy_count=" entry reflects the amount of copy number flow decomposed into that path or cycle by AA during its optimization of balanced flow in the breakpoint graph, and is not the copy number of the element in the genome.
Cycle 2 in this output, for instance, has a length of 467 bases, which we deem too small to be considered ecDNA. The same applies to cycles 4 and 5.
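As a rough illustration of the size check described above, here is a minimal Python sketch. It assumes cycle lines of the form `Cycle=<id>;Copy_count=<float>;Segments=<id><strand>,...`, that segment ID 0 marks the ends of a non-cyclic path, and that segment lengths are supplied separately. The names `parse_cycle_line`, `cycle_length`, `passes_size_filter`, and the `MIN_ECDNA_SIZE` value are hypothetical and chosen for illustration; they are not AC's actual functions or cutoff.

```python
from typing import Dict, List, Tuple

# Hypothetical cutoff for illustration only; AC's real threshold is not stated in this thread.
MIN_ECDNA_SIZE = 10_000

SegmentRef = Tuple[int, str]  # (segment ID, strand)


def parse_cycle_line(line: str) -> Dict[str, object]:
    """Parse one 'Cycle=..;Copy_count=..;Segments=..' line into a dict."""
    fields = dict(item.split("=", 1) for item in line.strip().split(";") if item)
    segments: List[SegmentRef] = []
    for token in fields["Segments"].split(","):
        # The last character is the strand, everything before it is the segment ID.
        segments.append((int(token[:-1]), token[-1]))
    return {
        "cycle": int(fields["Cycle"]),
        "copy_count": float(fields["Copy_count"]),
        "segments": segments,
    }


def cycle_length(segments: List[SegmentRef], seg_lengths: Dict[int, int]) -> int:
    """Sum the lengths of all real segments (ID 0 is assumed to be a path-end marker)."""
    return sum(seg_lengths[sid] for sid, _ in segments if sid != 0)


def passes_size_filter(length: int, min_size: int = MIN_ECDNA_SIZE) -> bool:
    """Return True if the cycle is long enough to be considered further."""
    return length >= min_size


# Example: a short cycle resembling "Cycle 2" from the reply (467 bases in total).
parsed = parse_cycle_line("Cycle=2;Copy_count=35.2;Segments=3+,4-")
length = cycle_length(parsed["segments"], {3: 300, 4: 167})
print(length, passes_size_filter(length))
```

Note that the decision here depends only on the cycle's length, not on its `Copy_count`, which is why a cycle with a high decomposed copy count can still be rejected.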
BFBs are classified by examining the fraction of breakpoints in the amplicon that are "foldbacks" (inversions in which the distance between the pair is less than 25 kbp). AC also examines whether a significant fraction of the paths and cycles (weighted by decomposition copy count) are BFB substrings. If both the fraction of foldbacks and the fraction of BFB substrings in the paths and cycles are high, a BFB classification is assigned. BFB-like paths and genomic regions are marked, and AC then checks for the presence of ecDNA outside of the BFB.
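The two fractions described above could be computed along the following lines. This is only a sketch under stated assumptions: the `Breakpoint` record, the function names, and the `min_foldback` and `min_bfb` cutoffs are hypothetical, and deciding whether a given path is a BFB substring is left out (it is passed in as a boolean). Only the 25 kbp foldback distance comes from the reply itself.

```python
from dataclasses import dataclass
from typing import List, Tuple

FOLDBACK_MAX_DIST = 25_000  # 25 kbp, as stated in the reply


@dataclass(frozen=True)
class Breakpoint:
    """Hypothetical record for one breakpoint edge joining two genomic ends."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def is_foldback(bp: Breakpoint, max_dist: int = FOLDBACK_MAX_DIST) -> bool:
    """An inversion (same chromosome, same strand on both ends) spanning a short distance."""
    return (
        bp.chrom1 == bp.chrom2
        and bp.strand1 == bp.strand2
        and abs(bp.pos1 - bp.pos2) < max_dist
    )


def foldback_fraction(breakpoints: List[Breakpoint]) -> float:
    """Fraction of breakpoints in the amplicon that are foldbacks."""
    if not breakpoints:
        return 0.0
    return sum(is_foldback(bp) for bp in breakpoints) / len(breakpoints)


def weighted_bfb_fraction(paths: List[Tuple[float, bool]]) -> float:
    """Copy-count-weighted fraction of paths/cycles flagged as BFB substrings.

    Each item is (decomposition copy count, is_bfb_substring).
    """
    total = sum(copy_count for copy_count, _ in paths)
    if total == 0:
        return 0.0
    return sum(copy_count for copy_count, is_bfb in paths if is_bfb) / total


def looks_like_bfb(fb_frac: float, bfb_frac: float,
                   min_foldback: float = 0.25, min_bfb: float = 0.6) -> bool:
    """Both fractions must be high; these cutoffs are placeholders, not AC's values."""
    return fb_frac >= min_foldback and bfb_frac >= min_bfb


bps = [
    Breakpoint("chr1", 1_000, "+", "chr1", 5_000, "+"),    # foldback
    Breakpoint("chr1", 1_000, "+", "chr1", 900_000, "-"),  # not an inversion, too far
    Breakpoint("chr1", 1_000, "-", "chr2", 2_000, "-"),    # different chromosomes
    Breakpoint("chr3", 100, "-", "chr3", 20_000, "-"),     # foldback
]
paths = [(10.0, True), (5.0, False), (5.0, True)]
print(foldback_fraction(bps), weighted_bfb_fraction(paths))
```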
I hope this is helpful in answering your questions, Jens
Thank you very much, I got it!
Sorry, I have another question. How can I get the ecDNA's copy number? By adding up the copy counts of all the segments?
Hi, AC does not currently report the copy number of an individual ecDNA explicitly. Segments of the genome inside an ecDNA may be duplicated, which makes it difficult to determine the precise copy number of a single ecDNA species without solving its exact structure. It is possible, however, to get the copy numbers of specific genes on the ecDNA by looking at the AA graph file. You could also consider reporting the median genomic copy number of the ecDNA's genomic segments as a proxy. The CAMPER.py script from PrepareAA may also help you determine the ecDNA copy number, and I encourage you to try it on one of your examples that has a single ecDNA in the amplicon.
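The median-based proxy mentioned above might look like the sketch below. It assumes the ecDNA segments and their copy numbers have already been extracted (for example from the AA graph file) into `(chrom, start, end, copy_number)` tuples; parsing the file is not shown. The function names are hypothetical, and the length-weighted variant is an added assumption for cases where many tiny segments would otherwise skew the plain median.

```python
from statistics import median
from typing import List, Tuple

# (chromosome, start, end, copy number); coordinates are treated as inclusive.
Segment = Tuple[str, int, int, float]


def median_copy_number(segments: List[Segment]) -> float:
    """Plain median of the per-segment copy numbers."""
    if not segments:
        raise ValueError("no segments supplied")
    return median(cn for *_, cn in segments)


def length_weighted_median_copy_number(segments: List[Segment]) -> float:
    """Median copy number where each segment counts in proportion to its length."""
    if not segments:
        raise ValueError("no segments supplied")
    ordered = sorted(segments, key=lambda seg: seg[3])
    total = sum(end - start + 1 for _, start, end, _ in ordered)
    running = 0
    for _, start, end, cn in ordered:
        running += end - start + 1
        # The first segment that brings us to at least half the total length holds the median.
        if running * 2 >= total:
            return cn
    return ordered[-1][3]


ecdna_segments = [
    ("chr7", 1, 1_000, 20.0),
    ("chr7", 2_001, 3_000, 30.0),
    ("chr7", 5_001, 105_000, 50.0),
]
print(median_copy_number(ecdna_segments))
print(length_weighted_median_copy_number(ecdna_segments))
```

Either value is only a proxy: if a segment occurs more than once within the ecDNA, its genomic copy number will overstate the copy number of the ecDNA itself.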
Thank you for your quick reply! I learned a lot from your paper and scripts.
I would like to know which method the new version of AmpliconClassifier uses to classify the AA segments. Is it the same as in the paper "Extrachromosomal DNA is associated with oncogene amplification and poor outcome across multiple cancers"?
I noticed that some AA cycles have a copy count greater than 2 and are cyclic paths but were annotated as Invalid, while other cycles have a copy count lower than 1 but were annotated as ecDNA-like. Why?
I would also like to know how a cycle is classified as BFB-like.
Thank you.