AmpliconSuite / AmpliconClassifier

Classify output of AmpliconArchitect to detect types of focal amplifications present
BSD 2-Clause "Simplified" License

Inconsistency between the ecDNA counts reported by annotated_cycles and ecDNA_counts.tsv #20

Open newpest opened 2 months ago

newpest commented 2 months ago

AC's annotated_cycles file contains two ecDNA-like cycles, but ecDNA_counts.tsv reports only one ecDNA, and the bed file for that ecDNA combines the regions of both cycles from the annotated_cycles file. Which one should be treated as the final result?

ecDNA_counts.tsv

ampliconarchitect amplicon1 Cyclic Positive None detected 1

bed

20 45770081 45771921
20 45778344 45788294
20 52181786 53557332
20 54157523 54158002
20 54159266 54161359
20 55675464 55855043
20 62288769 62289400
17 60001725 60001897
17 60094504 60095816
17 60188227 60190908
17 60212997 62388060

cycles

Cycle=1;Copy_count=19.6402746181;Length=19150;IsCyclicPath=True;CycleClass=ecDNA-like;Segments=62+,95+,50-,48-,84+,83-,60+,47+
Cycle=21;Copy_count=3.77523588444;Length=3734287;IsCyclicPath=True;CycleClass=ecDNA-like;Segments=53+,76-,87-,56-

jluebeck commented 2 months ago

Hi, thanks for reaching out with this question.

One thing to keep in mind is how to interpret the AA cycles file. In most cases, a combination of substructures may be reported for a single ecDNA. One of the benefits of AmpliconClassifier is that it determines whether these substructures belong to the same ecDNA. When ecDNA structure is complex, it typically cannot be resolved uniquely with short reads alone.

From the AA Readme: Except in structurally simple cases, the decompositions reported by AA in the cycles file may represent computational substructures of a larger complete ecDNA of unknown structure. Because the signatures of ecDNA are still present in these substructures, for predictions on which entries are ecDNA-like, and for the predicted genome intervals captured on ecDNA please use AmpliconClassifier (AC) on your AA outputs to predict ecDNA status. Annotated versions of these cycles files indicating which cycles appear ecDNA-like, bed files and additional summary tables are directly produced by AC, simplifying interpretation. Most importantly however, individual AA cycles should not be interpreted as complete ecDNA reconstructions without first ruling out that these are substructures of the amplified regions.

In the case you showed, it would appear that the larger cycle (21) is more representative of the true ecDNA structure, and cycle 1 is some sort of substructure.
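To compare the two entries side by side, the annotated cycle records can be parsed into fields. The following is only a minimal sketch: the function name `parse_cycle_line` is hypothetical, it assumes the semicolon-separated `key=value` layout shown in the lines quoted above, and it tolerates a trailing `bp` on `Length` as a precaution. It is not part of AC or AA.

```python
def parse_cycle_line(line):
    """Parse one 'Cycle=...;Copy_count=...;...' record into a dict.

    Hypothetical helper; assumes the key=value layout quoted in this thread.
    """
    fields = dict(item.split("=", 1) for item in line.strip().split(";") if item)
    return {
        "cycle": int(fields["Cycle"]),
        "copy_count": float(fields["Copy_count"]),
        # Strip an optional trailing "bp" unit before converting (assumption).
        "length": int(fields["Length"].rstrip("bp")),
        "is_cyclic": fields["IsCyclicPath"] == "True",
        "cycle_class": fields["CycleClass"],
        "segments": fields["Segments"].split(","),
    }


# The two records quoted earlier in this thread.
lines = [
    "Cycle=1;Copy_count=19.6402746181;Length=19150;IsCyclicPath=True;"
    "CycleClass=ecDNA-like;Segments=62+,95+,50-,48-,84+,83-,60+,47+",
    "Cycle=21;Copy_count=3.77523588444;Length=3734287;IsCyclicPath=True;"
    "CycleClass=ecDNA-like;Segments=53+,76-,87-,56-",
]

cycles = [parse_cycle_line(line) for line in lines]

# Keep ecDNA-like cyclic paths and list the longest first.
ecdna_like = sorted(
    (c for c in cycles if c["cycle_class"] == "ecDNA-like" and c["is_cyclic"]),
    key=lambda c: c["length"],
    reverse=True,
)

for c in ecdna_like:
    print(c["cycle"], c["length"], round(c["copy_count"], 2), len(c["segments"]))
```

Sorting by length only makes the size difference between the two entries easy to see; it does not by itself decide which entry reflects the true structure.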

Thanks, Jens

newpest commented 2 months ago

How can I obtain the structure of the ecDNA identified by AC from the classification_bed_files?

jluebeck commented 2 months ago

The classification bed files are sorted by coordinate and are not helpful for structure determination. The exact structures of complex ecDNA are often only resolvable with long-read technologies or with techniques like Circle-Seq or CRISPR-Catch.
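What the bed file can still provide is the genomic footprint of the ecDNA. The sketch below summarizes the intervals quoted earlier in this thread per chromosome. The names `parse_bed` and `footprint_by_chrom` are hypothetical, and the interval length is taken as `end - start`, which assumes half-open BED-style coordinates; that convention is an assumption and not confirmed in this thread. Because each row carries only a chromosome and two coordinates, nothing in it records segment order or orientation.

```python
from collections import defaultdict

# Intervals quoted earlier in this thread (chromosome, start, end).
BED_TEXT = """\
20 45770081 45771921
20 45778344 45788294
20 52181786 53557332
20 54157523 54158002
20 54159266 54161359
20 55675464 55855043
20 62288769 62289400
17 60001725 60001897
17 60094504 60095816
17 60188227 60190908
17 60212997 62388060
"""


def parse_bed(text):
    """Return a list of (chrom, start, end) tuples from whitespace-separated rows."""
    intervals = []
    for row in text.splitlines():
        parts = row.split()
        if len(parts) < 3:
            continue  # skip blank or malformed rows
        intervals.append((parts[0], int(parts[1]), int(parts[2])))
    return intervals


def footprint_by_chrom(intervals):
    """Sum interval lengths per chromosome (assumes length = end - start)."""
    totals = defaultdict(int)
    for chrom, start, end in intervals:
        totals[chrom] += end - start
    return dict(totals)


intervals = parse_bed(BED_TEXT)
totals = footprint_by_chrom(intervals)

for chrom, size in totals.items():
    print(f"chr{chrom}: {size} bp across "
          f"{sum(1 for c, _, _ in intervals if c == chrom)} intervals")
print("total:", sum(totals.values()), "bp")
```

A summary like this describes which regions are amplified, not how they are joined; connectivity would have to come from the AA graph and cycles files or from the orthogonal methods mentioned above.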