AmpliconSuite / CoRAL

Reconstruction of focal amplifications with long reads
BSD 3-Clause "New" or "Revised" License

classifying results #19

Open eesiribloom opened 2 weeks ago

eesiribloom commented 2 weeks ago

Is there a need to classify the output of CoRAL, or are all of the circular paths/amplicons it outputs estimated to be ecDNA specifically? Thanks again for the tool!

jluebeck commented 2 weeks ago

Hi, this is a great question. CoRAL outputs still require some degree of interpretation. We have not yet built up a sample set diverse enough to determine how well AmpliconClassifier would work on them and what modifications might be needed to make it compatible with CoRAL results. This will change as we analyze more samples, but we won't be ready to connect the two tools for some time.

A big risk with calling all genome cycles ecDNA is that some might be derived from breakage-fusion-bridge (BFB) cycles. If you see a high degree of foldback inversions, reconstructions that contain palindromic series of segments, and step-wise changes in CN, then the amplicon is more likely to be BFB. Genome cycles decomposed with low CN may also not be ecDNA. High-CN decompositions that are closed in a head-to-tail fashion are more likely to be ecDNA. We're happy to help interpret results if needed.
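Two of the BFB signals described above (foldbacks and palindromic series of segments) can be screened for mechanically in a segment walk. The sketch below is a hypothetical helper, not part of CoRAL or AmpliconClassifier. It assumes a "direct foldback" shows up as a segment immediately followed by itself in the opposite orientation (e.g. 3+,3-), and it treats an "inverted palindrome" as a contiguous run that reads the same when traversed backwards with orientations flipped (e.g. 1+,2+,3+,3-,2-,1-). Copy-number steps and head-to-tail closure are not checked here.

```python
from typing import List, Tuple

Oriented = Tuple[int, str]  # (segment id, "+" or "-")


def parse_walk(walk: str) -> List[Oriented]:
    """Split '0+,3+,3-,0-' into oriented segments, dropping the source node 0."""
    out = []
    for tok in walk.split(","):
        tok = tok.strip()
        if not tok:
            continue
        seg, orient = int(tok[:-1]), tok[-1]
        if seg != 0:  # 0+/0- are source nodes, not genomic segments
            out.append((seg, orient))
    return out


def reverse_complement(walk: List[Oriented]) -> List[Oriented]:
    """Traverse the walk backwards, flipping each segment's orientation."""
    flip = {"+": "-", "-": "+"}
    return [(seg, flip[o]) for seg, o in reversed(walk)]


def count_direct_foldbacks(walk: List[Oriented]) -> int:
    """Count junctions where a segment is followed by itself reversed, e.g. 3+,3-."""
    return sum(1 for (a, oa), (b, ob) in zip(walk, walk[1:]) if a == b and oa != ob)


def longest_inverted_palindrome(walk: List[Oriented]) -> int:
    """Length of the longest contiguous run equal to its own reverse complement."""
    best = 0
    for i in range(len(walk)):
        for j in range(i + 2, len(walk) + 1):
            run = walk[i:j]
            if run == reverse_complement(run):
                best = max(best, len(run))
    return best


def bfb_signals(walk: str) -> dict:
    """Summarize the two walk-level BFB hints for one Segments= string."""
    w = parse_walk(walk)
    return {
        "foldbacks": count_direct_foldbacks(w),
        "longest_palindrome": longest_inverted_palindrome(w),
    }
```

For a textbook BFB-like walk such as `0+,1+,2+,3+,3-,2-,1-,0-` this reports one foldback and a palindrome of length 6, while a tandem repeat like `12+,13+,13+,14+` reports no foldbacks. The brute-force palindrome search is cubic in walk length, which is fine for walks of a few dozen segments.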

Thanks, Jens

eesiribloom commented 2 weeks ago

Thank you for your reply. Here is an example of a cycles.txt file from an amplicon. Note: this was generated with default parameters. When I increased --min_bp_support to 5.0 or 10.0, I saw a significant decrease in the number of paths output (which is expected), but I could no longer reconstruct any circular paths, so I went back to the defaults.

I assume I'm looking for paths that are circular, satisfy constraints (I'm not entirely sure what this means, though), and have high copy number and subpaths with high support (is this read support?).

Do 0+ and 0- indicate a circular path?

For example, based on previous results from AA and decoil, cycles 1 and 6 are of interest and overlap regions previously estimated to be contained within ecDNA, but I'm not sure whether these results necessarily support that.

Interval    1   chr3    130528734   130728734
Interval    2   chr8    42836869    43036869
Interval    3   chr12   23184602    28059521
Interval    4   chr12   29101551    29301551
Interval    5   chr12   31614458    32309448
Interval    6   chr12   72410272    72665271
Interval    7   chr19   29200032    29946017
Interval    8   chr19   33146201    34601276
Interval    9   chr19   39031534    40328962
Interval    10  chr20   19475071    19900922
Interval    11  chrX    1739217 3193027
List of cycle segments
Segment 1   chr3    130528734   130628733
Segment 2   chr3    130628734   130728734
Segment 3   chr8    42836869    42936868
Segment 4   chr8    42936869    43036869
Segment 5   chr12   23184602    23313736
Segment 6   chr12   23313737    23782045
Segment 7   chr12   23782046    23797060
Segment 8   chr12   23797061    23957745
Segment 9   chr12   23957746    23997864
Segment 10  chr12   23997865    25265269
Segment 11  chr12   25265270    25360453
Segment 12  chr12   25360454    25622905
Segment 13  chr12   25622906    25622918
Segment 14  chr12   25622919    25780115
Segment 15  chr12   25780116    25803970
Segment 16  chr12   25803971    25806645
Segment 17  chr12   25806646    25917588
Segment 18  chr12   25917589    27958246
Segment 19  chr12   27958247    28059521
Segment 20  chr12   29101551    29201551
Segment 21  chr12   29201552    29301551
Segment 22  chr12   31614458    31713292
Segment 23  chr12   31713293    31716434
Segment 24  chr12   31716435    32206875
Segment 25  chr12   32206876    32309448
Segment 26  chr12   72410272    72512816
Segment 27  chr12   72512817    72563536
Segment 28  chr12   72563537    72665271
Segment 29  chr19   29200032    29300031
Segment 30  chr19   29300032    29459277
Segment 31  chr19   29459278    29465259
Segment 32  chr19   29465260    29467530
Segment 33  chr19   29467531    29785292
Segment 34  chr19   29785293    29822449
Segment 35  chr19   29822450    29843120
Segment 36  chr19   29843121    29946017
Segment 37  chr19   33146201    34500466
Segment 38  chr19   34500467    34601276
Segment 39  chr19   39031534    39132453
Segment 40  chr19   39132454    39643471
Segment 41  chr19   39643472    40228962
Segment 42  chr19   40228963    40328962
Segment 43  chr20   19475071    19575070
Segment 44  chr20   19575071    19800051
Segment 45  chr20   19800052    19900922
Segment 46  chrX    1739217 1911898
Segment 47  chrX    1911899 1937067
Segment 48  chrX    1937068 2163657
Segment 49  chrX    2163658 2186393
Segment 50  chrX    2186394 2187092
Segment 51  chrX    2187093 2206100
Segment 52  chrX    2206101 2277483
Segment 53  chrX    2277484 2282342
Segment 54  chrX    2282343 2374471
Segment 55  chrX    2374472 2376727
Segment 56  chrX    2376728 2390794
Segment 57  chrX    2390795 3048708
Segment 58  chrX    3048709 3193027
List of longest subpath constraints
Path constraint 1   2-,13+,14+  Support=17  Satisfied
Path constraint 2   40-,35+,47+ Support=1   Satisfied
Path constraint 3   7-,32-,30-  Support=9   Satisfied
Path constraint 4   30+,32+,33+ Support=12  Satisfied
Path constraint 5   49+,51+,52+ Support=2   Satisfied
Path constraint 6   54+,56+,57+ Support=1   Satisfied
Path constraint 7   15+,16+,15- Support=12  Unsatisfied
Path constraint 8   24-,23+,24+ Support=14  Satisfied
Path constraint 9   6+,7+,15+   Support=1   Unsatisfied
Path constraint 10  12+,13+,13+,14+ Support=58  Satisfied
Path constraint 11  6+,7+,8+    Support=2   Satisfied
Path constraint 12  15+,16+,17+ Support=96  Satisfied
Path constraint 13  22+,23+,24+ Support=15  Satisfied
Path constraint 14  49+,50+,51+ Support=8   Satisfied
Path constraint 15  54+,55+,56+ Support=2   Satisfied
Cycle=1;Copy_count=3.8574164389262715;Segments=0+,5+,6+,7+,8+,9+,10+,11+,12+,13+,14+,15+,16+,17+,27+,6+,7+,8+,9+,10+,11+,12+,13+,14+,15+,16+,17+,27+,6+,7+,8+,9+,10+,11+,12+,13+,13+,14+,15+,16+,17+,18+,10+,11+,12+,13+,14+,15+,16+,17+,27+,6+,7+,8+,9+,10+,11+,12+,13+,14+,15+,16+,17+,18+,19+,0-;Path_constraints_satisfied=10,11,12
Cycle=2;Copy_count=3.8001404858032872;Segments=0+,39+,40+,41+,42+,0-;Path_constraints_satisfied=
Cycle=3;Copy_count=3.165662498442714;Segments=0+,20+,21+,0-;Path_constraints_satisfied=
Cycle=4;Copy_count=3.04413374549155;Segments=0+,1+,2+,0-;Path_constraints_satisfied=
Cycle=5;Copy_count=3.0309324976212757;Segments=0+,22+,23+,24+,25+,0-;Path_constraints_satisfied=13
Cycle=6;Copy_count=2.948191433018197;Segments=0+,29+,30+,32+,33+,34+,35+,47+,34+,35+,47+,48+,49+,51+,52+,54+,56+,57+,58+,0-;Path_constraints_satisfied=4,5
Cycle=7;Copy_count=2.1959669554063765;Segments=0+,26+,27+,28+,0-;Path_constraints_satisfied=
Cycle=8;Copy_count=2.123415783042603;Segments=0+,37+,38+,0-;Path_constraints_satisfied=
Cycle=9;Copy_count=1.970303031811106;Segments=0+,4-,44+,57-,56-,54-,53-,52-,51-,50-,49-,48-,47-,35-,34-,47-,35-,40+,41+,37-,0-;Path_constraints_satisfied=2,6,14
Cycle=10;Copy_count=1.8285167303750316;Segments=0+,43+,44+,57-,56-,54-,52-,51-,49-,48-,47-,46-,0-;Path_constraints_satisfied=
Cycle=11;Copy_count=1.60212559198529;Segments=0+,4-,44+,45+,0-;Path_constraints_satisfied=
Cycle=12;Copy_count=1.1282921215430193;Segments=0+,20+,41+,42+,0-;Path_constraints_satisfied=
Cycle=13;Copy_count=0.9005756417726687;Segments=0+,5+,6+,7+,8+,9+,10+,11+,24-,23+,24+,11-,10-,9-,51-,49-,48-,47-,35-,34-,33-,32-,30-,29-,0-;Path_constraints_satisfied=8
Cycle=14;Copy_count=0.8945466985190293;Segments=0+,29+,30+,32+,33+,34+,35+,36+,0-;Path_constraints_satisfied=
Cycle=15;Copy_count=0.8572647184183353;Segments=0+,20+,41+,37-,0-;Path_constraints_satisfied=
Cycle=16;Copy_count=0.7190071710443533;Segments=0+,43+,44+,57-,56-,55-,54-,53-,52-,51-,49-,48-,47-,46-,0-;Path_constraints_satisfied=
Cycle=17;Copy_count=0.6402548518493347;Segments=0+,2-,13+,13+,14+,15+,16+,17+,18+,10+,11+,24-,23-,24+,11-,10-,9-,8-,7-,6-,27-,26-,0-;Path_constraints_satisfied=1
Cycle=18;Copy_count=0.5216167669073717;Segments=0+,25-,24-,23-,24+,25+,0-;Path_constraints_satisfied=
Cycle=19;Copy_count=0.21732915087771373;Segments=0+,5+,6+,7+,8+,9+,10+,11+,12+,13+,13+,14+,15+,16+,17+,18+,19+,0-;Path_constraints_satisfied=
Cycle=20;Copy_count=0.17478185137363725;Segments=0+,5+,6+,7+,8+,9+,10+,11+,12+,13+,13+,14+,15+,16+,17+,18+,10+,11+,24-,23+,24+,11-,10-,9-,8-,7-,32-,30-,56-,55-,54-,53-,52-,51-,50-,49-,48-,47-,35-,34-,47-,35-,34-,33-,32-,31-,30-,29-,0-;Path_constraints_satisfied=3,15
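The Cycle= lines above use a simple `key=value;key=value` layout. A minimal Python sketch for loading them, assuming (as in this excerpt) that each cycle record sits on its own line. The field names are copied from the file; the `CycleRecord` class and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CycleRecord:
    cycle_id: int
    copy_count: float
    segments: List[str]  # e.g. ["0+", "39+", "40+", "0-"]
    constraints_satisfied: List[int] = field(default_factory=list)


def parse_cycle_line(line: str) -> CycleRecord:
    """Parse one 'Cycle=...;Copy_count=...;Segments=...;Path_constraints_satisfied=...' line."""
    items = [item for item in line.strip().split(";") if item]
    fields = dict(item.split("=", 1) for item in items)
    satisfied = fields.get("Path_constraints_satisfied", "")
    return CycleRecord(
        cycle_id=int(fields["Cycle"]),
        copy_count=float(fields["Copy_count"]),
        segments=[s for s in fields["Segments"].split(",") if s],
        constraints_satisfied=[int(x) for x in satisfied.split(",") if x],
    )


def parse_cycles(text: str) -> List[CycleRecord]:
    """Collect every Cycle= line, skipping the Interval/Segment/constraint sections."""
    return [parse_cycle_line(l) for l in text.splitlines() if l.lstrip().startswith("Cycle=")]
```

Usage, with the path being whatever file your CoRAL run produced: `sorted(parse_cycles(open("cycles.txt").read()), key=lambda r: r.copy_count, reverse=True)` lists the decompositions from highest to lowest copy count, which is the order they already appear in above.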
jluebeck commented 1 week ago

Hi,

Thanks for providing this output file.

The 0+ / 0- notation at the ends of entries in the Cycles section indicates what we termed "source" nodes. These connect to the linear genome adjacent to the listed amplified segments, so entries that begin and end at them are not true genome cycles and are not directly indicative of ecDNA.
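Following this explanation, a decomposition can only be a true genome cycle if its walk never passes through a source node. A minimal sketch of that check on a raw `Segments=` value; the helper names are hypothetical, and the assumption that a closed cycle would simply be written without 0+ / 0- has not been confirmed against CoRAL's output format.

```python
from typing import List

SOURCE_NODES = {"0+", "0-"}


def split_segments(segments_field: str) -> List[str]:
    """Turn '0+,39+,40+,0-' into ['0+', '39+', '40+', '0-']."""
    return [tok.strip() for tok in segments_field.split(",") if tok.strip()]


def touches_source(segments: List[str]) -> bool:
    """True if the walk passes through a source node, i.e. it is anchored to the linear genome."""
    return any(tok in SOURCE_NODES for tok in segments)


def is_genome_cycle(segments_field: str) -> bool:
    """A walk can only be a closed genome cycle if it is non-empty and never touches a source node."""
    segs = split_segments(segments_field)
    return bool(segs) and not touches_source(segs)
```

Every Segments= entry in the file above starts with 0+ and ends with 0-, so this check returns False for all twenty decompositions.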

The path constraints listed above the cycles do not report cyclic paths and do not use the 0+ / 0- notation, which can be a bit confusing.
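A path constraint is a short linear subwalk, and in this file it can appear inside a decomposition in either direction: constraint 2 (40-,35+,47+) shows up in Cycle 9 reversed, as 47-,35-,40+. A minimal containment check, assuming reversal means reading the walk backwards and flipping every orientation; the function names are hypothetical.

```python
from typing import List


def split_walk(text: str) -> List[str]:
    return [tok.strip() for tok in text.split(",") if tok.strip()]


def reverse_complement(walk: List[str]) -> List[str]:
    """Read a walk backwards and flip every orientation: 40-,35+,47+ -> 47-,35-,40+."""
    flip = {"+": "-", "-": "+"}
    return [tok[:-1] + flip[tok[-1]] for tok in reversed(walk)]


def contains_subwalk(walk: List[str], sub: List[str]) -> bool:
    k = len(sub)
    return k > 0 and any(walk[i:i + k] == sub for i in range(len(walk) - k + 1))


def constraint_in_walk(walk_text: str, constraint_text: str) -> bool:
    """True if the constraint occurs contiguously in the walk, forwards or reversed.

    No wrap-around is attempted, since the walks here run source-to-source.
    """
    walk, sub = split_walk(walk_text), split_walk(constraint_text)
    return contains_subwalk(walk, sub) or contains_subwalk(walk, reverse_complement(sub))
```

This tests containment only. It does not reproduce the Path_constraints_satisfied column: Cycle 20 contains constraint 8 (24-,23+,24+) yet lists only 3,15, so CoRAL appears to credit each constraint to a single cycle rather than to every walk that contains it.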

Because all of the decomposed paths in the Cycles section of the file are non-cyclic, I think it is fair to say that CoRAL did not identify ecDNA in this amplicon. ecDNA may still be present, but it was not detected here.

Thanks, and let me know if there are other questions. Jens