AmpliconSuite / AmpliconClassifier

Classify output of AmpliconArchitect to detect types of focal amplifications present
BSD 2-Clause "Simplified" License

More explanation about the output? #3

Closed zhang919 closed 3 years ago

zhang919 commented 3 years ago

Dear developers, I have recently been running AmpliconClassifier on my results, but some of the output confuses me. I assumed AmpliconClassifier classifies and counts the different amplicons, so why do I always get decimals in the results? Do they represent proportions? Could you please give me more information? Best, Zhang

jluebeck commented 3 years ago

Hi Zhang,

The AmpliconClassifier method is still in development and is currently unpublished. We intend to improve the documentation in the future, which should resolve issues like this. To address your question, can I ask which output file & column from that file you are referring to specifically? Would you be able to provide an example/output of the issue?

Thanks and best regards, Jens

zhang919 commented 3 years ago

Hi jluebeck, For example, I got results like the following:

head amplicons_list_amplicon_classification_profiles.tsv

sample_name	amplicon_number	amplicon_classification	ecDNA+	BFB+	ecDNA_amplicons	No amp/Invalid	Linear amplification	Trivial cycle	Complex non-cyclic	Complex cyclic	foldback_read_prop	BFB_bwp	Distal_bwp	BFB_cwp	Amp_entropy	Amp_decomp_entropy	Amp_nseg_entropy
SRR8652081	amplicon10	Cyclic	Positive	None detected	1	0.0	0.682205235628	0.317794764372	0.0	0.0	0	0	0	0	2.02194393115	0.63564957003	1.38629436112
SRR8652081	amplicon11	Cyclic	Positive	None detected	1	0.0	0.811246030065	0.188753969935	0.0	0.0	0	0	0	0	1.09861228867	-0.0	1.09861228867
SRR8652081	amplicon12	Linear amplification	None detected	None detected	0	0.0	1.0	0.0	0.0	0.0	0	0	0	0	-0.0	-0.0	-0.0
SRR8652081	amplicon13	No amp/Invalid	None detected	None detected	0	1.0	0.0	0.0	0.0	0.0	0	0	0	0	2.26133525773	0.651897345298	1.60943791243
SRR8652081	amplicon14	Cyclic	Positive	None detected	1	0.0	0.809028829349	0.190971170651	0.0	0.0	0	0	0	0	1.09861228867	-0.0	1.09861228867
SRR8652081	amplicon15	Linear amplification	None detected	None detected	0	0.0	0.755630954489	0.0	0.244369045511	0.0	0	0	0	0	1.2492094954	0.55606231484	0.69314718056
SRR8652081	amplicon16	Linear amplification	None detected	None detected	0	0.0827856448762	0.913504601534	0.00370975358996	0.0	0.0	0	0	0	0	2.3978952728	-0.0	2.3978952728
SRR8652081	amplicon17	Linear amplification	None detected	None detected	0	0.0	1.0	0.0	0.0	0.0	0	0	0	0	-0.0	-0.0	-0.0
SRR8652081	amplicon18	Linear amplification	None detected	None detected	0	0.227086212289	0.544496791719	0.0	0.228416995991	0.0	0	0	0	0	2.61434290117	1.00490498874	1.60943791243

My guesses are as follows:
1) Does "prop" in foldback_read_prop mean proportion?
2) Are BFB_bwp and Distal_bwp also proportions?
3) Does ecDNA_amplicons give the ecDNA count when ecDNA+ is positive?
4) Does the "/" in "No amp/Invalid" mean "or"? If so, why does the column contain decimals, and how are they calculated?

Thanks for your attention. Best. Zhang

jluebeck commented 3 years ago

Hi Zhang,

Sorry for the delay in responding. I have recently simplified the output to remove the verbose, raw decomposition score columns considered by AmpliconClassifier towards the score. That change is available now. There is now also a table in the README describing the relevant output columns in more detail.

The 'No amp/Invalid' class indicates that the given AA amplicon is not a valid focal amplification (CN too low, or too small, etc.)

Specifically, the quantities you asked about indicate the following:
1) Yes, it refers to a proportion.
2) BFB_bwp and Distal_bwp refer to the proportions of AA cycles which conform to abstract notions of BFB-like, or non-BFB, rearranged signatures.
3) This indicates the number of distinct ecDNA present if the sample is ecDNA+. Sometimes two non-overlapping ecDNA can end up in the same AA amplicon.
4) No amp and/or Invalid.

Please let me know if there are more questions. Best regards, Jens
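As a small check on the "are these proportions?" question, the sketch below sums the five class score columns for three of the rows pasted earlier in this thread. In those rows the scores add up to roughly 1 per amplicon, which is consistent with reading them as proportions. This is only an observation about the pasted data, not a statement of how AmpliconClassifier computes them, and per the reply above these raw score columns no longer appear in newer output. The helper name `class_score_sums` is made up for illustration.

```python
import csv
import io

# Three rows copied from the output pasted earlier in this thread,
# trimmed to the amplicon name and the five class score columns.
TSV = (
    "amplicon_number\tNo amp/Invalid\tLinear amplification\t"
    "Trivial cycle\tComplex non-cyclic\tComplex cyclic\n"
    "amplicon10\t0.0\t0.682205235628\t0.317794764372\t0.0\t0.0\n"
    "amplicon15\t0.0\t0.755630954489\t0.0\t0.244369045511\t0.0\n"
    "amplicon16\t0.0827856448762\t0.913504601534\t0.00370975358996\t0.0\t0.0\n"
)

CLASS_COLUMNS = [
    "No amp/Invalid",
    "Linear amplification",
    "Trivial cycle",
    "Complex non-cyclic",
    "Complex cyclic",
]


def class_score_sums(tsv_text):
    """Return {amplicon_number: sum of the five class score columns}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["amplicon_number"]: sum(float(row[col]) for col in CLASS_COLUMNS)
        for row in reader
    }


sums = class_score_sums(TSV)
for name, total in sums.items():
    print(name, round(total, 6))
```

To run the same check on a full file, the embedded string could be replaced by the contents of the classification profiles TSV, provided that file still contains these columns.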

zhang919 commented 3 years ago

Hi jluebeck, Thanks for your kind help. Recently I have been trying to classify my AA output. I also noticed that BFB may cause a misleading signal, as you described in the paper published in Nature Genetics. So I want to exclude the amplicons that include BFB; is this right? If so, I would also like to cite the functions in AmpliconClassifier; can I do this? If possible, what open-source license does it work under? I have read the code in AmpliconClassifier, and thanks to its good variable naming I can roughly understand how it works even in the absence of comments. Thanks again for your nice work. According to my understanding, can I recognize BFB just by correctly calling the following functions:

"""
In the early days when I started this work, due to a lack of understanding of graph files,
I used convert_cycles_file.py in your python script called CycleViz to combine the cycle file
and graph file with the vis file you will see below.
"""
fb_prop, maxCN = compute_f_from_AA_graph(graph_file, False)
get_data_for_bfb(self.circle_dict, self.seg_dict)
print(fb_prop)
all_circle_dict = get_all_circle_for_file(self.vis_file)
circle_list, circle_CN, segmentD = get_data_for_bfb(all_circle_dict, self.seg_dict)
fb_bwp, nfb_bwp, bfb_cwp, bfbHasEC, non_bfb_cycle_inds, \
    bfb_cycle_inds = cycles_file_bfb_props(circle_list, segmentD, circle_CN)
print(fb_bwp)
bfbClass = classifyBFB(fb_prop, fb_bwp, nfb_bwp, bfb_cwp, maxCN)
print(bfbClass)

Best. Zhang

jluebeck commented 3 years ago

Hi Zhang,

The ecDNA+ and BFB+ columns of the output are evaluated in a dependent fashion. Thus, if something is both BFB+ and ecDNA+, AmpliconClassifier has still identified an ecDNA after subtracting the paths/cycles which conform to BFB, which removes the ambiguity issue initially described in the NG paper. Does that make sense?

AmpliconClassifier can be cited by citing the Nature Genetics paper listed in the README. AmpliconClassifier is released under the University of California software license, which is stated in the LICENSE file of this repository.

Best regards, Jens
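Given the point above that ecDNA+ is evaluated after the BFB-like paths/cycles are subtracted, one way to act on it is to select amplicons by the ecDNA+ column directly rather than discarding every amplicon flagged BFB+. The sketch below is a minimal illustration under assumptions: the "Positive" and "None detected" values for ecDNA+ come from the output pasted in this thread, but "Positive" as a BFB+ value is assumed by analogy, the rows named ampliconX and ampliconY are hypothetical, and the helper names are made up.

```python
# Rows reduced to the columns that matter for this filter.
rows = [
    # Taken from the output pasted earlier in this thread.
    {"amplicon_number": "amplicon10", "ecDNA+": "Positive", "BFB+": "None detected"},
    {"amplicon_number": "amplicon12", "ecDNA+": "None detected", "BFB+": "None detected"},
    # Hypothetical rows; "Positive" in the BFB+ column is an assumed value.
    {"amplicon_number": "ampliconX", "ecDNA+": "Positive", "BFB+": "Positive"},
    {"amplicon_number": "ampliconY", "ecDNA+": "None detected", "BFB+": "Positive"},
]


def ecdna_positive(table):
    """Amplicons called ecDNA+, whether or not they are also BFB+."""
    return [r["amplicon_number"] for r in table if r["ecDNA+"] == "Positive"]


def bfb_without_ecdna(table):
    """Amplicons flagged BFB+ with no ecDNA call."""
    return [
        r["amplicon_number"]
        for r in table
        if r["BFB+"] == "Positive" and r["ecDNA+"] != "Positive"
    ]


print(ecdna_positive(rows))
print(bfb_without_ecdna(rows))
```

With this split, an amplicon that is both BFB+ and ecDNA+ stays in the ecDNA set, while amplicons that are only BFB+ can be handled separately.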