BoevaLab / FREEC

Control-FREEC: Copy number and genotype annotation in whole genome and whole exome sequencing data

Subclonal frequency #84

Open kmavrommatis opened 3 years ago

kmavrommatis commented 3 years ago

Hi, I am trying to understand the information provided for the subclones in the ratio.txt file. As I understand it, Subclone_CN and Subclone_Population correspond to the copy number and the fraction of cells that carry the respective aberration.

In several cases I get entries where Subclone_CN is 0 and Subclone_Population is high (e.g. near 0.98). If I interpret these values correctly, this means that the majority of the (tumor) cells in that region have a total loss of copies. However, the Ratio, BAF, and CopyNumber columns all indicate regions with normal coverage.

Here is a list of segments after merging the CNV and ratio.txt files

chrom start end width strand Ratio MedianRatio CopyNumber BAF EstimatedBAF Genotype UncertaintyOfGT Gene Subclone_CN Subclone_Population CNV_CN CNV_typel CNV_geno Wilcoxon KolmogorovSmirnov hit
chr3 122384188 122384413 226 + 2.040220 1.055850 2 0.943503 1.0 AA 5.19694 3:122384187-122384413 0 0.914908 2 neutral AA 2.374070e-05 3.357706e-05 TRUE
chr3 122536836 122537046 211 + 1.020940 1.055850 2 0.953704 1.0 AA 5.19694 3:122536835-122537046 0 0.914908 2 neutral AA 2.374070e-05 3.357706e-05 TRUE
chr3 122540442 122540878 437 + 1.020940 1.055850 2 0.953704 1.0 AA 5.19694 3:122536835-122537046 0 0.914908 2 neutral AA 2.374070e-05 3.357706e-05 TRUE
chr3 122626903 122627086 184 + 0.958112 1.055850 2 0.939560 1.0 AA 5.19694 3:122626902-122627086 0 0.914908 2 neutral AA 2.374070e-05 3.357706e-05 TRUE
chr3 122758789 122758972 184 + 1.062830 1.055850 2 0.949640 1.0 AA 5.19694 3:122758788-122758972 0 0.914908 2 neutral AA 2.374070e-05 3.357706e-05 TRUE
chr3 122948443 122948662 220 + 1.698140 1.055850 2 0.971429 1.0 AA 5.19694 3:122948442-122948662 0 0.914908 2 neutral AA 2.374070e-05 3.357706e-05 TRUE
chr3 123145491 123145674 184 + 1.048870 1.055850 2 0.940789 1.0 AA 5.19694 3:123145490-123145674 0 0.914908 2 neutral AA 2.374070e-05 3.357706e-05 TRUE
chr3 123154979 123155162 184 + 0.930187 1.055850 2 0.960784 1.0 AA 5.19694 3:123154978-123155162 0 0.914908 2 neutral AA 2.374070e-05 3.357706e-05 TRUE
chr3 123332497 123332677 181 + 1.663230 1.055850 2 0.962963 1.0 AA 5.19694 3:123332496-123332677 0 0.914908 2 neutral AA 2.374070e-05 3.357706e-05 TRUE

Can you please clarify how to interpret these cases? I understand that Subclone_Population is a fraction of the tumor component, i.e. a fraction of the number of cells * purity, correct? Furthermore, in the more general case where Subclone_Population is more than 50% of the (tumor) cells, why isn't that population presented as the "main" clone?

info file:

Program_Version                                      v11.5
Sample_Name                      TP_0.9_vcfrate.mpileup.gz
Control_Used                                         False
CGcontent_Used                                       False
Mappability_Used                                     False
Looking_For_Subclones                                 True
Breakpoint_Threshold                                   0.6
Window                                                   0
Number_Of_Reads|Pairs_In_Sample                   88725194
Number_Of_Reads|Pairs_In_Control                         0
Output_Ploidy                                            2
Sample_Purity                                     0.862882
Good_Polynomial_Fit                                   True

Thanks in advance for your help

valeu commented 3 years ago

Dear K, The subclone option is still experimental.

In your example,

Subclone_CN | Subclone_Population 
 2 | 0.943503

You have 0 in Subclone_CN because this prediction is true for the major clone. This CN (== 2) is estimated to be present in 94% of cancer cells. You are right that this value is corrected for purity.
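If Subclone_Population is read as a fraction of cancer cells, as the reply above suggests, the corresponding fraction of all cells in the sample follows by multiplying by Sample_Purity. The sketch below illustrates that arithmetic only; the function name `fraction_of_all_cells` is an assumption made for this example and is not part of FREEC.

```python
def fraction_of_all_cells(subclone_population, sample_purity):
    """Convert a fraction of tumor cells into a fraction of all cells.

    Assumes Subclone_Population is expressed relative to tumor cells only,
    which is the reading discussed in this thread, not a documented guarantee.
    """
    if not (0.0 <= subclone_population <= 1.0 and 0.0 <= sample_purity <= 1.0):
        raise ValueError("both inputs must lie in [0, 1]")
    return subclone_population * sample_purity


# Subclone_Population from the chr3 rows and Sample_Purity from the info file.
print(round(fraction_of_all_cells(0.914908, 0.862882), 3))
```

With the values from the original post this comes to roughly 0.79 of all cells in the sample.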

On a side note, I would recommend using v11.6. It also looks like you have a nice script to annotate genes with the information from the two tables. You might consider sharing it with the community. Please let me know.

kmavrommatis commented 3 years ago

Thanks for the quick response, I am a bit confused about your comment:

You have 0 in Subclone_CN because this prediction is true for the major clone. This CN (== 2) is estimated to be present in 94% of cancer cells. You are right that this value is corrected for purity.

If I understand correctly, you mean that since Subclone_CN == 0, I should use the main clone prediction (CopyNumber == 2) and take Subclone_Population == 0.94 as the frequency of the main clone? Does this mean that when Subclone_CN == 0 I have to ignore it and use the main CopyNumber instead, i.e. is Subclone_CN == 0 a special case? Or is there some other flag or combination of values that I need to consider?

And if that is true, what happens if there is indeed a major clone with 2 copies and a minor subclone with 0 copies (i.e. a loss)?

I will try v11.6 asap.

The copy number annotation is done by a simple R script that reads both files and combines them. Can you suggest the best way to share it?
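The R script itself is not shown in this thread. As a rough illustration of the merging step, the Python sketch below attaches to each ratio.txt window the CNV segment that contains its start position. The function name `annotate_windows`, the dictionary keys, and the toy coordinates are all assumptions made for this example. It also assumes that CNV segments on a chromosome do not overlap.

```python
from bisect import bisect_right


def annotate_windows(ratio_rows, cnv_segments):
    """Attach the containing CNV segment (if any) to each ratio window.

    ratio_rows:   list of dicts with 'chrom' and 'start' (hypothetical keys).
    cnv_segments: list of dicts with 'chrom', 'start', 'end', 'cn', 'type'.
    """
    # Group segments by chromosome and sort them so we can binary-search.
    by_chrom = {}
    for seg in cnv_segments:
        by_chrom.setdefault(seg["chrom"], []).append(seg)
    starts_by_chrom = {}
    for chrom, segs in by_chrom.items():
        segs.sort(key=lambda s: s["start"])
        starts_by_chrom[chrom] = [s["start"] for s in segs]

    annotated = []
    for row in ratio_rows:
        segs = by_chrom.get(row["chrom"], [])
        starts = starts_by_chrom.get(row["chrom"], [])
        # Index of the last segment starting at or before the window start.
        i = bisect_right(starts, row["start"]) - 1
        hit = i >= 0 and row["start"] <= segs[i]["end"]
        merged = dict(row)
        merged["CNV_CN"] = segs[i]["cn"] if hit else None
        merged["CNV_type"] = segs[i]["type"] if hit else None
        merged["hit"] = hit
        annotated.append(merged)
    return annotated


# Toy data with made-up coordinates, only to show the call.
windows = [{"chrom": "chr3", "start": 150}, {"chrom": "chr3", "start": 500}]
segments = [{"chrom": "chr3", "start": 100, "end": 200, "cn": 2, "type": "neutral"}]
for merged_row in annotate_windows(windows, segments):
    print(merged_row)
```

Windows that fall outside every segment keep `hit` set to False, so they can be filtered out or kept depending on the analysis.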

Thanks in advance for your help

valeu commented 3 years ago

I guess you are right that I should not write 0 there. Here the median ratio of 1.055850 suggests that the major clone has 2 copies, with maybe 5-6% of cells having 3 copies (or it can be just noise). I think I need a better way to output subclone information.

kmavrommatis commented 3 years ago

Thank you,

Based on the above, and given how subclones are currently reported, could you help explain the following cases and confirm my conclusions:

CopyNumber | Subclone_CN | Subclone_Population | Conclusion
2 | 0 | 0 | No subclones. Only 2 copies from the major clone.
3 | 0 | 0 | No subclones. Only 3 copies from the major clone.
2 | 0 | 0.95 | Major clone has 2 copies and 95% abundance (corrected for ploidy). The remaining 5% can be due to noise.
3 | 0 | 0.67 | ??
4 | 1 | 0.79 | Major clone has 4 copies, but for this segment there is a subclone with 1 copy at 79% abundance?? If so, wouldn't it make sense for the major clone to be 1 copy and the subclone 4 copies?
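The cases in the table can be triaged mechanically while their meaning is still open. The sketch below encodes only the tentative reading proposed in this thread and should not be taken as confirmed FREEC semantics. The function name `triage_subclone_call` and the three labels are assumptions introduced for this example.

```python
def triage_subclone_call(subclone_cn, subclone_population):
    """Label a ratio.txt row according to the tentative reading in this thread.

    - Subclone_Population == 0             -> no subclone reported
    - Subclone_CN == 0 with population > 0 -> ambiguous, needs manual review
    - anything else                        -> subclone reported (taken at face value)
    """
    if subclone_population == 0:
        return "no_subclone"
    if subclone_cn == 0:
        return "ambiguous_zero"
    return "subclone_reported"


# The five (CopyNumber, Subclone_CN, Subclone_Population) cases listed above.
cases = [(2, 0, 0), (3, 0, 0), (2, 0, 0.95), (3, 0, 0.67), (4, 1, 0.79)]
for copy_number, sub_cn, sub_pop in cases:
    print(copy_number, sub_cn, sub_pop, triage_subclone_call(sub_cn, sub_pop))
```

Keeping the ambiguous rows in their own category avoids reporting them as homozygous losses before the output format is clarified.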

Is this information enough on its own, or do we need to take BAF into account when interpreting the results, and if so, how would you suggest doing so? Our intention is to determine the abundance of a copy number aberration in the tumor population.

Thanks in advance for your help

valeu commented 3 years ago

Could you also tell me the median ratio for these cases, please? What was your minimal subclonal proportion in the config file?

kmavrommatis commented 3 years ago

Hi, here is the full list of information for cases like the ones I summarized above.

seqnames start end width strand Ratio MedianRatio CopyNumber BAF estimatedBAF Genotype UncertaintyOfGT Gene Subclone_CN Subclone_Population predCN predType predGeno WilcoxonRankSumTestPvalue KolmogorovSmirnovPvalue hit
chr1 162799891 162800011 121 + 1.13963 1.84125 4 0.973333 1 AAAA 100 1:162799890-162800011 1 0.773965 4 gain AAAA 1.437066e-31 0 TRUE
chr1 1815785 1815968 184 + 0.923205 1.4887 3 0.524272 0.666667 AAB 3.86279 1:1815784-1815968 0 0.677464 3 gain AAB 5.665153e-156 0 TRUE
chr9 128515485 128515671 187 + 1.20944 1.19548 2 0.898148 1 AA 54.5678 9:128515484-128515671 0 0.728237 2 neutral AA 5.017263e-05 0.0002839216 TRUE

Program_Version                                      v11.5
Sample_Name                      TP_0.9_vcfrate.mpileup.gz
Control_Used                                         False
CGcontent_Used                                       False
Mappability_Used                                     False
Looking_For_Subclones                                 True
Breakpoint_Threshold                                   0.6
Window                                                   0
Number_Of_Reads|Pairs_In_Sample                   88725194
Number_Of_Reads|Pairs_In_Control                         0
Output_Ploidy                                            2
Sample_Purity                                     0.862882
Good_Polynomial_Fit                                   True

Running version 11.6 did not change any of the outputs.

Thanks

valeu commented 3 years ago

0 and 1 rather mean whether FREEC thinks that there is a subclone or it is just noise... But I guess there is an error in the second line for the Subclone population - here the percentage is too low.

Do you have the _subclones.txt file entries for these regions?

kmavrommatis commented 3 years ago

Hi, here are the relevant segments from the _subclones.txt file

Possible subclones for fragment chr1:162411939-164849579
Major clone is suggest to have 4 copies
         Copy number in Subclone (different possibilities)       Subclonal population
        1       77.3965%
        0       58.0474%

Possible subclones for fragment chr1:1704316-10957789
Major clone is suggest to have 3 copies
         Copy number in Subclone (different possibilities)       Subclonal population
        0       67.7464%

Possible subclones for fragment chr9:128504715-128690076
Major clone is suggest to have 2 copies
         Copy number in Subclone (different possibilities)       Subclonal population
        0       72.8237%

Thanks
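For comparing these entries with the ratio.txt columns, the excerpt above can be read programmatically. The sketch below is a minimal parser written against the layout shown in this comment only; the real _subclones.txt file may contain variations it does not handle. The names `parse_subclones` and `EXCERPT` are assumptions made for this example.

```python
import re

FRAGMENT_RE = re.compile(r"^Possible subclones for fragment (\S+):(\d+)-(\d+)")
MAJOR_RE = re.compile(r"^Major clone is suggest(?:ed)? to have (\d+) cop(?:y|ies)")
CANDIDATE_RE = re.compile(r"^\s*(\d+)\s+([0-9.]+)%\s*$")


def parse_subclones(text):
    """Return one dict per fragment with its major CN and candidate subclones."""
    fragments = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        m = FRAGMENT_RE.match(stripped)
        if m:
            current = {
                "chrom": m.group(1),
                "start": int(m.group(2)),
                "end": int(m.group(3)),
                "major_cn": None,
                "candidates": [],  # list of (subclone CN, fraction in [0, 1])
            }
            fragments.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first fragment header
        m = MAJOR_RE.match(stripped)
        if m:
            current["major_cn"] = int(m.group(1))
            continue
        m = CANDIDATE_RE.match(line)
        if m:
            current["candidates"].append((int(m.group(1)), float(m.group(2)) / 100.0))
    return fragments


EXCERPT = """Possible subclones for fragment chr1:162411939-164849579
Major clone is suggest to have 4 copies
         Copy number in Subclone (different possibilities)       Subclonal population
        1       77.3965%
        0       58.0474%

Possible subclones for fragment chr1:1704316-10957789
Major clone is suggest to have 3 copies
         Copy number in Subclone (different possibilities)       Subclonal population
        0       67.7464%
"""

for fragment in parse_subclones(EXCERPT):
    print(fragment)
```

Each fragment keeps all listed candidates, so the first fragment retains both the 1-copy and the 0-copy possibilities instead of only the one that ended up in ratio.txt.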

valeu commented 3 years ago

Thank you, Kostas. It looks like there is an issue in the output. I will have to check it. I hope to get back to you soon.