Roth-Lab / pyclone-vi

Fast method for inferring cancer clonal population structure from SNV data.
GNU General Public License v3.0

mutations with no data in some samples #26

Open ZWael opened 1 year ago

ZWael commented 1 year ago

Hello @aroth85, I am using PyClone-VI to infer the clonal structure across samples from the same subject. My aim is to describe mutation gain/loss.

As stated in the README, PyClone-VI removes mutations that do not have entries for all samples. This is the case in my data and also in the TRACERx example provided, "/examples/tracerx.tsv":

mutation_id sample_id ref_counts alt_counts normal_cn major_cn minor_cn tumour_content
CRUK0001:11:47843641:G R1 202 0 2 3 2 0.21
CRUK0001:11:47843641:G R3 183 10 2 4 1 0.11
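
Before running PyClone-VI, it can help to list which mutations lack a row for some sample, since those are the ones that would be dropped. The following is only a minimal sketch, not part of PyClone-VI: the function name `missing_samples` and the second mutation ID are hypothetical, and the set of samples is assumed to be every `sample_id` seen anywhere in the input.

```python
from collections import defaultdict


def missing_samples(rows):
    """Map each mutation_id to the sorted list of samples in which it has no row.

    `rows` is an iterable of (mutation_id, sample_id) pairs. The full sample
    set is assumed to be every sample_id that appears in the input.
    """
    seen = defaultdict(set)
    for mutation_id, sample_id in rows:
        seen[mutation_id].add(sample_id)
    all_samples = set().union(*seen.values()) if seen else set()
    return {
        mutation_id: sorted(all_samples - samples)
        for mutation_id, samples in seen.items()
        if all_samples - samples
    }


rows = [
    ("CRUK0001:11:47843641:G", "R1"),
    ("CRUK0001:11:47843641:G", "R3"),
    # Hypothetical second mutation, present in all three samples.
    ("hypothetical:1:100:A", "R1"),
    ("hypothetical:1:100:A", "R2"),
    ("hypothetical:1:100:A", "R3"),
]
report = missing_samples(rows)
print(report)
```

Here the TRACERx mutation is reported as missing in R2, which matches the situation described above.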

The recommended solution is to "set ref/alt counts to 0 for the corresponding sample", so I added this line:

mutation_id sample_id ref_counts alt_counts normal_cn major_cn minor_cn tumour_content
CRUK0001:11:47843641:G R2 0 0 2 0 0 0.11 CRUK0001

But, as with the original file, this mutation was removed from the PyClone-VI results table.

My second attempt was to set major_cn equal to the copy number in normal cells:

mutation_id sample_id ref_counts alt_counts normal_cn major_cn minor_cn tumour_content
CRUK0001:11:47843641:G R2 0 0 2 2 0 0.11 CRUK0001

In this case the mutation was retained, with a cellular prevalence of 0 in the R2 sample.

What should I use for major_cn? The normal_cn, or the copy number of the overlapping gene segment, even if there is no mutated allele?

aroth85 commented 1 year ago

The major_cn has to be greater than zero in all samples or the mutation is filtered out, since there is no possible way to have a mutation at a locus which is absent. Long term this needs to be altered, but for now your second solution is correct.

Ideally you would put the actual CN and allele counts observed in the sample, even if it is not reported as mutated by the variant caller.
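
The workaround discussed above (zero counts plus a non-zero major_cn) could be sketched as follows. This is only an illustration under stated assumptions, not PyClone-VI code: the helper `fill_missing_samples` and its `tumour_content_by_sample` argument are hypothetical, the placeholder uses major_cn = normal_cn and minor_cn = 0 as in the second attempt, and only samples listed in `tumour_content_by_sample` are kept in the output. Where the real depth and copy number at the locus are available, those should replace the placeholder values.

```python
from collections import defaultdict


def fill_missing_samples(rows, tumour_content_by_sample):
    """Add a placeholder row for each (mutation, sample) pair with no entry.

    Placeholder rows use ref_counts = alt_counts = 0, major_cn = normal_cn and
    minor_cn = 0, so that major_cn stays greater than zero.
    """
    by_mutation = defaultdict(dict)
    for row in rows:
        by_mutation[row["mutation_id"]][row["sample_id"]] = row

    filled = []
    for mutation_id, present in by_mutation.items():
        # Assumes normal_cn is the same for a given mutation in every sample.
        normal_cn = next(iter(present.values()))["normal_cn"]
        for sample_id, tumour_content in tumour_content_by_sample.items():
            if sample_id in present:
                filled.append(present[sample_id])
                continue
            filled.append({
                "mutation_id": mutation_id,
                "sample_id": sample_id,
                "ref_counts": 0,
                "alt_counts": 0,
                "normal_cn": normal_cn,
                "major_cn": normal_cn,  # placeholder; must be > 0
                "minor_cn": 0,
                "tumour_content": tumour_content,
            })
    return filled


rows = [
    {"mutation_id": "CRUK0001:11:47843641:G", "sample_id": "R1",
     "ref_counts": 202, "alt_counts": 0, "normal_cn": 2,
     "major_cn": 3, "minor_cn": 2, "tumour_content": 0.21},
    {"mutation_id": "CRUK0001:11:47843641:G", "sample_id": "R3",
     "ref_counts": 183, "alt_counts": 10, "normal_cn": 2,
     "major_cn": 4, "minor_cn": 1, "tumour_content": 0.11},
]
filled = fill_missing_samples(rows, {"R1": 0.21, "R2": 0.11, "R3": 0.11})
for row in filled:
    print(row["sample_id"], row["ref_counts"], row["alt_counts"], row["major_cn"])
```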

ZWael commented 1 year ago

Thank you @aroth85 for your feedback.

So ideally I can set ref_counts to the counts reported at that location and alt_counts to 0, as there is no altered allele, and for the CN I used the copy number at the gene level.

Did I get it right?

reipatho commented 1 year ago

Hello,

I'm in a similar situation and followed the second suggestion. But all of the CCF results were over 0.99, when they should theoretically be 0. What should I do?

I attached my example.

test.input.txt test.output.txt

Best regards,

reipatho commented 1 year ago

Should I perhaps input the ref/alt counts of "tumor + normal", not only "tumor"?

When I tried inputting the former (ref_count ≠ 0, alt_count = 0), the result was CCF = 0.

Or have I misunderstood? Could you give me some advice?

reipatho commented 1 year ago

Now I input the tumor ref/alt counts and can calculate CCF successfully. My protocol is as follows.

  1. Merge the VCF files with bcftools merge
  2. Generate an interval list with GATK VcfToIntervalList
  3. Call variants from each BAM file with GATK HaplotypeCaller

The following is part of my data, including samples where alt_count was called as 0.

Hugo_Symbol NCBI_Build Chromosome Start_Position End_Position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 Tumor_Sample_Barcode t_depth t_ref_count t_alt_count
ARHGAP28 GRCh38 18 6851108 6851108 + Silent SNP C C T Sample_FJ01-1 288 284 4
ARHGAP28 GRCh38 18 6851108 6851108 + Silent SNP C C T Sample_FJ01-2 245 245 0
ARHGAP28 GRCh38 18 6851108 6851108 + Silent SNP C C T Sample_FJ01-3 259 259 0
ARHGAP28 GRCh38 18 6851108 6851108 + Silent SNP C C T Sample_FJ01-5 320 253 67
ARHGAP28 GRCh38 18 6851108 6851108 + Silent SNP C C T Sample_FJ01-6 292 235 57
ARHGAP28 GRCh38 18 6851108 6851108 + Silent SNP C C T Sample_FJ01-7 347 275 72
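
For anyone following the same route, the MAF-style columns above could be mapped onto the PyClone-VI input columns roughly as below. This is a sketch under several assumptions that are not in the thread: the function name `maf_to_pyclone` is hypothetical, the copy numbers and tumour content values are made up for illustration, copy number is given per sample rather than per segment, and normal_cn is fixed at 2 (reasonable for an autosome such as chromosome 18).

```python
def maf_to_pyclone(maf_rows, copy_number, tumour_content):
    """Convert MAF-style rows (dicts) into PyClone-VI style input rows.

    copy_number maps sample -> (major_cn, minor_cn) at this locus, and
    tumour_content maps sample -> purity; both are supplied by the caller.
    """
    out = []
    for r in maf_rows:
        sample = r["Tumor_Sample_Barcode"]
        major_cn, minor_cn = copy_number[sample]
        if major_cn <= 0:
            # A mutation with major_cn of 0 in any sample is filtered out.
            raise ValueError(f"major_cn must be > 0 for sample {sample}")
        out.append({
            "mutation_id": f'{r["Chromosome"]}:{r["Start_Position"]}:{r["Tumor_Seq_Allele2"]}',
            "sample_id": sample,
            "ref_counts": int(r["t_ref_count"]),
            "alt_counts": int(r["t_alt_count"]),
            "normal_cn": 2,  # assumption: autosomal locus
            "major_cn": major_cn,
            "minor_cn": minor_cn,
            "tumour_content": tumour_content[sample],
        })
    return out


maf_rows = [
    {"Chromosome": "18", "Start_Position": "6851108", "Tumor_Seq_Allele2": "T",
     "Tumor_Sample_Barcode": "Sample_FJ01-1", "t_ref_count": "284", "t_alt_count": "4"},
    {"Chromosome": "18", "Start_Position": "6851108", "Tumor_Seq_Allele2": "T",
     "Tumor_Sample_Barcode": "Sample_FJ01-2", "t_ref_count": "245", "t_alt_count": "0"},
]
# Hypothetical copy numbers and purities, for illustration only.
copy_number = {"Sample_FJ01-1": (1, 1), "Sample_FJ01-2": (1, 1)}
tumour_content = {"Sample_FJ01-1": 0.5, "Sample_FJ01-2": 0.5}

pyclone_rows = maf_to_pyclone(maf_rows, copy_number, tumour_content)
for row in pyclone_rows:
    print(row)
```

Samples with alt_count 0 keep their observed reference depth, which is the input that gave CCF = 0 earlier in the thread.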