AlexandrovLab / SigProfilerMatrixGenerator

SigProfilerMatrixGenerator creates mutational matrices for all types of somatic mutations. It allows downsizing the generated mutations to only parts of the genome (e.g., the exome or a custom BED file). The tool seamlessly integrates with other SigProfiler tools.
BSD 2-Clause "Simplified" License

Can I use ichorCNA output files as input? #129

Closed: teacakedeadlift closed this issue 1 year ago

teacakedeadlift commented 1 year ago

Hi,

I'm interested in looking at copy number signatures in some DCIS/breast cancer samples (both multi-region tissue and cfDNA). We only have 1X coverage and processed the data through ichorCNA, so we don't have BAF, just logR, copy number calls and segmentation.

Can this be used as input to SigProfilerMatrixGenerator? If so, how would I go about creating an input file that would work: which column headings would be needed, and what value would 'file_type' need to be?

Any help appreciated.

Thanks

Phil

azhark2 commented 1 year ago

Can you please provide an example of the segmentation file produced by ichorCNA? It should just be a matter of renaming some columns to match input formats that are already supported (assuming the calls are allele-specific).

teacakedeadlift commented 1 year ago

Hi Azhar

The seg output file isn't allele-specific, presumably because ichorCNA is designed to run on low-coverage data (mine is 1X WGS)? The .seg file columns are as follows:

sample  chr start   end event   copy.number bins    median
1234.ctDNA  chr1    1000001 74000000    HLAMP   5   73  0.0279954972684503
1234.ctDNA  chr1    74000001    248000000   NEUT    2   174 7.64628371670954e-05
1234.ctDNA  chr2    1000001 242000000   NEUT    2   241 0.00249153433561523
1234.ctDNA  chr3    1000001 198000000   NEUT    2   197 -0.00436992327885263
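For anyone trying the same thing, here is a minimal Python sketch (standard library only) of loading this .seg layout into segment records. It assumes whitespace-delimited columns with the header shown above; `parse_ichor_seg` and `Segment` are hypothetical names, not part of ichorCNA or SigProfilerMatrixGenerator, and treating start/end as 1-based inclusive is an assumption. It also makes the limitation visible: only total copy number (`copy.number`) is available, with no major/minor allele columns.

```python
# Hypothetical sketch: load an ichorCNA .seg table into simple records.
# Assumes a header row and whitespace-delimited columns as posted above.
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    sample: str
    chrom: str
    start: int
    end: int
    total_cn: int  # ichorCNA "copy.number": total copy number only
    event: str     # ichorCNA call, e.g. HLAMP or NEUT


def parse_ichor_seg(text: str) -> List[Segment]:
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    idx = {name: i for i, name in enumerate(header)}  # column name -> position
    segments = []
    for ln in lines[1:]:
        f = ln.split()
        segments.append(
            Segment(
                sample=f[idx["sample"]],
                chrom=f[idx["chr"]],
                start=int(f[idx["start"]]),
                end=int(f[idx["end"]]),
                total_cn=int(f[idx["copy.number"]]),
                event=f[idx["event"]],
            )
        )
    return segments


example = """\
sample  chr start   end event   copy.number bins    median
1234.ctDNA  chr1    1000001 74000000    HLAMP   5   73  0.0279954972684503
1234.ctDNA  chr1    74000001    248000000   NEUT    2   174 7.64628371670954e-05
"""

segs = parse_ichor_seg(example)
for s in segs:
    # Length in bp, assuming 1-based inclusive coordinates.
    print(s.chrom, s.start, s.end, s.end - s.start + 1, s.total_cn, s.event)
# chr1 1000001 74000000 73000000 5 HLAMP
# chr1 74000001 248000000 174000000 2 NEUT
```

Note there is nothing to map onto a minor-allele copy number, so renaming columns alone cannot produce allele-specific input.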

There is also a .cna.seg file (column names edited for readability):

chr start   end ctDNA.copy.number   ctDNA.event ctDNA.logR  ctDNA.subclone.status   ctDNA.Corrected_Copy_Number ctDNA.Corrected_Call    ctDNA.logR_Copy_Number
chr1    1000001 2000000 5   HLAMP   NA  0   5   HLAMP   NA
chr1    3000001 4000000 5   HLAMP   0.1142  0   5   HLAMP   14.4432202024739
chr1    4000001 5000000 5   HLAMP   0.0824  0   5   HLAMP   10.9479925637138
chr1    5000001 6000000 5   HLAMP   0.0792  0   5   HLAMP   10.6005191374886
chr1    6000001 7000000 5   HLAMP   0.0348  0   5   HLAMP   5.85802148588685

And a seg.txt file:

ID  chrom   start   end num.mark    seg.median.logR copy.number call    subclone.status logR_Copy_Number    Corrected_Copy_Number   Corrected_Call
1234.ctDNA  chr1    1000001 74000000    73  0.0279954972684503  5   HLAMP   FALSE   5.14402386439506    5   HLAMP
1234.ctDNA  chr1    74000001    248000000   174 7.64628371670954e-05    2   NEUT    FALSE   2.24947605183101    2   NEUT
1234.ctDNA  chr2    1000001 242000000   241 0.00249153433561523 2   NEUT    FALSE   2.49765496079995    2   NEUT
1234.ctDNA  chr3    1000001 198000000   197 -0.00436992327885263    2   NEUT    FALSE   1.79363921247398    2   NEUT

I assume that, since these calls are not allele-specific, they cannot be run through SigProfilerMatrixGenerator?

For some of the multi-region tissue samples, I have combined BAMs with a combined coverage of 5-15X. Is this still too low to put through an allele-specific pipeline (ASCAT, for example)? I had previously trialled TitanCNA (which builds on ichorCNA), but the minimum depth required was around 15X.

Thanks

azhark2 commented 1 year ago

Sorry for the delay. Yes, our classification scheme requires allele-specific CN calls. Your data is a bit tricky for CN signatures; I would refer to this paper: https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-021-07686-z. You'll need to find a WGS caller that outputs BAF and logR from very low-coverage data; I'm not exactly sure such a tool exists. One option could be Battenberg (https://github.com/Wedge-lab/battenberg).
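To illustrate why allele-specific calls are needed, here is a minimal Python sketch of a CN48-style segment classification (heterozygosity status x total copy number x segment size). The bin edges, label strings and the `classify` helper are assumptions for illustration only, not SigProfilerMatrixGenerator's actual code. The key point: an ichorCNA total copy number of 5 could be 3+2 (heterozygous) or 5+0 (LOH), and those land in different categories, so a total-CN-only call cannot be classified.

```python
# Hypothetical sketch of a CN48-style classifier. Bin edges and labels are
# illustrative assumptions and may not match SigProfilerMatrixGenerator.
from typing import Optional

KB = 1_000
MB = 1_000_000


def size_bin(length: int, hd: bool) -> str:
    if length <= 100 * KB:
        return "0-100kb"
    if length <= 1 * MB:
        return "100kb-1Mb"
    if hd:
        return ">1Mb"  # homozygous deletions use coarser size bins
    if length <= 10 * MB:
        return "1Mb-10Mb"
    if length <= 40 * MB:
        return "10Mb-40Mb"
    return ">40Mb"


def tcn_bin(tcn: int) -> str:
    if tcn <= 2:
        return str(tcn)
    if tcn <= 4:
        return "3-4"
    if tcn <= 8:
        return "5-8"
    return "9+"


def classify(major: int, minor: Optional[int], length: int) -> str:
    if minor is None:
        # Total-CN-only output (e.g. ichorCNA) cannot separate LOH from
        # heterozygous segments, so the category is undefined.
        raise ValueError("allele-specific (minor) copy number required")
    tcn = major + minor
    if tcn == 0:
        return f"0:homdel:{size_bin(length, hd=True)}"
    status = "LOH" if minor == 0 else "het"
    return f"{tcn_bin(tcn)}:{status}:{size_bin(length, hd=False)}"


print(classify(3, 2, 73_000_000))  # 5-8:het:>40Mb
print(classify(5, 0, 73_000_000))  # 5-8:LOH:>40Mb
print(classify(2, 0, 5_000_000))   # 2:LOH:1Mb-10Mb
```

The first two calls share the same total copy number and length but differ in category, which is exactly the information missing from the ichorCNA output above.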

teacakedeadlift commented 1 year ago

Hi @azhark2

Thanks for the reply. I'm not too sure it exists either! Will give Battenberg a go and see what happens.

Thanks

Phil