AlexandrovLab / SigProfilerAssignment

Assignment of known mutational signatures to individual samples and individual somatic mutations
BSD 2-Clause "Simplified" License

Extract mutations assigned to each SBS signature #148

Open itigupta2429 opened 5 days ago

itigupta2429 commented 5 days ago

Hi Team, I have a question. I would like to extract which mutations are assigned to which SBS signature for each sample. Is there a way to do that? I am using SigProfilerAssignment. Thanks in advance.

mdbarnesUCSD commented 5 days ago

Hi @itigupta2429,

If you are looking to assign COSMIC signatures to a set of samples then you will want to use cosmic_fit. Please reach out if you have any additional questions.

itigupta2429 commented 5 days ago

Hi @mdbarnesUCSD,

I am using the following command:

Analyze.cosmic_fit(
    samples="./test/BRCA_vcf", 
    output="test_vcf", 
    input_type="vcf", 
    genome_build="GRCh37", 
    context_type="96", 
    export_probabilities_per_mutation="TRUE", 
    export_probabilities="TRUE"
)

After running this, I expected to find a file that contains mutation-level assignments, indicating which SBS signature is linked to each mutation from the input VCF. However, after thoroughly reviewing all the output files, I can't locate any file that provides this information.

Could you please clarify where this file should be generated or if there’s an additional step required to obtain mutation-wise signature assignments?

Thanks for your help!

mdbarnesUCSD commented 5 days ago

Thanks for clarifying. It seems that you would like to export the probabilities per mutation file. I suspect the issue is that you are currently passing the string "TRUE". Please try again using the boolean True as shown below.

from SigProfilerAssignment import Analyzer as Analyze

Analyze.cosmic_fit(
    samples="./test/BRCA_vcf",
    output="test_vcf",
    input_type="vcf",
    genome_build="GRCh37",
    context_type="96",
    export_probabilities_per_mutation=True,  # boolean True, not the string "TRUE"
    export_probabilities=True,
)

Please let us know if this resolves the issue. Thanks!
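As a side note on the string-versus-boolean point above, here is a minimal sketch of plain Python behaviour. It makes no claim about how cosmic_fit inspects these flags internally, which is not shown in this thread; the helper `is_real_bool` is hypothetical and exists only for illustration.

```python
# Any non-empty string is truthy in Python, regardless of what it spells.
flags = ["TRUE", "FALSE", "", True, False]
truthiness = {repr(flag): bool(flag) for flag in flags}


def is_real_bool(value):
    """Hypothetical helper: True only for the bool objects True and False."""
    return isinstance(value, bool)


for name, truthy in truthiness.items():
    print(f"{name:>8} -> truthy={truthy}")

# A string never compares equal to a bool, even when it spells "TRUE".
string_equals_bool = ("TRUE" == True)
```

A function that only tests truthiness would therefore treat "TRUE" and "FALSE" alike as enabled, while one that compares against True would treat both as disabled. Passing real booleans avoids either surprise.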

itigupta2429 commented 5 days ago

Thanks for your quick response! Even with the earlier code I was getting the probabilities per mutation file (inside BRCA_vcf/output/vcf_files/SNV). The folder contains one file per chromosome, and one of those files contains information like this:

    PD4120a 10 71718 N:GA[T>A]CA 1
    PD4120a 10 115370 N:AC[T>A]AC -1
    PD4120a 10 117751 N:CT[C>A]AG -1
    PD4120a 10 212461 U:TT[C>T]AG 1
    PD4120a 10 247953 U:AC[C>G]TG 1
    PD4120a 10 311033 N:AC[C>T]GG -1
    PD4120a 10 369240 T:GG[C>G]TC 1
    PD4120a 10 387315 T:CT[C>G]AG 1
    PD4120a 10 442142 T:TT[C>G]AA 1
    PD4120a 10 471214 T:AG[C>T]GG 1
    PD4120a 10 484448 U:CT[C>G]AA -1
    PD4120a 10 520650 T:CT[C>G]TT 1
    PD4120a 10 646938 T:CT[C>T]AT 1
    PD4120a 10 657996 U:CT[C>T]CC -1

I have a couple of questions regarding this:

  1. How can I tell which SBS signature this mutation (chr10, position 71718) is assigned to?
  2. What do +1 and -1 mean here?
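For anyone who wants to inspect rows like the ones quoted above programmatically, here is a rough sketch that splits each record into fields. The field names (`sample`, `chrom`, `pos`, `prefix`, `left`, `ref`, `alt`, `right`, `last`) are hypothetical labels chosen for this example, not names taken from the tool. The sketch does not interpret the leading letter or the final 1/-1 column, and it does not assign a signature to any mutation.

```python
import re

# One record: sample, chromosome, position, a token such as N:GA[T>A]CA,
# and a trailing 1 or -1. Group names are hypothetical labels.
ROW = re.compile(
    r"(?P<sample>\S+)\s+(?P<chrom>\S+)\s+(?P<pos>\d+)\s+"
    r"(?P<prefix>[A-Z]):(?P<left>[ACGT]+)\[(?P<ref>[ACGT])>(?P<alt>[ACGT])\]"
    r"(?P<right>[ACGT]+)\s+(?P<last>-?1)\b"
)

# Three of the rows quoted in the thread, in their flattened form.
text = (
    "PD4120a 10 71718 N:GA[T>A]CA 1 "
    "PD4120a 10 115370 N:AC[T>A]AC -1 "
    "PD4120a 10 212461 U:TT[C>T]AG 1"
)


def parse_rows(raw):
    """Return one dict per record found in raw, with numeric fields as ints."""
    parsed = []
    for match in ROW.finditer(raw):
        record = match.groupdict()
        record["pos"] = int(record["pos"])
        record["last"] = int(record["last"])
        parsed.append(record)
    return parsed


rows = parse_rows(text)
for row in rows:
    print(row)
```

Because the pattern matches on whitespace rather than line breaks, it works on both the flattened text and a file with one record per line.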