AlexandrovLab / SigProfilerSingleSample

SigProfilerSingleSample allows attributing a known set of mutational signatures to an individual sample. The tool identifies the activity of each signature in the sample and assigns the probability for each signature to cause a specific mutation type in the sample. The tool makes use of SigProfilerMatrixGenerator and SigProfilerPlotting.

Vcf input doesn't work with other signatures than default ones #14

Closed Marozi2 closed 3 years ago

Marozi2 commented 3 years ago

Hi,

I'm currently using SigProfilerSingleSample to refit mutational catalogs from my samples against the 2013 signatures. I'm now trying to do the same with a VCF instead of a mutational catalog. It works when I refit against the 2020 signatures, i.e. the ones the tool uses by default.

>>> spss.single_sample("/Users/romain/Doctorat/data/formatted/mutational_signature/sigprofiler-vcf_test", "/Users/romain/Doctorat/data/results/mutational_signature", ref="GRCH38", exome=False)
Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 28.03 seconds.
Matrices generated for 1 samples with 0 errors. Total of 1369083 SNVs, 55110 DINUCs, and 0 INDELs were successfully analyzed.
##########################################################
Exacting Profile for Sample 1

CONGRATULATIONS! THE SIGPROFILER SINGLE SAMPLE ANALYSIS ENDED SUCCESSFULLY

My problem is that the tool stops after creating decomposition profile.csv when I try to refit the VCF against the 2013 signatures. Both decomposition profile.csv and the .err file are empty.

>>> import pandas as pd
>>> cols = list(pd.read_csv("/Users/romain/Doctorat/data/brut/mutational_signature/signatures.txt", delimiter='\t', nrows=1))
>>> signatures = pd.read_csv("/Users/romain/Doctorat/data/brut/mutational_signature/signatures.txt", delimiter='\t', usecols=[i for i in cols if i not in ['Substitution Type', 'Trinucleotide']], index_col=0)
>>> spss.single_sample("/Users/romain/Doctorat/data/formatted/mutational_signature/sigprofiler-vcf_test/2013_sig", "/Users/romain/Doctorat/data/results/mutational_signature/sigprofiler-vcf_test/2013_sig", sig_database=signatures, ref="GRCH38", exome=False)
Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 27.19 seconds.
Matrices generated for 1 samples with 0 errors. Total of 1369083 SNVs, 55110 DINUCs, and 0 INDELs were successfully analyzed.
##########################################################
Exacting Profile for Sample 1
>>> # The prompt appears almost immediately after previous line

Is this a bug, or is it not possible to use SigProfilerSingleSample with a VCF to refit against signatures other than the default ones?

radh1ka commented 3 years ago

I seem to be having the same issue. Did you manage to fix this?

Marozi2 commented 3 years ago

Hello,

No sorry. I gave up trying to fix this. When needed, I do the refitting on 2013 signatures by creating a mutational catalog myself instead of using a vcf. Please, let me know if you find a solution.

Sonia-Bedi commented 3 years ago

Hi Marozi2, I just started working with SigProfilerExtractor and I was wondering how to create a mutational catalogue of my samples. Could you help me out, please? Thanks in advance.

Marozi2 commented 3 years ago

Hi Sonia-Bedi, I'm not sure there is a public tool to convert a VCF into a mutational catalog, so I did it myself. I start with the output VCF from my SNP caller. I first isolate the SNPs and get the trinucleotide context of each one. Then I make a file with 3 columns: Trinucleotide REF ALT. From that file, I remove indels, then convert purine bases to pyrimidines because the tool only handles pyrimidines (well explained here). After that, I count the SNPs of each type in each trinucleotide context, and then it's a matter of formatting the file into a proper mutational catalog for the tool.
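A minimal sketch of the counting step described above, assuming the three-column records (Trinucleotide, REF, ALT) have already been extracted from the VCF. The function names and the `A[C>T]G` label style are illustrative assumptions, not the tool's own code, and the channel order produced here is arbitrary, so it would still need to be matched to whatever order the tool expects.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs96_channels():
    """List the 96 channels as labels such as 'A[C>T]G' (arbitrary order)."""
    channels = []
    for ref in "CT":
        for alt in "ACGT":
            if alt == ref:
                continue
            for five, three in product("ACGT", repeat=2):
                channels.append(f"{five}[{ref}>{alt}]{three}")
    return channels


def count_catalog(records):
    """Count SNVs per channel from (trinucleotide, ref, alt) records."""
    counts = Counter()
    for trinuc, ref, alt in records:
        trinuc, ref, alt = trinuc.upper(), ref.upper(), alt.upper()
        # Skip indels: only single-base changes in a 3-base context are kept.
        if len(trinuc) != 3 or len(ref) != 1 or len(alt) != 1:
            continue
        # Skip ambiguous bases, inconsistent contexts, and non-changes.
        if set(trinuc + alt) - set(COMPLEMENT) or trinuc[1] != ref or ref == alt:
            continue
        # Express purine references (A, G) on the opposite strand.
        if ref in "AG":
            trinuc = reverse_complement(trinuc)
            ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        counts[f"{trinuc[0]}[{ref}>{alt}]{trinuc[2]}"] += 1
    return {channel: counts[channel] for channel in sbs96_channels()}


# Hypothetical records: a C>T change, the same change seen from the
# other strand (G>A), and an insertion that gets skipped.
example = [("ACG", "C", "T"), ("CGT", "G", "A"), ("ACG", "C", "CT")]
catalog = count_catalog(example)
```

In this example both single-base records land in the same `A[C>T]G` channel, because the G>A change in a CGT context is the C>T change in an ACG context read from the opposite strand.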

Sonia-Bedi commented 3 years ago

Thank you @Marozi2 for your reply. This sounds like a big task. I am still searching for a way to transform my VCF file either into a format that SigProfilerExtractor accepts or into a mutational catalogue. If I don't find a solution, I will just follow what you did.

marcos-diazg commented 3 years ago

Hi @Marozi2 and @Sonia-Bedi,

You can generate a mutational catalog using your preferred reference genome by using SigProfilerMatrixGenerator. You can find the details in the corresponding README, as well as in this useful Wiki page.

On the other hand, regarding the refitting analysis with custom signature files that you mentioned @Marozi2, I would like to have a closer look into it. Could you please share the signatures file that you are using? Also, is this bug happening using both vcfs and a pandas dataframe for the input mutational catalog matrix?

Thanks a lot for your interest in our tool and sorry for the late reply.

Best, Marcos

Marozi2 commented 3 years ago

Hello,

Sorry for the late response, I was away for the last two weeks.

Here is the file (signatures_forSigProfiler.txt) I use for refitting on 2013 signatures. I made it from this file (signatures.txt) I took from here ftp://ftp.sanger.ac.uk/pub/cancer/AlexandrovEtAl.

The bug happens when I give a VCF as input, but not when I give a mutational catalog.

signatures_forSigProfiler.txt signatures.txt

marcos-diazg commented 3 years ago

Hello @Marozi2,

Please use the option check_rules=False. That should solve your issue for now. Please let me know if that's not the case or if you have any other questions.
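A minimal sketch of what the suggested call could look like, reusing the arguments from the original report and adding `check_rules=False`. The wrapper function is hypothetical and takes the `single_sample` function as a parameter so the snippet can be read without the tool installed; the import shown in the comment is an assumption based on the tool's README.

```python
def refit_with_custom_signatures(single_sample, input_dir, output_dir, signatures, ref):
    """Call a single_sample-style function with the rule checks switched off."""
    return single_sample(
        input_dir,
        output_dir,
        sig_database=signatures,  # custom reference signatures (a pandas DataFrame in this thread)
        check_rules=False,        # skip the rules tied to the default signatures
        ref=ref,
        exome=False,
    )


# Intended use (requires the tool; import path assumed from the README):
# from sigproSS import spss
# refit_with_custom_signatures(
#     spss.single_sample, "path/to/vcf_dir", "path/to/results", signatures, ref="GRCH38"
# )
```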

We are currently working on a major upgrade of the tool, where we will address this issue for sure. Thanks for letting us know.

By the way, you should always download the different versions of the COSMIC signatures from https://cancer.sanger.ac.uk/signatures/downloads/. Your signatures.txt matrix can lead to some issues, since the rows have been reordered in the newest SigProfiler versions. Please check your results carefully in this regard.
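One way to guard against a row-order mismatch is to reorder the signature rows by label before refitting. The sketch below uses plain Python, with hypothetical names and toy values; it assumes each row is keyed by a mutation-type label such as `A[C>A]A`.

```python
def align_signature_rows(signature_rows, catalog_order):
    """Return (label, weights) pairs reordered to follow catalog_order.

    signature_rows: dict mapping a channel label to a list of weights.
    catalog_order: list of channel labels in the order the catalog uses.
    """
    wanted = set(catalog_order)
    missing = [label for label in catalog_order if label not in signature_rows]
    extra = [label for label in signature_rows if label not in wanted]
    if missing or extra:
        # Fail loudly instead of silently refitting against misaligned rows.
        raise ValueError(f"label mismatch: missing={missing}, extra={extra}")
    return [(label, signature_rows[label]) for label in catalog_order]


# Toy example with two channels and made-up weights.
rows = {"A[C>A]A": [0.1, 0.3], "A[C>A]C": [0.2, 0.4]}
aligned = align_signature_rows(rows, ["A[C>A]C", "A[C>A]A"])
```

With pandas DataFrames indexed by mutation type, `signatures.reindex(catalog.index)` performs a comparable reordering, although it fills missing labels with NaN instead of raising an error.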

Hope that helps and thanks again for your interest!

Marozi2 commented 3 years ago

Hello @marcos-diazg

Indeed, it works with the option check_rules=False. Could you explain what this option implies?

Yes, I noticed that signatures.txt was not correctly formatted; that's why I created signatures_forSigProfiler.txt.

Thank you for your help!

marcos-diazg commented 3 years ago

The check_rules option controls the application of the biological-based rules in the signature assignment process (as described in Extended Data Fig. 8b from Alexandrov et al. 2020 Nature). These rules are based on the COSMIC v3 reference mutational signatures, described in the same study and used as default by SigProfilerSingleSample. If you use a custom set of reference signatures, these rules cannot be used. As I mentioned, we are working on a major upgrade of the tool that will take this into account.

Happy to help and please reopen the issue if you have any other problems. Thanks!