getzlab / SignatureAnalyzer

Updated SignatureAnalyzer-GPU with mutational spectra & RNA expression compatibility.
MIT License

signature_weighted_maf.tsv missing ~10% of variants from input maf #26

Closed: MUppal closed this issue 3 years ago

MUppal commented 3 years ago

Hi,

I'm facing an issue with SignatureAnalyzer where nearly 10% of the variants in the input maf are not present in the final signature_weighted_maf.tsv file. For example, my maf (input.maf) contains 3,270,095 variants. I run the following command on it:

signatureanalyzer -n 10 --cosmic cosmic3 --hg_build hg19.2bit --objective poisson --max_iter 30000 --prior_on_H L1 --prior_on_W L1 input.maf

The output, signature_weighted_maf.tsv, contains only 3,023,380 of the original variants. That is a loss of 246,715 variants (about 7.5%), and it is reproduced on other mafs that I have run with the tool.

I don't see a specific pattern in which variants are lost. The missing variants appear to be distributed relatively uniformly across samples, chromosomes, and reference/variant alleles.

Is this expected behavior? Is there a way to force SignatureAnalyzer to provide signature probability estimates for every variant in the input maf?

I look forward to hearing back from you.

shankara-a commented 3 years ago

Hi,

When you read the nmf_output.h5 file, what are the input spectra to the algorithm? You can read them with X = pd.read_hdf("nmf_output.h5", "X"). Does this have all the variants you expect?
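One way to run that check is to compare the total number of mutations counted in X against the number of rows in the input maf. The sketch below assumes X is a matrix of non-negative mutation counts (contexts by samples, or the reverse; the total does not depend on orientation). The helper name `count_gap` is hypothetical and not part of SignatureAnalyzer.

```python
import pandas as pd


def count_gap(X: pd.DataFrame, n_input_variants: int) -> dict:
    """Compare the mutations counted in the spectra with the input maf size.

    Assumes X holds raw mutation counts (an assumption, not confirmed
    for every SignatureAnalyzer run mode).
    """
    n_counted = int(X.to_numpy().sum())
    n_missing = n_input_variants - n_counted
    return {
        "counted": n_counted,
        "missing": n_missing,
        "fraction_missing": n_missing / n_input_variants,
    }


# Toy spectra: 2 contexts x 2 samples, 9 mutations counted in total.
toy_X = pd.DataFrame({"sample_A": [3, 2], "sample_B": [1, 3]},
                     index=["A[C>A]A", "A[C>A]C"])
print(count_gap(toy_X, n_input_variants=10))

# With real data this would look something like:
#   X = pd.read_hdf("nmf_output.h5", "X")
#   maf = pd.read_csv("input.maf", sep="\t", comment="#", low_memory=False)
#   print(count_gap(X, len(maf)))
```

If the "missing" count here is already close to the gap seen in signature_weighted_maf.tsv, the variants are being dropped while the spectra are built, before NMF runs.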

Do you have a couple examples of variants that are included and those that are not?
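To pull out such examples, an anti-join between the input maf and the output table is probably the quickest route. This is a minimal sketch that assumes both files carry the standard MAF columns listed in `KEYS`; whether signature_weighted_maf.tsv keeps exactly these column names is an assumption, and `split_kept_dropped` is a hypothetical helper.

```python
import pandas as pd

# Standard MAF columns that identify a variant within a sample (assumed
# to be present in both the input maf and the output table).
KEYS = ["Tumor_Sample_Barcode", "Chromosome", "Start_Position",
        "Reference_Allele", "Tumor_Seq_Allele2"]


def split_kept_dropped(maf_in: pd.DataFrame, maf_out: pd.DataFrame, keys=KEYS):
    """Return (kept, dropped): input rows that do / do not appear in the output."""
    marked = maf_in.merge(
        maf_out[keys].drop_duplicates(),  # avoid duplicating input rows
        on=keys, how="left", indicator=True,
    )
    kept = marked[marked["_merge"] == "both"].drop(columns="_merge")
    dropped = marked[marked["_merge"] == "left_only"].drop(columns="_merge")
    return kept, dropped


# Toy example: three input variants, one of which is absent from the output.
toy_in = pd.DataFrame({
    "Tumor_Sample_Barcode": ["S1", "S1", "S2"],
    "Chromosome": ["1", "1", "2"],
    "Start_Position": [100, 101, 500],
    "Reference_Allele": ["C", "G", "T"],
    "Tumor_Seq_Allele2": ["T", "A", "C"],
})
toy_out = toy_in.iloc[[0, 2]]
kept, dropped = split_kept_dropped(toy_in, toy_out)
print(dropped)
```

When reading the real files, make sure the key columns have matching dtypes in both tables (for example, read Chromosome as a string in both), otherwise the merge will report every row as dropped.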

jcha40 commented 3 years ago

Some SNPs might drop out because, for adjacent SNPs within a single sample, the context cannot be determined from a maf: are they a DNP, or are they in trans?
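To estimate how many input variants fall into this category, you could flag SNPs that sit at consecutive positions within the same sample. The sketch below is not SignatureAnalyzer's own filtering logic; it only marks candidates, assuming the maf has already been restricted to SNPs and uses the standard MAF column names. `flag_adjacent_snps` is a hypothetical helper.

```python
import pandas as pd


def flag_adjacent_snps(maf: pd.DataFrame) -> pd.DataFrame:
    """Add a boolean 'adjacent_snp' column marking SNPs that are exactly
    1 bp away from another SNP in the same sample and chromosome.

    Assumes every row is a single-base substitution.
    """
    df = maf.sort_values(
        ["Tumor_Sample_Barcode", "Chromosome", "Start_Position"]
    ).copy()
    pos = df.groupby(["Tumor_Sample_Barcode", "Chromosome"])["Start_Position"]
    gap_prev = pos.diff()          # distance to the previous SNP in the group
    gap_next = pos.diff(-1).abs()  # distance to the next SNP in the group
    df["adjacent_snp"] = (gap_prev == 1) | (gap_next == 1)
    return df


# Toy example: sample A has SNPs at 100 and 101 (adjacent) and at 250;
# sample B has a SNP at 101, which is not adjacent to anything in B.
toy_maf = pd.DataFrame({
    "Tumor_Sample_Barcode": ["A", "A", "A", "B"],
    "Chromosome": ["1", "1", "1", "1"],
    "Start_Position": [250, 100, 101, 101],
})
flagged = flag_adjacent_snps(toy_maf)
print(flagged)
print("fraction adjacent:", flagged["adjacent_snp"].mean())
```

If the flagged fraction on the real maf is close to the fraction of variants missing from signature_weighted_maf.tsv, that would be consistent with this explanation.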