Closed MUppal closed 3 years ago
Hi,
When you read the nmf_output.h5 file, what is your input spectra to the algorithm? You can read this with X = pd.read_hdf("nmf_output.h5","X"). Does this have all the variants you expect?
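One rough way to answer that question is to sum the spectra matrix and compare the total with the number of rows in the input maf. The sketch below assumes that X holds per-context mutation counts, which the thread does not confirm; the helper name `total_spectra_count` and the toy matrix are hypothetical and only illustrate the check.

```python
import pandas as pd


def total_spectra_count(X: pd.DataFrame) -> int:
    """Sum every cell of the spectra matrix; the orientation does not matter."""
    return int(X.to_numpy().sum())


# Toy stand-in for the real matrix (hypothetical contexts and samples).
toy_X = pd.DataFrame(
    {"sample_A": [3, 0, 2], "sample_B": [1, 4, 0]},
    index=["A[C>A]A", "A[C>G]A", "A[C>T]A"],
)
toy_total = total_spectra_count(toy_X)

# With the real file (reading HDF5 through pandas requires PyTables):
# X = pd.read_hdf("nmf_output.h5", "X")
# print(total_spectra_count(X))
```

If the total is already lower than the maf row count, the variants were dropped before the decomposition rather than when the output file was written.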
Do you have a couple examples of variants that are included and those that are not?
The reason that some SNPs might drop out is that the context cannot be determined from a maf for adjacent SNPs within a single sample (are they DNPs or are they in trans?).
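To see how many variants this could affect, the maf can be scanned for SNPs that sit at consecutive positions within the same sample. This is a diagnostic sketch only, not SignatureAnalyzer's own filtering logic; it assumes the standard MAF columns Tumor_Sample_Barcode, Chromosome, Start_Position and Variant_Type, and the function name `flag_adjacent_snps` is hypothetical.

```python
import pandas as pd


def flag_adjacent_snps(maf: pd.DataFrame) -> pd.DataFrame:
    """Return SNP rows that have another SNP 1 bp away in the same sample."""
    group_cols = ["Tumor_Sample_Barcode", "Chromosome"]
    snps = maf[maf["Variant_Type"] == "SNP"].sort_values(
        group_cols + ["Start_Position"]
    )
    positions = snps.groupby(group_cols)["Start_Position"]
    gap_to_prev = positions.diff()           # distance to the previous SNP
    gap_to_next = positions.diff(-1).abs()   # distance to the next SNP
    return snps[(gap_to_prev == 1) | (gap_to_next == 1)]


# Toy maf: S1 has SNPs at 100 and 101 (adjacent) and one at 500 (isolated).
# S2 has a SNP at 102, which is next to S1's 101 but in a different sample.
toy_maf = pd.DataFrame(
    {
        "Tumor_Sample_Barcode": ["S1", "S1", "S1", "S1", "S2"],
        "Chromosome": ["1", "1", "1", "1", "1"],
        "Start_Position": [100, 101, 500, 102, 102],
        "Variant_Type": ["SNP", "SNP", "SNP", "DEL", "SNP"],
    }
)
adjacent = flag_adjacent_snps(toy_maf)
```

Comparing the number of flagged rows with the number of missing variants gives a quick indication of whether adjacency explains the loss.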
Hi,
I'm facing an issue with SignatureAnalyzer where roughly 8% of the variants in the input maf are not present in the final signature_weighted_maf.tsv file. For example, my maf (input.maf) contains 3,270,095 variants. I run the following command on it:

signatureanalyzer -n 10 --cosmic cosmic3 --hg_build hg19.2bit --objective poisson --max_iter 30000 --prior_on_H L1 --prior_on_W L1 input.maf

The output, signature_weighted_maf.tsv, contains only 3,023,380 of the original variants. This is a loss of 246,715 variants (about 7.5%), and it reproduces on other mafs that I have run with the tool.
There doesn't appear to me to be a specific pattern to which variants are lost. The missing variants appear relatively uniformly distributed across samples, chromosomes, and reference/variant alleles.
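A minimal sketch of how the missing variants can be listed, so that concrete examples can be shared: left-join the input maf against the output on a variant key and keep the rows found only in the input. It assumes that signature_weighted_maf.tsv retains the standard MAF columns used in `KEY` below; the function name `find_missing_variants` and the toy tables are hypothetical.

```python
import pandas as pd

# Columns assumed to identify a variant uniquely in both files.
KEY = [
    "Tumor_Sample_Barcode",
    "Chromosome",
    "Start_Position",
    "Reference_Allele",
    "Tumor_Seq_Allele2",
]


def find_missing_variants(input_maf, output_maf, key=KEY):
    """Return the rows of input_maf whose key is absent from output_maf."""
    out_keys = output_maf[key].drop_duplicates()
    merged = input_maf.merge(out_keys, on=key, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only", input_maf.columns]


# Toy data: four input variants, three of which survive in the output.
toy_input = pd.DataFrame(
    {
        "Tumor_Sample_Barcode": ["S1", "S1", "S2", "S2"],
        "Chromosome": ["1", "1", "2", "2"],
        "Start_Position": [100, 101, 250, 900],
        "Reference_Allele": ["C", "G", "T", "A"],
        "Tumor_Seq_Allele2": ["T", "A", "C", "G"],
    }
)
toy_output = toy_input.drop(index=1)
missing = find_missing_variants(toy_input, toy_output)
loss_fraction = len(missing) / len(toy_input)

# With the real files (make sure the key columns have matching dtypes):
# input_maf = pd.read_csv("input.maf", sep="\t", comment="#", low_memory=False)
# output_maf = pd.read_csv("signature_weighted_maf.tsv", sep="\t", low_memory=False)
```

Grouping the resulting table by sample, chromosome, or Variant_Type makes it easier to check whether the loss really is uniform.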
Is this expected behavior? Is there a way to force SignatureAnalyzer to provide signature probability estimates for every variant in the input maf?
I look forward to hearing back from you.