dohlee / prism

🔍 Methylation Pattern-based, Reference-free Inference of Subclonal Makeup. (Lee et al., Bioinformatics. 2019)
https://subclone-prism.readthedocs.io

Running the deconvolution step gives the following error "ValueError: Expected n_samples >= n_components" #1

Open MohamedRefaat92 opened 4 years ago

MohamedRefaat92 commented 4 years ago

Hi,

I am trying to use Prism on WGBS data. After extracting and preprocessing the data, I get the following error at the deconvolution step:

prism deconvolute -i preprocess_out/D33_18_LB.corrected.met -o D33_18_LB.prism.result

Traceback (most recent call last):
  File "/usr/local/bin/prism", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.5/dist-packages/prism/cli.py", line 230, in main
    args.func(args)
  File "/usr/local/lib/python3.5/dist-packages/prism/cli.py", line 178, in deconvolute
    verbose=args.verbose,
  File "/usr/local/lib/python3.5/dist-packages/prism/deconvolute.py", line 355, in run
    bbmm.fit(merged_depths, merged_counts, common_headers)
  File "/usr/local/lib/python3.5/dist-packages/prism/mixture.py", line 102, in fit
    self.alphas_, self.betas_ = self._gmm_initialize(n, k)
  File "/usr/local/lib/python3.5/dist-packages/prism/mixture.py", line 75, in _gmm_initialize
    model.fit(r)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/mixture/base.py", line 191, in fit
    self.fit_predict(X, y)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/mixture/base.py", line 217, in fit_predict
    X = _check_X(X, self.n_components, ensure_min_samples=2)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/mixture/base.py", line 56, in _check_X
    % (n_components, X.shape[0]))
ValueError: Expected n_samples >= n_components but got n_components = 4, n_samples = 3

The contents of the preprocess_out/D33_18_LB.corrected.met file are shown below:

>chr1:228617825;chr1:228617849;chr1:228617869;chr1:228617873
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
0000
0000
1111
1111
>chr1:228622259;chr1:228622283;chr1:228622303;chr1:228622307
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
0101
0101
1111
>chr3:198102087;chr3:198102089;chr3:198102106;chr3:198102137;chr3:198102139
10100
10100
00000
00000
00000
00000
00000
00000
10100
11111
11111
10100
11111
11111
11111
11111
11111
11111
11111
11111
11111
11111
11111
>chr16:34588056;chr16:34588069;chr16:34588105;chr16:34588118
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
1111
0000
0000
0000
0000
1111
>chr21:8436181;chr21:8436186;chr21:8436200;chr21:8436205;chr21:8436213;chr21:8436219;chr21:8436223;chr21:8436225;chr21:8436233;chr21:8436238;chr21:8436242;chr21:8436248;chr21:8436252;chr21:8436254;chr21:8436257;chr21:8436261
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000

I would appreciate any help regarding the reported error.

Best, Mohamed

dohlee commented 4 years ago

Hi, thank you for reporting this issue and sorry for my late reply.

It seems that there are not enough epiloci (n = 3) to fit a k-component mixture model with k greater than 3; in your run, k = 4.

Please note that PRISM typically needs thousands of epiloci or more to give reliable cluster or subclone estimates, and your .met file has far fewer than that. I assume this is because PRISM's epilocus-extraction algorithm is designed only for RRBS and, unfortunately, not for WGBS, so it may not have extracted enough epiloci in the 'extract' step.
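As a rough way to check this before running the deconvolution step, the sketch below counts the epiloci in a .met file and compares that number with the number of mixture components. It is not part of PRISM: parse_met and check_epiloci are hypothetical helper names, the file layout is assumed from the example posted in this issue (a header line starting with '>' that lists ';'-separated CpG positions, followed by one binary methylation pattern per read), and the default threshold of 1000 is only a stand-in for "thousands of epiloci". PRISM may filter or merge epiloci internally, so the number it actually fits on can differ from a simple header count.

```python
from collections import Counter


def parse_met(lines):
    """Group read-level methylation patterns by epilocus header.

    Assumed layout (taken from the file shown in this issue, not from a
    format specification): '>' header lines followed by binary patterns.
    """
    epiloci = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            current = line[1:]
            epiloci.setdefault(current, Counter())
        elif current is not None:
            epiloci[current][line] += 1  # count identical read patterns
    return epiloci


def check_epiloci(epiloci, n_components, recommended=1000):
    """Return a short status message (hypothetical helper, not a PRISM API).

    `recommended` is an assumed placeholder for "thousands of epiloci".
    """
    n = len(epiloci)
    if n < n_components:
        return "too few epiloci: {} < {} components".format(n, n_components)
    if n < recommended:
        return "fit possible, but only {} epiloci; estimates may be unreliable".format(n)
    return "ok"


if __name__ == "__main__":
    # Tiny made-up input in the same layout as the file above.
    sample = [
        ">chrA:10;chrA:20;chrA:30;chrA:40",
        "1111",
        "1111",
        "0000",
        ">chrB:5;chrB:15;chrB:25;chrB:35",
        "0101",
    ]
    parsed = parse_met(sample)
    for header, patterns in parsed.items():
        print(header, dict(patterns))
    print(check_epiloci(parsed, n_components=4))
```

To run it on a real file, pass an open file handle instead of the list, for example parse_met(open("preprocess_out/D33_18_LB.corrected.met")).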

I'm sorry to say that PRISM is currently not applicable to WGBS data, but I plan to add WGBS support in a future update.

Best, Dohoon Lee