Goatofmountain / SCOUT

A single-cell genotyper
MIT License

Cannot run pipeline on single sample #1

Open dfermin opened 2 years ago

dfermin commented 2 years ago

Hello.

I'm trying to run SCOUT on a single sample from a 10x Genomics run. I ran the BAM file through the GATK recalibration steps and am now trying to use SCOUT to call genotypes.

This is the command I use:

python ../SCOUT/bin/SCOUT_WholeGenome.py -N samp154 -r $FASTA -i ../samp154.recal.bam -o ../out -P 20 -c chr2 -S 1000

The program runs for a bit and then I get this error:

Fri Mar  4 15:18:01 2022: The first Annotation finished!
Fri Mar  4 15:18:01 2022: The second Annotation finished!
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/dfermin/.conda/envs/scout/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/nfs/scratch/projects/dfermin.scRNAseq_genotyping/bin/../SCOUT.git/bin/SCOUT.py", line 30, in WorkPip
    PipEST.MakeCandidateDf(chrom, Start, End)
  File "/nfs/scratch/projects/dfermin.scRNAseq_genotyping/SCOUT.git/source/Calculate/Candidate.py", line 296, in MakeCandidateDf
    self.GetCutoffSimple()
  File "/nfs/scratch/projects/dfermin.scRNAseq_genotyping/SCOUT.git/source/Calculate/Candidate.py", line 316, in GetCutoffSimple
    estimator.fit(pd.DataFrame(MixDf['RawRate']))
  File "/home/dfermin/.conda/envs/scout/lib/python3.10/site-packages/sklearn/cluster/_agglomerative.py", line 917, in fit
    X = self._validate_data(X, ensure_min_samples=2, estimator=self)
  File "/home/dfermin/.conda/envs/scout/lib/python3.10/site-packages/sklearn/base.py", line 566, in _validate_data
    X = check_array(X, **check_params)
  File "/home/dfermin/.conda/envs/scout/lib/python3.10/site-packages/sklearn/utils/validation.py", line 805, in check_array
    raise ValueError(
ValueError: Found array with 1 sample(s) (shape=(1, 1)) while a minimum of 2 is required by AgglomerativeClustering.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/nfs/scratch/projects/dfermin.scRNAseq_genotyping/bin/../SCOUT.git/bin/SCOUT.py", line 169, in <module>
    main(sys.argv[1:])
  File "/nfs/scratch/projects/dfermin.scRNAseq_genotyping/bin/../SCOUT.git/bin/SCOUT.py", line 147, in main
    res = ResultPool[k].get()
  File "/home/dfermin/.conda/envs/scout/lib/python3.10/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: Found array with 1 sample(s) (shape=(1, 1)) while a minimum of 2 is required by AgglomerativeClustering.

This happens with both SCOUT scripts. Any suggestion on how to fix it?

Thanks

Goatofmountain commented 2 years ago

Hello, I've run SCOUT on both single-cell WGS and bulk WGS data, and the pipeline works there. Could you provide one of your BAM files so I can reproduce this error?

Thanks, Kailing Tu
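
For reference, the ValueError in the traceback above says the clustering step was handed a single row (shape=(1, 1)) while it requires at least two. The sketch below illustrates one generic way to guard such a call. It is not SCOUT's code: `strict_fit` is a hypothetical stand-in that only mimics the minimum-sample check, and `fit_or_skip` is a hypothetical helper. Whether skipping a region with a single candidate is appropriate for SCOUT is an assumption the maintainer would need to confirm.

```python
from typing import Callable, Optional, Sequence

MIN_SAMPLES = 2  # minimum reported in the ValueError above


def strict_fit(values: Sequence[float]) -> list:
    """Hypothetical stand-in for a clusterer that rejects tiny inputs."""
    if len(values) < MIN_SAMPLES:
        raise ValueError(
            f"Found array with {len(values)} sample(s) while a minimum of "
            f"{MIN_SAMPLES} is required."
        )
    # Toy labelling (not agglomerative clustering): split around the mean.
    mean = sum(values) / len(values)
    return [int(v > mean) for v in values]


def fit_or_skip(
    values: Sequence[float],
    fit_fn: Callable[[Sequence[float]], list] = strict_fit,
) -> Optional[list]:
    """Return labels from fit_fn, or None when there are too few values."""
    if len(values) < MIN_SAMPLES:
        # Assumption: a region with fewer than two candidates can be skipped.
        return None
    return fit_fn(values)


print(fit_or_skip([0.12]))              # one candidate: skipped, prints None
print(fit_or_skip([0.05, 0.48, 0.51]))  # enough candidates: labels returned
```

In the traceback, the equivalent check would sit before the `estimator.fit(...)` call in `GetCutoffSimple`, testing the number of rows in `MixDf` first.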