AmpliconSuite / AmpliconSuite-pipeline

A quickstart tool for AmpliconArchitect. Performs all preliminary steps (alignment, CNV calling, seed interval detection) required prior to running AmpliconArchitect. Previously called PrepareAA.

AttributeError: module 'pomegranate' has no attribute 'NormalDistribution' #52

Closed Tina04021997 closed 4 months ago

Tina04021997 commented 4 months ago

Dear AmpliconSuite developer:

Thank you for developing such a nice tool. I encountered the following error during the CNVkit segmentation step:

  File "/tscc/nfs/home/tiy002/miniforge3/envs/ampsuite/lib/python3.10/site-packages/cnvlib/segmentation/hmm.py", line 101, in hmm_get_model
    pom.NormalDistribution(-2.0, stdev, frozen=False),
AttributeError: module 'pomegranate' has no attribute 'NormalDistribution'
CNVKit encountered a non-zero exit status. Exiting...

My command:

/tscc/nfs/home/tiy002/AmpliconSuite-pipeline/PrepareAA.py -s KANR -t 64 --ref GRCh38 -o ${outdir}/KANR --cnvkit_dir /tscc/nfs/home/tiy002/miniforge3/envs/ampsuite/bin/cnvkit.py --bam ${bamdir}/KANR_mkdp.bam --aa_python_interpreter /tscc/nfs/home/tiy002/miniforge3/envs/ampsuite/bin/python3 --samtools_path /tscc/nfs/home/tiy002/miniforge3/envs/ampsuite/bin/samtools --cngain 4.5 --cnsize_min 50000 --downsample -1 --run_AA --run_AC --cnvkit_segmentation hmm-tumor

Several pieces of information that might be helpful:

  1. I installed the latest AmpliconSuite using Mamba
  2. bam header using samtools view -H for this sample:
    @PG     ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa mem -T 0 -t 64 -R @RG\tID:KANR\tSM:KANR\tPL:ILLUMINA ${ref}/Reference_Genomes/GRCh38.d1.vd1/GRCh38.d1.vd1.fa ${fastq}/KANR_CKDN230035638-1A_22GF7LLT3_L8_1.fq.gz ${fastq}/KANR_CKDN230035638-1A_22GF7LLT3_L8_2.fq.gz
    @PG     ID:MarkDuplicates       VN:2.18.27-SNAPSHOT     CL:MarkDuplicates MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=4000 INPUT=[KANR_raw.bam] OUTPUT=KANR_mkdp.bam METRICS_FILE=KANR_markDuplicates_Matrix.txt ASSUME_SORT_ORDER=coordinate VALIDATION_STRINGENCY=STRICT CREATE_INDEX=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false     PN:MarkDuplicates
    @PG     ID:samtools     PN:samtools     PP: MarkDuplicates       VN:1.19.2       CL:samtools view -H ${bamdir}/KANR_mkdp.bam
  3. Some key environment versions used: Python 3.10.13, cnvkit 0.9.10, amplicon-suite 1.2.1
  4. I also tried starting from FASTQ files, and tried another sample, but the same CNVkit error occurs

Thank you so much.

jluebeck commented 4 months ago

Hi Tina,

Thanks for this question. I believe this is an issue with CNVkit when specifying hmm-tumor as the segmentation method. Specifically, it is not constraining pomegranate to a version compatible with its HMM code: hmm.py calls pom.NormalDistribution, which the installed pomegranate does not provide. If you were to install CNVkit separately, run it using the instructions they provide, and set hmm-tumor as the segmentation method, I imagine you would run into this issue again. The CNVkit documentation does state that this option is experimental.

One option is to revert to the cbs segmentation method, as the filters in AmpliconSuite-pipeline are already optimized for calls segmented by cbs.

Another option is to attempt to install a version of pomegranate that still has the 'NormalDistribution' attribute (a pre-1.0 release, as far as I know, since pomegranate 1.0 was rewritten with a different API).
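
To see whether the pomegranate in your environment has the attribute that CNVkit's hmm.py is asking for, a quick check along these lines may help. This is only a minimal sketch: has_legacy_normal and check_pomegranate are hypothetical helper names, and the only thing it assumes about pomegranate is the attribute name shown in the traceback.

```python
import importlib
from importlib import metadata


def has_legacy_normal(module) -> bool:
    """Return True if the module exposes a top-level NormalDistribution.

    This is the attribute that cnvlib/segmentation/hmm.py looks up in the
    traceback above (pom.NormalDistribution).
    """
    return hasattr(module, "NormalDistribution")


def check_pomegranate() -> str:
    """Describe the installed pomegranate and whether the attribute exists."""
    try:
        pom = importlib.import_module("pomegranate")
    except ImportError:
        return "pomegranate is not installed in this environment"

    try:
        version = metadata.version("pomegranate")
    except metadata.PackageNotFoundError:
        version = "unknown"

    if has_legacy_normal(pom):
        return f"pomegranate {version} has NormalDistribution"
    return (
        f"pomegranate {version} has no NormalDistribution; "
        "the hmm segmentation methods are expected to fail as above"
    )


if __name__ == "__main__":
    print(check_pomegranate())
```

Run it with the same interpreter you pass to --aa_python_interpreter, so that it inspects the ampsuite environment and not some other Python installation.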

Mamba may also be partly at fault, as it can be less stringent about dependencies than conda, which may lead to this kind of behavior.

Jens

Tina04021997 commented 4 months ago

Dear Jens,

Thank you for your reply! Your suggestion of switching to cbs solved the problem. Thank you for your help.

Best, Tina