abyzovlab / CNVpytor

A Python extension of CNVnator, a tool for CNV analysis from depth of coverage by mapped reads
MIT License

Failed to open reference... Protocol not supported #128

Open Deleetdk opened 2 years ago

Deleetdk commented 2 years ago

I am working with my own genome in CRAM format. I installed CNVpytor with pip as a local user; installation and the download of the resource files proceeded without issues. However, running it on the CRAM file produced errors:

user@computer:/disk/genomes/nebula/emil$ pip install cnvpytor
Defaulting to user installation because normal site-packages is not writeable
Collecting cnvpytor
  Downloading CNVpytor-1.2.1.tar.gz (1.1 MB)
     |████████████████████████████████| 1.1 MB 2.8 MB/s            
  Preparing metadata (setup.py) ... done
Collecting gnureadline
  Downloading gnureadline-8.1.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (628 kB)
     |████████████████████████████████| 628 kB 53.5 MB/s            
Requirement already satisfied: requests>=2.0 in /home/emil/.local/lib/python3.6/site-packages (from cnvpytor) (2.21.0)
Requirement already satisfied: pysam>=0.15 in /home/emil/.local/lib/python3.6/site-packages (from cnvpytor) (0.16.0.1)
Requirement already satisfied: numpy>=1.16 in /home/emil/.local/lib/python3.6/site-packages (from cnvpytor) (1.19.5)
Requirement already satisfied: scipy>=1.1 in /home/emil/.local/lib/python3.6/site-packages (from cnvpytor) (1.4.1)
Requirement already satisfied: matplotlib>=2.2 in /home/emil/.local/lib/python3.6/site-packages (from cnvpytor) (3.2.1)
Requirement already satisfied: h5py>=2.9 in /home/emil/.local/lib/python3.6/site-packages (from cnvpytor) (3.1.0)
Collecting xlsxwriter>=1.3
  Downloading XlsxWriter-3.0.3-py3-none-any.whl (149 kB)
     |████████████████████████████████| 149 kB 43.8 MB/s            
Requirement already satisfied: cached-property in /home/emil/.local/lib/python3.6/site-packages (from h5py>=2.9->cnvpytor) (1.5.2)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /home/emil/.local/lib/python3.6/site-packages (from matplotlib>=2.2->cnvpytor) (2.4.7)
Requirement already satisfied: cycler>=0.10 in /home/emil/.local/lib/python3.6/site-packages (from matplotlib>=2.2->cnvpytor) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/emil/.local/lib/python3.6/site-packages (from matplotlib>=2.2->cnvpytor) (1.2.0)
Requirement already satisfied: python-dateutil>=2.1 in /home/emil/.local/lib/python3.6/site-packages (from matplotlib>=2.2->cnvpytor) (2.8.1)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /home/emil/.local/lib/python3.6/site-packages (from requests>=2.0->cnvpytor) (3.0.4)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in /home/emil/.local/lib/python3.6/site-packages (from requests>=2.0->cnvpytor) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /home/emil/.local/lib/python3.6/site-packages (from requests>=2.0->cnvpytor) (2019.9.11)
Requirement already satisfied: idna<2.9,>=2.5 in /home/emil/.local/lib/python3.6/site-packages (from requests>=2.0->cnvpytor) (2.8)
Requirement already satisfied: six in /home/emil/.local/lib/python3.6/site-packages (from cycler>=0.10->matplotlib>=2.2->cnvpytor) (1.15.0)
Building wheels for collected packages: cnvpytor
  Building wheel for cnvpytor (setup.py) ... done
  Created wheel for cnvpytor: filename=CNVpytor-1.2.1-py3-none-any.whl size=1117058 sha256=92e8ff5a0594e8d927b280e1c0b43c09c7572ba6dad62d3c326b1c6936b16147
  Stored in directory: /home/emil/.cache/pip/wheels/80/1b/90/1ecbda76f9a2d71cb22216c141cb4160b2e0c464a0a6871c0c
Successfully built cnvpytor
Installing collected packages: xlsxwriter, gnureadline, cnvpytor
Successfully installed cnvpytor-1.2.1 gnureadline-8.1.2 xlsxwriter-3.0.3
user@computer:/disk/genomes/nebula/emil$ cnvpytor -download
2022-08-18 07:11:35,662 - cnvpytor.genome - INFO - Updating reference genome resource files...
2022-08-18 07:11:35,662 - cnvpytor.genome - INFO - Detecting missing GC resource file for reference genome 'hg19'
2022-08-18 07:11:37,021 - cnvpytor.genome - INFO - Downloading GC resource file: gc_hg19.pytor
2022-08-18 07:11:37,885 - cnvpytor.genome - INFO - File downlaoded.
2022-08-18 07:11:37,885 - cnvpytor.genome - INFO - Detecting missing MASK resource file for reference genome 'hg19'
2022-08-18 07:11:39,059 - cnvpytor.genome - INFO - Downloading MASK resource file: mask_hg19.pytor
2022-08-18 07:11:39,704 - cnvpytor.genome - INFO - File downlaoded.
2022-08-18 07:11:39,704 - cnvpytor.genome - INFO - Detecting missing GC resource file for reference genome 'hg38'
2022-08-18 07:11:41,053 - cnvpytor.genome - INFO - Downloading GC resource file: gc_hg38.pytor
2022-08-18 07:11:41,984 - cnvpytor.genome - INFO - File downlaoded.
2022-08-18 07:11:41,985 - cnvpytor.genome - INFO - Detecting missing MASK resource file for reference genome 'hg38'
2022-08-18 07:11:43,466 - cnvpytor.genome - INFO - Downloading MASK resource file: mask_hg38.pytor
2022-08-18 07:11:44,135 - cnvpytor.genome - INFO - File downlaoded.
2022-08-18 07:11:44,135 - cnvpytor.genome - INFO - Detecting missing GC resource file for reference genome 'chm13'
2022-08-18 07:11:45,523 - cnvpytor.genome - INFO - Downloading GC resource file: gc_chm13.pytor
2022-08-18 07:11:46,478 - cnvpytor.genome - INFO - File downlaoded.
2022-08-18 07:11:46,478 - cnvpytor.genome - INFO - Done.
user@computer:/disk/genomes/nebula/emil$ cnvpytor -root file.pytor -rd 
genome_Emil_Kirkegaard_Full_20140303114842.txt  NG1IL0F60J.vcf.gz.tbi
NG1IL0F60J.cram                                 notebook.Rmd
NG1IL0F60J.cram.crai                            rsids
NG1IL0F60J.vcf.gz                               wgs_23andme_overlap.vcf.gz
user@computer:/disk/genomes/nebula/emil$ cnvpytor -root file.pytor -rd NG1IL0F60J.cram
2022-08-18 07:13:25,481 - cnvpytor.bam - INFO - File: NG1IL0F60J.cram successfully open
2022-08-18 07:13:25,481 - cnvpytor.bam - INFO - Detected reference genome: hg38
2022-08-18 07:13:25,483 - cnvpytor.pool - INFO - Parallel processing using 8 cores
2022-08-18 07:13:25,494 - cnvpytor.root - INFO - Reading data for chromosome chr1 with length 248956422
2022-08-18 07:13:25,494 - cnvpytor.root - INFO - Reading data for chromosome chr2 with length 242193529
2022-08-18 07:13:25,494 - cnvpytor.root - INFO - Reading data for chromosome chr3 with length 198295559
2022-08-18 07:13:25,494 - cnvpytor.root - INFO - Reading data for chromosome chr4 with length 190214555
2022-08-18 07:13:25,494 - cnvpytor.root - INFO - Reading data for chromosome chr5 with length 181538259
2022-08-18 07:13:25,494 - cnvpytor.root - INFO - Reading data for chromosome chr6 with length 170805979
2022-08-18 07:13:25,495 - cnvpytor.root - INFO - Reading data for chromosome chr7 with length 159345973
2022-08-18 07:13:25,495 - cnvpytor.root - INFO - Reading data for chromosome chr8 with length 145138636
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/6aef897c3d6ff0c78aff06ac189178dd": Protocol not supported
[E::cram_get_ref] Failed to populate reference for id 0
[E::cram_decode_slice] Unable to fetch reference #0 10001..39269

[E::cram_next_slice] Failure to decode slice
2022-08-18 07:13:25,527 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/76635a41ea913a405ded820447d067b0": Protocol not supported
[E::cram_get_ref] Failed to populate reference for id 2
[E::cram_decode_slice] Unable to fetch reference #2 10179..49575

[E::cram_next_slice] Failure to decode slice
2022-08-18 07:13:25,529 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/f98db672eb0993dcfdabafe2a882905c": Protocol not supported
[E::cram_get_ref] Failed to populate reference for id 1
[E::cram_decode_slice] Unable to fetch reference #1 10417..44091

[E::cram_next_slice] Failure to decode slice
2022-08-18 07:13:25,540 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/c67955b5f7815a9a1edfaa15893d3616": Protocol not supported
[E::cram_get_ref] Failed to populate reference for id 7
[E::cram_decode_slice] Unable to fetch reference #7 60001..136325

[E::cram_next_slice] Failure to decode slice
2022-08-18 07:13:25,543 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/cc044cc2256a1141212660fb07b6171e": Protocol not supported
[E::cram_get_ref] Failed to populate reference for id 6
[E::cram_decode_slice] Unable to fetch reference #6 10248..49629

[E::cram_next_slice] Failure to decode slice
2022-08-18 07:13:25,545 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/3210fecf1eb92d5489da4346b3fddc6e": Protocol not supported
[E::cram_get_ref] Failed to populate reference for id 3
[E::cram_decode_slice] Unable to fetch reference #3 10002..24650

[E::cram_next_slice] Failure to decode slice
2022-08-18 07:13:25,546 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/a811b3dc9fe66af729dc0dddf7fa4f13": Protocol not supported
[E::cram_get_ref] Failed to populate reference for id 4
[E::cram_decode_slice] Unable to fetch reference #4 10005..45128

[E::cram_next_slice] Failure to decode slice
2022-08-18 07:13:25,547 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/5691468a67c7e7a7b5f2a3a683792c29": Protocol not supported
[E::cram_get_ref] Failed to populate reference for id 5
[E::cram_decode_slice] Unable to fetch reference #5 60029..138449

[E::cram_next_slice] Failure to decode slice
2022-08-18 07:13:25,548 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'
^CTraceback (most recent call last):
  File "/home/emil/.local/bin/cnvpytor", line 8, in <module>
    sys.exit(main())
  File "/home/emil/.local/lib/python3.6/site-packages/cnvpytor/__main__.py", line 270, in main
    app.rd(args.rd, chroms=args.chrom, reference_filename=args.reference_filename)
  File "/home/emil/.local/lib/python3.6/site-packages/cnvpytor/root.py", line 308, in rd
    self._read_bam(bf, chroms, reference_filename=reference_filename, overwrite=overwrite)
  File "/home/emil/.local/lib/python3.6/site-packages/cnvpytor/root.py", line 90, in _read_bam
    res = parmap(read_chromosome, chr_len, cores=self.max_cores)
  File "/home/emil/.local/lib/python3.6/site-packages/cnvpytor/pool.py", line 50, in parmap
    sent = [q_in.put((i, x)) for i, x in enumerate(x_arg)]
  File "/home/emil/.local/lib/python3.6/site-packages/cnvpytor/pool.py", line 50, in <listcomp>
    sent = [q_in.put((i, x)) for i, x in enumerate(x_arg)]
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 82, in put
    if not self._sem.acquire(block, timeout):
KeyboardInterrupt

I found nothing about this error reported here or on Google. The URLs themselves work fine: e.g. https://www.ebi.ac.uk/ena/cram/md5/6aef897c3d6ff0c78aff06ac189178dd opens OK in the browser, though of course the content is not readable by humans (binary).

Any ideas? I will try the GitHub version.

My CRAM file is here: https://filedn.eu/lCyoUMpONNB7afAi4dJTUyX/data/genomics/personal%20genomes/emil/
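
A "Protocol not supported" warning next to URLs that open fine in a browser may mean that the htslib build used by pysam cannot fetch https:// references. This is an assumption, not something confirmed in this thread. Under that assumption, a minimal sketch for listing which reference MD5s the log asks for, and where each would sit in a local cache using the '%2s/%2s/%s' directory layout, could look like this. The function names and the cache root are hypothetical; nothing here is part of CNVpytor.

```python
import os
import re

# Matches the MD5 at the end of the EBI reference URLs seen in the warnings above.
MD5_URL = re.compile(r"https://www\.ebi\.ac\.uk/ena/cram/md5/([0-9a-f]{32})")


def failing_md5s(log_text):
    """Return the unique MD5s named in 'Failed to open reference' lines, in log order."""
    seen = []
    for line in log_text.splitlines():
        if "Failed to open reference" not in line:
            continue
        match = MD5_URL.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def ref_cache_path(md5, cache_root):
    """Path for one sequence under a '%2s/%2s/%s' layout (assumed cache pattern)."""
    return os.path.join(cache_root, md5[:2], md5[2:4], md5[4:])


if __name__ == "__main__":
    sample = (
        '[W::find_file_url] Failed to open reference '
        '"https://www.ebi.ac.uk/ena/cram/md5/6aef897c3d6ff0c78aff06ac189178dd": '
        'Protocol not supported\n'
    )
    for md5 in failing_md5s(sample):
        # "hts-ref" is a hypothetical cache root chosen for illustration.
        print(md5, ref_cache_path(md5, "hts-ref"))
```

The sketch only parses the log and builds paths; it deliberately does not download anything, so whether a pre-filled cache resolves the error on a given installation remains to be checked.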

arpanda commented 2 years ago

Hi, since you are using a CRAM file, could you please try again with the reference genome supplied and let us know whether the problem persists?

cnvpytor -root file.pytor -rd NG1IL0F60J.cram -T <path for reference fasta file>

Thanks,
Arijit

Deleetdk commented 2 years ago

I've let it run for 6+ hours so far, and it seems to be stuck:

cnvpytor -root file.pytor -rd NG1IL0F60J.cram -T /data/genomics/reference_files/hg38.fa
2022-08-29 01:34:53,231 - cnvpytor.bam - INFO - File: NG1IL0F60J.cram successfully open
2022-08-29 01:34:53,232 - cnvpytor.bam - INFO - Detected reference genome: hg38
2022-08-29 01:34:53,236 - cnvpytor.pool - INFO - Parallel processing using 8 cores
2022-08-29 01:34:53,248 - cnvpytor.root - INFO - Reading data for chromosome chr2 with length 242193529
2022-08-29 01:34:53,248 - cnvpytor.root - INFO - Reading data for chromosome chr1 with length 248956422
2022-08-29 01:34:53,248 - cnvpytor.root - INFO - Reading data for chromosome chr3 with length 198295559
2022-08-29 01:34:53,248 - cnvpytor.root - INFO - Reading data for chromosome chr4 with length 190214555
2022-08-29 01:34:53,248 - cnvpytor.root - INFO - Reading data for chromosome chr5 with length 181538259
2022-08-29 01:34:53,248 - cnvpytor.root - INFO - Reading data for chromosome chr6 with length 170805979
2022-08-29 01:34:53,249 - cnvpytor.root - INFO - Reading data for chromosome chr7 with length 159345973
2022-08-29 01:34:53,249 - cnvpytor.root - INFO - Reading data for chromosome chr8 with length 145138636
[E::cram_decode_slice] MD5 checksum reference mismatch for ref 2 pos 16776232..16814735
[E::cram_decode_slice] CRAM: 94982bbafd95a1a748fa20098fa90785
[E::cram_decode_slice] Ref : f16024284d657779afbaff7aeafdee31
[E::cram_next_slice] Failure to decode slice
2022-08-29 01:35:13,239 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'
[E::cram_decode_slice] MD5 checksum reference mismatch for ref 1 pos 20944307..20980664
[E::cram_decode_slice] CRAM: d6684283cb67e862e1f3c1a612609f28
[E::cram_decode_slice] Ref : 14de7a8fa2a63952b296c2f2457ac77c
[E::cram_next_slice] Failure to decode slice
2022-08-29 01:35:18,101 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'
[E::cram_decode_slice] MD5 checksum reference mismatch for ref 4 pos 47308302..49599821
[E::cram_decode_slice] CRAM: c4a1a9cc77b233653ed493122ac7d8f6
[E::cram_decode_slice] Ref : a0079071eceaa2b978aec2dbe12e9744
[E::cram_next_slice] Failure to decode slice
2022-08-29 01:35:47,822 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'
[E::cram_decode_slice] MD5 checksum reference mismatch for ref 5 pos 61323012..61378416
[E::cram_decode_slice] CRAM: f6e90e6d1513b16c3f6b27b12470e858
[E::cram_decode_slice] Ref : d31242f5e8ad326d470866fa93641452
[E::cram_next_slice] Failure to decode slice
2022-08-29 01:36:06,459 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'
2022-08-29 01:37:41,411 - cnvpytor.root - INFO - Reading data for chromosome chr9 with length 138394717
[E::cram_decode_slice] MD5 checksum reference mismatch for ref 6 pos 154558071..154597614
[E::cram_decode_slice] CRAM: 828f186956ba7548c8cb2a5cf3f58d95
[E::cram_decode_slice] Ref : 4e7e0d006cfebccf4945eba9a783830e
[E::cram_next_slice] Failure to decode slice
2022-08-29 01:37:51,781 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'
2022-08-29 01:38:35,198 - cnvpytor.root - INFO - Reading data for chromosome chr10 with length 133797422
[E::cram_decode_slice] MD5 checksum reference mismatch for ref 8 pos 89873013..89913176
[E::cram_decode_slice] CRAM: 64c0251ef044d4a1f07ddb3c6091ff65
[E::cram_decode_slice] Ref : de90d4be921385c7764f1156bca9e3fb
[E::cram_next_slice] Failure to decode slice
2022-08-29 01:39:05,648 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'
[E::cram_decode_slice] MD5 checksum reference mismatch for ref 9 pos 39224987..39261161
[E::cram_decode_slice] CRAM: 1db319bc13d26df6c359d41e6304a51a
[E::cram_decode_slice] Ref : be226a28220ba1ae4633caf36a971ffb
[E::cram_next_slice] Failure to decode slice
2022-08-29 01:39:19,863 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'
[E::cram_decode_slice] MD5 checksum reference mismatch for ref 0 pos 248745489..248787439
[E::cram_decode_slice] CRAM: a2844d84c77fd8b4f48ae633c2d0f962
[E::cram_decode_slice] Ref : 20c6c56d70d5fd59b58aa111d0dcccd1
[E::cram_next_slice] Failure to decode slice
2022-08-29 01:39:40,503 - cnvpytor.bam - ERROR - Error while reading file 'NG1IL0F60J.cram'

You were right, the protocol errors disappeared.

suvakov commented 2 years ago

It seems that the reference used to create the CRAM file is not the same as the hg38.fa reference you provided on the command line. Please check the assembly version in the CRAM header.
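
One way to make the header check concrete is to compare the M5 tags on the @SQ header lines with MD5 checksums computed from the FASTA file. The sketch below assumes the header is available as text (for example from `samtools view -H NG1IL0F60J.cram`) and that each @SQ line carries an M5 tag. The checksum is taken over the uppercased sequence with whitespace removed, as the SAM specification defines M5. All function names are hypothetical and not part of CNVpytor.

```python
import hashlib


def header_md5s(header_text):
    """Map sequence name -> M5 value from the @SQ lines of a SAM/CRAM text header."""
    result = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "M5" in fields:
            result[fields["SN"]] = fields["M5"].lower()
    return result


def fasta_md5s(fasta_lines):
    """Map sequence name -> MD5 of the uppercased sequence, whitespace removed."""
    result = {}
    name, digest = None, None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                result[name] = digest.hexdigest()
            # The sequence name is the first word after '>'.
            name = (line[1:].split() or [""])[0]
            digest = hashlib.md5()
        elif name is not None and line:
            digest.update("".join(line.split()).upper().encode("ascii"))
    if name is not None:
        result[name] = digest.hexdigest()
    return result


def mismatches(header_text, fasta_lines):
    """Names whose header M5 differs from the FASTA checksum or is absent from the FASTA."""
    expected = header_md5s(header_text)
    actual = fasta_md5s(fasta_lines)
    return sorted(n for n, m in expected.items() if actual.get(n) != m)
```

Passing an open file handle for /data/genomics/reference_files/hg38.fa as `fasta_lines` streams the reference line by line, so the whole genome is never held in memory. An empty result would suggest the whole-sequence checksums agree; any listed name points to a sequence that differs between the CRAM's reference and the supplied FASTA.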