Saskia-Oosterbroek / decona

fastq to polished sequences: pipeline suitable for mixed samples and long (Nanopore) reads
MIT License

Issue with medaka #52

Open Eefje-Kuijpers opened 2 months ago

Eefje-Kuijpers commented 2 months ago

Hi, I was able to run decona and get results from Racon. However, when I add the -M option (Medaka polishing), the run fails with the following error:

Filtering data...
Data filtered with NanoFilt
Data not demultiplexed
total raw sequences = 881671
total filtered sequences = 59832
Fastq reads are being transformed to fasta
Transforming fastq to fasta Complete
Clustering reads...
Clustering 2324-022-05-RL09/...
Clustering 2324-022-06-RL10/...
Clustering 2324-022-07-RL11/...
Clustering 2324-022-08-RL17/...
Clustering complete.
Aligning and making draft assembly of 8829-1.fa...
[M::mm_idx_gen::0.005*1.88] collected minimizers
[M::mm_idx_gen::0.007*2.43] sorted minimizers
[M::main::0.007*2.41] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.008*2.37] mid_occ = 2
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.008*2.34] distinct minimizers: 217 (100.00% are singletons); average occurrences: 1.000; average spacing: 5.576
[M::worker_pipeline::0.969*3.70] mapped 8829 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax map-ont -k15 -t 4 ref_8829-1.fasta 8829-1.fa
[M::main] Real time: 0.970 sec; CPU: 3.590 sec; Peak RSS: 0.024 GB
[racon::Polisher::initialize] loaded target sequences 0.000138 s
[racon::Polisher::initialize] loaded sequences 0.061289 s
[racon::Polisher::initialize] loaded overlaps 0.047977 s
[racon::Polisher::initialize] aligning overlaps [====================] 0.111484 s
[racon::Polisher::initialize] transformed data into windows 0.005526 s
[racon::Polisher::polish] generated consensus 37.548909 s
[racon::Polisher::] total = 37.783351 s
Done
Aligning and making draft assembly of 7190-0.fa...
[M::mm_idx_gen::0.002*2.88] collected minimizers
[M::mm_idx_gen::0.005*3.11] sorted minimizers
[M::main::0.005*3.07] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.005*2.99] mid_occ = 2
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.006*2.92] distinct minimizers: 222 (100.00% are singletons); average occurrences: 1.000; average spacing: 5.419
[M::worker_pipeline::0.593*3.61] mapped 7190 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax map-ont -k15 -t 4 ref_7190-0.fasta 7190-0.fa
[M::main] Real time: 0.593 sec; CPU: 2.142 sec; Peak RSS: 0.020 GB
[racon::Polisher::initialize] loaded target sequences 0.000107 s
[racon::Polisher::initialize] loaded sequences 0.052012 s
[racon::Polisher::initialize] loaded overlaps 0.044091 s
[racon::Polisher::initialize] aligning overlaps [====================] 0.100748 s
[racon::Polisher::initialize] transformed data into windows 0.003257 s
[racon::Polisher::polish] generated consensus 30.247409 s
[racon::Polisher::] total = 30.455563 s
Done
Aligning and making draft assembly of 14336-0.fa...
[M::mm_idx_gen::0.005*1.65] collected minimizers
[M::mm_idx_gen::0.008*2.24] sorted minimizers
[M::main::0.008*2.23] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.008*2.21] mid_occ = 2
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.008*2.20] distinct minimizers: 217 (100.00% are singletons); average occurrences: 1.000; average spacing: 5.599
[M::worker_pipeline::1.545*3.73] mapped 14336 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax map-ont -k15 -t 4 ref_14336-0.fasta 14336-0.fa
[M::main] Real time: 1.545 sec; CPU: 5.755 sec; Peak RSS: 0.036 GB
[racon::Polisher::initialize] loaded target sequences 0.000389 s
[racon::Polisher::initialize] loaded sequences 0.101578 s
[racon::Polisher::initialize] loaded overlaps 0.078210 s
[racon::Polisher::initialize] aligning overlaps [====================] 0.193321 s
[racon::Polisher::initialize] transformed data into windows 0.006471 s
[racon::Polisher::polish] generated consensus 63.191474 s
[racon::Polisher::] total = 63.587103 s
Done
Aligning and making draft assembly of 12455-0.fa...
[M::mm_idx_gen::0.029*0.30] collected minimizers
[M::mm_idx_gen::0.032*0.54] sorted minimizers
[M::main::0.032*0.54] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.032*0.55] mid_occ = 2
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.032*0.55] distinct minimizers: 202 (100.00% are singletons); average occurrences: 1.000; average spacing: 5.619
[M::worker_pipeline::0.999*3.55] mapped 12455 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax map-ont -k15 -t 4 ref_12455-0.fasta 12455-0.fa
[M::main] Real time: 1.000 sec; CPU: 3.545 sec; Peak RSS: 0.029 GB
[racon::Polisher::initialize] loaded target sequences 0.000078 s
[racon::Polisher::initialize] loaded sequences 0.087196 s
[racon::Polisher::initialize] loaded overlaps 0.067349 s
[racon::Polisher::initialize] aligning overlaps [====================] 0.156392 s
[racon::Polisher::initialize] transformed data into windows 0.007427 s
[racon::Polisher::polish] generated consensus 53.024429 s
[racon::Polisher::] total = 53.353805 s
Done
polishing 8829-1.fa Racon sequence with Medaka...
2024-05-07 14:19:44.913053: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Checking program versions
This is medaka 1.1.2
Program   Version  Required  Pass
bcftools  1.10.2   1.9       True
bgzip     1.20     1.9       True
minimap2  2.17     2.11      True
samtools  1.18     1.9       True
tabix     1.20     1.9       True
2024-05-07 14:19:47.713774: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-07 14:19:50.107697: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Aligning basecalls to draft
Removing previous index file /data/basecalling/MAXEEFJE/decona_sample/data/2324-022-05-RL09/multi-seq/polished_8829-1.fasta.mmi
Removing previous index file /data/basecalling/MAXEEFJE/decona_sample/data/2324-022-05-RL09/multi-seq/polished_8829-1.fasta.fai
Constructing minimap index.
[M::mm_idx_gen::0.005*1.78] collected minimizers
[M::mm_idx_gen::0.008*2.17] sorted minimizers
[M::main::0.011*1.86] loaded/built the index for 1 target sequence(s)
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.011*1.85] distinct minimizers: 210 (100.00% are singletons); average occurrences: 1.000; average spacing: 5.710
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -I 16G -x map-ont --MD -d /data/basecalling/MAXEEFJE/decona_sample/data/2324-022-05-RL09/multi-seq/polished_8829-1.fasta.mmi /data/basecalling/MAXEEFJE/decona_sample/data/2324-022-05-RL09/multi-seq/polished_8829-1.fasta
[M::main] Real time: 0.011 sec; CPU: 0.020 sec; Peak RSS: 0.003 GB
[M::main::0.004*1.61] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.005*1.59] mid_occ = 2
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.005*1.57] distinct minimizers: 210 (100.00% are singletons); average occurrences: 1.000; average spacing: 5.710
[M::worker_pipeline::1.053*3.60] mapped 8829 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -x map-ont --MD -t 4 -a -A 2 -B 4 -O 4,24 -E 2,1 /data/basecalling/MAXEEFJE/decona_sample/data/2324-022-05-RL09/multi-seq/polished_8829-1.fasta.mmi /data/basecalling/MAXEEFJE/decona_sample/data/2324-022-05-RL09/multi-seq/8829-1.fa
[M::main] Real time: 1.055 sec; CPU: 3.797 sec; Peak RSS: 0.024 GB
[bam_sort_core] merging from 0 files and 4 in-memory blocks...
Running medaka consensus
2024-05-07 14:19:54.835589: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[14:19:56 - Predict] Processing region(s): 4b01ed4b-1620-47a8-aeb0-6649e615278b:0-1199
[14:19:56 - Predict] Using model: /home/ekuijpers2/data/basecalling/MAXEEFJE/miniconda3/envs/decona/lib/python3.8/site-packages/medaka/data/r941_min_high_g360_model.hdf5.
[14:19:56 - Predict] Setting tensorflow threads to 4.
[14:19:56 - Predict] Processing 1 long region(s) with batching.
[14:19:56 - ModelStore] filepath /home/ekuijpers2/data/basecalling/MAXEEFJE/miniconda3/envs/decona/lib/python3.8/site-packages/medaka/data/r941_min_high_g360_model.hdf5
[14:19:57 - DLoader] Initializing data loader
[14:19:57 - PWorker] Running inference for 0.0M draft bases.
[14:19:57 - Sampler] Initializing sampler for consensus of region 4b01ed4b-1620-47a8-aeb0-6649e615278b:0-1199.
[14:19:58 - Feature] Processed 4b01ed4b-1620-47a8-aeb0-6649e615278b:0.0-1198.0 (median depth 8140.0)
[14:19:58 - Sampler] Took 1.24s to make features.
[14:19:58 - Sampler] Region 4b01ed4b-1620-47a8-aeb0-6649e615278b:0.0-1198.0 (5569 positions) is smaller than inference chunk length 10000, quarantining.
[14:19:58 - PWorker] All done, 1 remainder regions.
[14:19:58 - Predict] Processing 1 short region(s).
[14:19:58 - ModelStore] filepath /home/ekuijpers2/data/basecalling/MAXEEFJE/miniconda3/envs/decona/lib/python3.8/site-packages/medaka/data/r941_min_high_g360_model.hdf5
[14:19:59 - DLoader] Initializing data loader
[14:19:59 - PWorker] Running inference for 0.0M draft bases.
[14:19:59 - Sampler] Initializing sampler for consensus of region 4b01ed4b-1620-47a8-aeb0-6649e615278b:0-1199.
[14:20:00 - Feature] Processed 4b01ed4b-1620-47a8-aeb0-6649e615278b:0.0-1198.0 (median depth 8140.0)
[14:20:00 - Sampler] Took 1.19s to make features.
[14:20:02 - PWorker] All done, 0 remainder regions.
[14:20:02 - Predict] Finished processing all regions.
Using medaka stitch to create consensus.
2024-05-07 14:20:02.997614: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[14:20:04 - DataIndex] Loaded 1/1 (100.00%) sample files.
[14:20:04 - Stitch] Stitching regions: ['4b01ed4b-1620-47a8-aeb0-6649e615278b:0-']
[14:20:04 - DataIndex] Loaded 1/1 (100.00%) sample files.
[14:20:04 - Stitch] Processing 4b01ed4b-1620-47a8-aeb0-6649e615278b:0-.
[14:20:04 - Stitch] Used heuristic 0 times for 4b01ed4b-1620-47a8-aeb0-6649e615278b:0-.
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/ekuijpers2/data/basecalling/MAXEEFJE/miniconda3/envs/decona/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/ekuijpers2/data/basecalling/MAXEEFJE/miniconda3/envs/decona/lib/python3.8/concurrent/futures/process.py", line 198, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/home/ekuijpers2/data/basecalling/MAXEEFJE/miniconda3/envs/decona/lib/python3.8/concurrent/futures/process.py", line 198, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/home/ekuijpers2/data/basecalling/MAXEEFJE/miniconda3/envs/decona/lib/python3.8/site-packages/medaka/stitch.py", line 133, in _stitcher
    return fill_gaps(contigs, draft)
  File "/home/ekuijpers2/data/basecalling/MAXEEFJE/miniconda3/envs/decona/lib/python3.8/site-packages/medaka/stitch.py", line 117, in fill_gaps
    draft_seq = draft.fetch(ref_name)
  File "pysam/libcfaidx.pyx", line 301, in pysam.libcfaidx.FastaFile.fetch
KeyError: "sequence 'b'4b01ed4b-1620-47a8-aeb0-6649e615278b'' not present"
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ekuijpers2/data/basecalling/MAXEEFJE/miniconda3/envs/decona/bin/medaka", line 11, in <module>
    sys.exit(main())
  File "/home/ekuijpers2/data/basecalling/MAXEEFJE/miniconda3/envs/decona/lib/python3.8/site-packages/medaka/medaka.py", line 669, in main
    args.func(args)
  File "/home/ekuijpers2/data/basecalling/MAXEEFJE/miniconda3/envs/decona/lib/python3.8/site-packages/medaka/stitch.py", line 152, in stitch
    for contigs, gap_tree in executor.map(worker, rgrps):
  File "/home/ekuijpers2/data/basecalling/MAXEEFJE/miniconda3/envs/decona/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/ekuijpers2/data/basecalling/MAXEEFJE/miniconda3/envs/decona/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/home/ekuijpers2/data/basecalling/MAXEEFJE/miniconda3/envs/decona/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/home/ekuijpers2/data/basecalling/MAXEEFJE/miniconda3/envs/decona/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
KeyError: "sequence 'b'4b01ed4b-1620-47a8-aeb0-6649e615278b'' not present"
Failed to stitch consensus chunks.
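
The KeyError is raised by pysam's FastaFile.fetch, so it appears that the record name 4b01ed4b-1620-47a8-aeb0-6649e615278b was not found in the FASTA file that medaka stitch opened as its draft. One way to narrow this down is to list the record names a FASTA file actually contains. The following is only a minimal sketch, assuming a plain uncompressed FASTA; the helper names fasta_record_names and missing_names are hypothetical and not part of decona, medaka, or pysam.

```python
from typing import Iterable, List


def fasta_record_names(fasta_path: str) -> List[str]:
    """Return the record names in a FASTA file.

    A record name is taken to be the header text after '>' up to the
    first whitespace, which is how faidx-style indexes name sequences.
    """
    names = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                # An empty header line yields an empty name.
                names.append(fields[0] if fields else "")
    return names


def missing_names(wanted: Iterable[str], fasta_path: str) -> List[str]:
    """Return the names from `wanted` that are absent from the FASTA."""
    present = set(fasta_record_names(fasta_path))
    return [name for name in wanted if name not in present]
```

For example, calling missing_names(["4b01ed4b-1620-47a8-aeb0-6649e615278b"], path) on the draft file would return that name if the file has no record with it. Which file decona passes to medaka stitch as the draft is not shown in the log, so the path to check is an assumption.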

Any help would be appreciated.