bcgsc / NanoSim

Nanopore sequence read simulator

Error occurs when running the read_analysis.py command #121

aman21392 closed this issue 3 years ago

aman21392 commented 3 years ago

I ran the following command and got the error below. Could you please help resolve it?

/home/aclab/apps/NanoSim/src/read_analysis.py transcriptome -i /Drive7/20200316_1633_MN33429_FAM96501_e0563569/fastq_pass/control_T.fastq -rt /Drive4/Homo_cdna.fa -rg /Drive1/human_index/Homo_sapiens.GRCh38.dna.toplevel.fa -a minimap2 --no_intron_retention -o ./transcript/control -t 70

  File "/home/aclab/apps/NanoSim/src/read_analysis.py", line 713, in <module>
    main()
  File "/home/aclab/apps/NanoSim/src/read_analysis.py", line 660, in main
    align_transcriptome(in_fasta, prefix, aligner, num_threads, t_alnm, ref_t, g_alnm, ref_g)
  File "/home/aclab/apps/NanoSim/src/read_analysis.py", line 131, in align_transcriptome
    get_primary_sam.primary_and_unaligned(g_alnm, prefix + "_genome")
  File "/home/aclab/apps/NanoSim/src/get_primary_sam.py", line 89, in primary_and_unaligned
    in_sam_file = pysam.AlignmentFile(sam_alnm_file, 'r')
  File "pysam/libcalignmentfile.pyx", line 737, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 986, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='r') - is it SAM/BAM format? Consider opening with check_sq=False

Thanks in advance

cheny19 commented 3 years ago

Hi @aman21392 ,

I just tried your command and it produced no error. Could you show me the log output that precedes this error? It looks like the alignment did not complete properly for some reason.

Cheers, Chen
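For readers who hit the same ValueError: pysam raises it when the SAM header contains no @SQ records, so inspecting the header of the genome alignment file is a quick way to confirm whether the alignment step wrote a usable header. The following is a minimal sketch, assuming a plain-text SAM file; the function name `sam_header_summary` is made up for this illustration and is not part of NanoSim or pysam.

```python
def sam_header_summary(path):
    """Return (number of header lines, number of @SQ records) for a SAM text file.

    Hypothetical helper for illustration only; not a NanoSim or pysam function.
    """
    header_lines = 0
    sq_lines = 0
    with open(path, "r") as handle:
        for line in handle:
            # Header lines start with "@"; the first alignment record ends the header.
            if not line.startswith("@"):
                break
            header_lines += 1
            if line.startswith("@SQ\t"):
                sq_lines += 1
    return header_lines, sq_lines
```

If the second value is 0, the file has no reference sequences declared, which matches the message "file has no sequences defined".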

aman21392 commented 3 years ago

Thanks for your reply. Here is the full log, including the error:

running the code with following parameters:

infile /Drive7/2nd_nanopore_experiment_data/control_25march/20200316_1633_MN33429_FAM96501_e0563569/fastq_pass/control_T.fastq
ref_g /Drive1/human_index/Homo_sapiens.GRCh38.dna.toplevel.fa
ref_t /Drive4/nanopore_2nd_experiment/ncRNA_cdna/Homo_cdna.fa
annot
aligner minimap2
g_alnm
t_alnm
prefix ./transcript/
num_threads 70
model_fit True
intron_retention False
2021-05-30 12:50:56: Read pre-process and unaligned reads analysis
2021-05-30 12:51:05: Alignment with minimap2 to reference transcriptome
[M::mm_idx_gen::7.816*1.56] collected minimizers
[M::mm_idx_gen::8.966*3.28] sorted minimizers
[M::main::9.007*3.27] loaded/built the index for 250156 target sequence(s)
[M::mm_mapopt_update::9.518*3.15] mid_occ = 142
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 250156
[M::mm_idx_stat::9.745*3.10] distinct minimizers: 19770498 (41.38% are singletons); average occurrences: 3.693; average spacing: 5.405; total length: 394654324
[M::worker_pipeline::17.290*8.45] mapped 834111 sequences
[M::worker_pipeline::27.124*5.75] mapped 258720 sequences
[M::main] Version: 2.17-r974-dirty
[M::main] CMD: minimap2 --cs -ax map-ont -t 70 /Drive4/nanopore_2nd_experiment/ncRNA_cdna/Homo_cdna.fa ./transcript/_processed.fasta
[M::main] Real time: 27.366 sec; CPU: 156.142 sec; Peak RSS: 4.201 GB
2021-05-30 12:51:33: Alignment with minimap2 to reference genome
[M::mm_idx_gen::61.017*2.05] collected minimizers
[M::mm_idx_gen::70.627*5.51] sorted minimizers
[WARNING] For a multi-part index, no @SQ lines will be outputted. Please use --split-prefix.
[M::main::70.6275.51] loaded/built the index for 32 target sequence(s) [M::mm_mapopt_update::73.8635.31] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 32 [M::mm_idx_stat::76.0605.19] distinct minimizers: 167209768 (35.26% are singletons); average occurrences: 6.077; average spacing: 4.056; total length: 4121620856 [M::worker_pipeline::105.13217.54] mapped 834111 sequences [M::worker_pipeline::107.01517.25] mapped 258720 sequences [M::mm_idx_gen::126.46014.85] collected minimizers [M::mm_idx_gen::126.56114.86] sorted minimizers [M::main::126.56114.86] loaded/built the index for 34 target sequence(s) [M::mm_mapopt_update::126.56114.86] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 34 [M::mm_idx_stat::126.70814.84] distinct minimizers: 7040057 (63.57% are singletons); average occurrences: 2.114; average spacing: 274.178; total length: 4080473644 [M::worker_pipeline::142.14819.34] mapped 834111 sequences [M::worker_pipeline::143.74019.16] mapped 258720 sequences [M::mm_idx_gen::163.92417.00] collected minimizers [M::mm_idx_gen::163.96517.00] sorted minimizers [M::main::163.96517.00] loaded/built the index for 27 target sequence(s) [M::mm_mapopt_update::163.96517.00] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 27 [M::mm_idx_stat::164.00517.00] distinct minimizers: 3009320 (86.11% are singletons); average occurrences: 1.411; average spacing: 951.971; total length: 4042722680 [M::worker_pipeline::191.21822.41] mapped 834111 sequences [M::worker_pipeline::199.23521.55] mapped 258720 sequences [M::mm_idx_gen::218.75919.76] collected minimizers [M::mm_idx_gen::218.80119.76] sorted minimizers [M::main::218.80119.76] loaded/built the index for 27 target sequence(s) [M::mm_mapopt_update::218.80119.76] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 27 [M::mm_idx_stat::218.83819.76] distinct minimizers: 2422899 (88.08% are singletons); average occurrences: 1.371; average 
spacing: 1218.746; total length: 4047825866 [M::worker_pipeline::234.22122.29] mapped 834111 sequences [M::worker_pipeline::235.31822.21] mapped 258720 sequences [M::mm_idx_gen::254.54520.65] collected minimizers [M::mm_idx_gen::254.59320.65] sorted minimizers [M::main::254.59320.65] loaded/built the index for 30 target sequence(s) [M::mm_mapopt_update::254.59320.65] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 30 [M::mm_idx_stat::254.63520.64] distinct minimizers: 2201523 (87.94% are singletons); average occurrences: 1.360; average spacing: 1360.239; total length: 4071164743 [M::worker_pipeline::269.84822.62] mapped 834111 sequences [M::worker_pipeline::273.21722.35] mapped 258720 sequences [M::mm_idx_gen::293.84120.89] collected minimizers [M::mm_idx_gen::293.87820.89] sorted minimizers [M::main::293.87820.89] loaded/built the index for 31 target sequence(s) [M::mm_mapopt_update::293.87820.89] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 31 [M::mm_idx_stat::293.91420.89] distinct minimizers: 2127784 (92.47% are singletons); average occurrences: 1.243; average spacing: 1519.365; total length: 4019781606 [M::worker_pipeline::306.37222.34] mapped 834111 sequences [M::worker_pipeline::307.87922.23] mapped 258720 sequences [M::mm_idx_gen::327.82020.97] collected minimizers [M::mm_idx_gen::327.85420.98] sorted minimizers [M::main::327.85420.98] loaded/built the index for 28 target sequence(s) [M::mm_mapopt_update::327.85420.98] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 28 [M::mm_idx_stat::327.87520.97] distinct minimizers: 1353674 (86.80% are singletons); average occurrences: 1.500; average spacing: 1985.247; total length: 4032155065 [M::worker_pipeline::339.28322.01] mapped 834111 sequences [M::worker_pipeline::339.88621.97] mapped 258720 sequences [M::mm_idx_gen::360.92220.78] collected minimizers [M::mm_idx_gen::360.95320.78] sorted minimizers [M::main::360.95320.78] loaded/built the 
index for 34 target sequence(s) [M::mm_mapopt_update::360.95320.78] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 34 [M::mm_idx_stat::360.97220.78] distinct minimizers: 1490499 (90.78% are singletons); average occurrences: 1.483; average spacing: 1884.499; total length: 4166517919 [M::worker_pipeline::371.75721.68] mapped 834111 sequences [M::worker_pipeline::372.38821.65] mapped 258720 sequences [M::mm_idx_gen::392.46020.62] collected minimizers [M::mm_idx_gen::392.49120.62] sorted minimizers [M::main::392.49120.62] loaded/built the index for 28 target sequence(s) [M::mm_mapopt_update::392.49120.62] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 28 [M::mm_idx_stat::392.51120.62] distinct minimizers: 1371016 (90.53% are singletons); average occurrences: 1.245; average spacing: 2354.402; total length: 4017895798 [M::worker_pipeline::403.40021.36] mapped 834111 sequences [M::worker_pipeline::408.49221.11] mapped 258720 sequences [M::mm_idx_gen::428.20720.21] collected minimizers [M::mm_idx_gen::428.23820.21] sorted minimizers [M::main::428.23820.21] loaded/built the index for 36 target sequence(s) [M::mm_mapopt_update::428.23820.21] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 36 [M::mm_idx_stat::428.25820.21] distinct minimizers: 1137515 (88.41% are singletons); average occurrences: 1.828; average spacing: 2000.733; total length: 4160490061 [M::worker_pipeline::438.70621.02] mapped 834111 sequences [M::worker_pipeline::439.21221.00] mapped 258720 sequences [M::mm_idx_gen::459.01020.16] collected minimizers [M::mm_idx_gen::459.03920.16] sorted minimizers [M::main::459.03920.16] loaded/built the index for 29 target sequence(s) [M::mm_mapopt_update::459.03920.16] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 29 [M::mm_idx_stat::459.06020.16] distinct minimizers: 971232 (74.54% are singletons); average occurrences: 1.641; average spacing: 2518.335; total length: 
4014818016 [M::worker_pipeline::467.96220.70] mapped 834111 sequences [M::worker_pipeline::468.51420.68] mapped 258720 sequences [M::mm_idx_gen::489.01319.87] collected minimizers [M::mm_idx_gen::489.04119.88] sorted minimizers [M::main::489.04119.88] loaded/built the index for 29 target sequence(s) [M::mm_mapopt_update::489.04119.88] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 29 [M::mm_idx_stat::489.06019.88] distinct minimizers: 1085952 (89.39% are singletons); average occurrences: 1.334; average spacing: 2764.143; total length: 4004787517 [M::worker_pipeline::498.83320.51] mapped 834111 sequences [M::worker_pipeline::499.25720.49] mapped 258720 sequences [M::mm_idx_gen::518.74419.78] collected minimizers [M::mm_idx_gen::518.77319.78] sorted minimizers [M::main::518.77319.78] loaded/built the index for 25 target sequence(s) [M::mm_mapopt_update::518.77319.78] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 25 [M::mm_idx_stat::518.79219.78] distinct minimizers: 903622 (91.47% are singletons); average occurrences: 1.210; average spacing: 3717.716; total length: 4065292668 [M::worker_pipeline::526.31120.20] mapped 834111 sequences [M::worker_pipeline::527.87820.14] mapped 258720 sequences [M::mm_idx_gen::547.65419.47] collected minimizers [M::mm_idx_gen::547.68319.47] sorted minimizers [M::main::547.68319.47] loaded/built the index for 31 target sequence(s) [M::mm_mapopt_update::547.68319.47] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 31 [M::mm_idx_stat::547.70019.47] distinct minimizers: 876617 (88.45% are singletons); average occurrences: 1.273; average spacing: 3650.923; total length: 4075269801 [M::worker_pipeline::555.21419.89] mapped 834111 sequences [M::worker_pipeline::555.56119.88] mapped 258720 sequences [M::mm_idx_gen::575.74619.24] collected minimizers [M::mm_idx_gen::575.77119.24] sorted minimizers [M::main::575.77119.24] loaded/built the index for 33 target sequence(s) 
[M::mm_mapopt_update::575.771*19.24] mid_occ = 770
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 33
[M::mm_idx_stat::575.783*19.24] distinct minimizers: 663392 (92.63% are singletons); average occurrences: 1.221; average spacing: 5005.300; total length: 4053747445
[M::worker_pipeline::581.833*19.52] mapped 834111 sequences
[M::worker_pipeline::582.143*19.51] mapped 258720 sequences
[M::mm_idx_gen::593.021*19.18] collected minimizers
[M::mm_idx_gen::593.057*19.18] sorted minimizers
[M::main::593.057*19.18] loaded/built the index for 185 target sequence(s)
[M::mm_mapopt_update::593.057*19.18] mid_occ = 770
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 185
[M::mm_idx_stat::593.097*19.18] distinct minimizers: 1792209 (66.14% are singletons); average occurrences: 2.181; average spacing: 555.888; total length: 2172634063
[M::worker_pipeline::618.728*20.42] mapped 834111 sequences
[M::worker_pipeline::628.311*20.12] mapped 258720 sequences
[M::main] Version: 2.17-r974-dirty
[M::main] CMD: minimap2 --cs -ax splice -t 70 /Drive1/human_index/Homo_sapiens.GRCh38.dna.toplevel.fa ./transcript/_processed.fasta
[M::main] Real time: 628.541 sec; CPU: 12641.425 sec; Peak RSS: 34.554 GB
2021-05-30 13:02:02: Processing transcriptome alignment file: sam
2021-05-30 13:02:06: Processing genome alignment file: sam
Traceback (most recent call last):
  File "/home/aclab/apps/NanoSim/src/read_analysis.py", line 713, in <module>
    main()
  File "/home/aclab/apps/NanoSim/src/read_analysis.py", line 660, in main
    align_transcriptome(in_fasta, prefix, aligner, num_threads, t_alnm, ref_t, g_alnm, ref_g)
  File "/home/aclab/apps/NanoSim/src/read_analysis.py", line 131, in align_transcriptome
    get_primary_sam.primary_and_unaligned(g_alnm, prefix + "_genome")
  File "/home/aclab/apps/NanoSim/src/get_primary_sam.py", line 89, in primary_and_unaligned
    in_sam_file = pysam.AlignmentFile(sam_alnm_file, 'r')
  File "pysam/libcalignmentfile.pyx", line 737, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 986, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='r') - is it SAM/BAM format? Consider opening with check_sq=False

running the code with following parameters:

infile /Drive7/2nd_nanopore_experiment_data/control_25march/20200316_1633_MN33429_FAM96501_e0563569/fastq_pass/control_T.fastq
ref_g /Drive1/human_index/Homo_sapiens.GRCh38.dna.toplevel.fa
ref_t /Drive4/nanopore_2nd_experiment/ncRNA_cdna/Homo_cdna.fa
annot
aligner minimap2
g_alnm
t_alnm
prefix ./transcript/control
num_threads 70
model_fit True
intron_retention False
2021-05-30 13:21:18: Read pre-process and unaligned reads analysis
2021-05-30 13:21:27: Alignment with minimap2 to reference transcriptome
[M::mm_idx_gen::7.586*1.52] collected minimizers
[M::mm_idx_gen::8.223*2.95] sorted minimizers
[M::main::8.258*2.94] loaded/built the index for 250156 target sequence(s)
[M::mm_mapopt_update::8.773*2.82] mid_occ = 142
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 250156
[M::mm_idx_stat::9.006*2.78] distinct minimizers: 19770498 (41.38% are singletons); average occurrences: 3.693; average spacing: 5.405; total length: 394654324
[M::worker_pipeline::16.009*8.71] mapped 834111 sequences
[M::worker_pipeline::25.984*5.75] mapped 258720 sequences
[M::main] Version: 2.17-r974-dirty
[M::main] CMD: minimap2 --cs -ax map-ont -t 70 /Drive4/nanopore_2nd_experiment/ncRNA_cdna/Homo_cdna.fa ./transcript/control_processed.fasta
[M::main] Real time: 26.227 sec; CPU: 149.661 sec; Peak RSS: 4.148 GB
2021-05-30 13:21:53: Alignment with minimap2 to reference genome
[M::mm_idx_gen::62.740*2.03] collected minimizers
[M::mm_idx_gen::72.633*5.44] sorted minimizers
[WARNING] For a multi-part index, no @SQ lines will be outputted. Please use --split-prefix.
[M::main::72.6335.44] loaded/built the index for 32 target sequence(s) [M::mm_mapopt_update::76.9045.19] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 32 [M::mm_idx_stat::79.1745.07] distinct minimizers: 167209768 (35.26% are singletons); average occurrences: 6.077; average spacing: 4.056; total length: 4121620856 [M::worker_pipeline::105.07016.82] mapped 834111 sequences [M::worker_pipeline::108.24516.36] mapped 258720 sequences [M::mm_idx_gen::128.16614.06] collected minimizers [M::mm_idx_gen::128.26714.08] sorted minimizers [M::main::128.26714.08] loaded/built the index for 34 target sequence(s) [M::mm_mapopt_update::128.26714.08] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 34 [M::mm_idx_stat::128.41714.06] distinct minimizers: 7040057 (63.57% are singletons); average occurrences: 2.114; average spacing: 274.178; total length: 4080473644 [M::worker_pipeline::143.80018.45] mapped 834111 sequences [M::worker_pipeline::145.15918.29] mapped 258720 sequences [M::mm_idx_gen::164.96016.28] collected minimizers [M::mm_idx_gen::164.99816.29] sorted minimizers [M::main::164.99816.29] loaded/built the index for 27 target sequence(s) [M::mm_mapopt_update::164.99816.29] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 27 [M::mm_idx_stat::165.03916.28] distinct minimizers: 3009320 (86.11% are singletons); average occurrences: 1.411; average spacing: 951.971; total length: 4042722680 [M::worker_pipeline::191.42021.52] mapped 834111 sequences [M::worker_pipeline::198.97320.74] mapped 258720 sequences [M::mm_idx_gen::219.53218.94] collected minimizers [M::mm_idx_gen::219.56918.94] sorted minimizers [M::main::219.56918.94] loaded/built the index for 27 target sequence(s) [M::mm_mapopt_update::219.56918.94] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 27 [M::mm_idx_stat::219.60618.94] distinct minimizers: 2422899 (88.08% are singletons); average occurrences: 1.371; average 
spacing: 1218.746; total length: 4047825866 [M::worker_pipeline::235.78221.38] mapped 834111 sequences [M::worker_pipeline::236.68821.30] mapped 258720 sequences [M::mm_idx_gen::257.21019.72] collected minimizers [M::mm_idx_gen::257.24519.73] sorted minimizers [M::main::257.24519.73] loaded/built the index for 30 target sequence(s) [M::mm_mapopt_update::257.24519.73] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 30 [M::mm_idx_stat::257.28319.72] distinct minimizers: 2201523 (87.94% are singletons); average occurrences: 1.360; average spacing: 1360.239; total length: 4071164743 [M::worker_pipeline::270.89621.55] mapped 834111 sequences [M::worker_pipeline::274.52221.28] mapped 258720 sequences [M::mm_idx_gen::293.56220.00] collected minimizers [M::mm_idx_gen::293.59620.00] sorted minimizers [M::main::293.59620.00] loaded/built the index for 31 target sequence(s) [M::mm_mapopt_update::293.59620.00] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 31 [M::mm_idx_stat::293.63520.00] distinct minimizers: 2127784 (92.47% are singletons); average occurrences: 1.243; average spacing: 1519.365; total length: 4019781606 [M::worker_pipeline::306.09421.46] mapped 834111 sequences [M::worker_pipeline::307.57821.36] mapped 258720 sequences [M::mm_idx_gen::326.61220.20] collected minimizers [M::mm_idx_gen::326.64520.20] sorted minimizers [M::main::326.64520.20] loaded/built the index for 28 target sequence(s) [M::mm_mapopt_update::326.64520.20] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 28 [M::mm_idx_stat::326.66620.20] distinct minimizers: 1353674 (86.80% are singletons); average occurrences: 1.500; average spacing: 1985.247; total length: 4032155065 [M::worker_pipeline::337.87321.30] mapped 834111 sequences [M::worker_pipeline::338.46121.26] mapped 258720 sequences [M::mm_idx_gen::360.30320.07] collected minimizers [M::mm_idx_gen::360.33620.07] sorted minimizers [M::main::360.33620.07] loaded/built the 
index for 34 target sequence(s) [M::mm_mapopt_update::360.33620.07] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 34 [M::mm_idx_stat::360.35720.07] distinct minimizers: 1490499 (90.78% are singletons); average occurrences: 1.483; average spacing: 1884.499; total length: 4166517919 [M::worker_pipeline::370.68920.97] mapped 834111 sequences [M::worker_pipeline::371.28020.94] mapped 258720 sequences [M::mm_idx_gen::391.07119.96] collected minimizers [M::mm_idx_gen::391.10319.96] sorted minimizers [M::main::391.10319.96] loaded/built the index for 28 target sequence(s) [M::mm_mapopt_update::391.10319.96] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 28 [M::mm_idx_stat::391.12219.96] distinct minimizers: 1371016 (90.53% are singletons); average occurrences: 1.245; average spacing: 2354.402; total length: 4017895798 [M::worker_pipeline::401.56520.67] mapped 834111 sequences [M::worker_pipeline::406.74120.42] mapped 258720 sequences [M::mm_idx_gen::427.40919.51] collected minimizers [M::mm_idx_gen::427.44819.51] sorted minimizers [M::main::427.44819.51] loaded/built the index for 36 target sequence(s) [M::mm_mapopt_update::427.44819.51] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 36 [M::mm_idx_stat::427.46719.51] distinct minimizers: 1137515 (88.41% are singletons); average occurrences: 1.828; average spacing: 2000.733; total length: 4160490061 [M::worker_pipeline::437.25120.28] mapped 834111 sequences [M::worker_pipeline::437.73920.26] mapped 258720 sequences [M::mm_idx_gen::456.52419.49] collected minimizers [M::mm_idx_gen::456.55419.49] sorted minimizers [M::main::456.55419.49] loaded/built the index for 29 target sequence(s) [M::mm_mapopt_update::456.55419.49] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 29 [M::mm_idx_stat::456.57519.49] distinct minimizers: 971232 (74.54% are singletons); average occurrences: 1.641; average spacing: 2518.335; total length: 
4014818016 [M::worker_pipeline::464.87120.03] mapped 834111 sequences [M::worker_pipeline::465.68220.00] mapped 258720 sequences [M::mm_idx_gen::485.65819.24] collected minimizers [M::mm_idx_gen::485.69019.24] sorted minimizers [M::main::485.69019.24] loaded/built the index for 29 target sequence(s) [M::mm_mapopt_update::485.69019.24] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 29 [M::mm_idx_stat::485.70919.24] distinct minimizers: 1085952 (89.39% are singletons); average occurrences: 1.334; average spacing: 2764.143; total length: 4004787517 [M::worker_pipeline::494.93219.78] mapped 834111 sequences [M::worker_pipeline::495.38019.77] mapped 258720 sequences [M::mm_idx_gen::515.89219.04] collected minimizers [M::mm_idx_gen::515.91819.04] sorted minimizers [M::main::515.91819.04] loaded/built the index for 25 target sequence(s) [M::mm_mapopt_update::515.91819.04] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 25 [M::mm_idx_stat::515.93619.04] distinct minimizers: 903622 (91.47% are singletons); average occurrences: 1.210; average spacing: 3717.716; total length: 4065292668 [M::worker_pipeline::523.07819.45] mapped 834111 sequences [M::worker_pipeline::524.65119.39] mapped 258720 sequences [M::mm_idx_gen::545.09718.72] collected minimizers [M::mm_idx_gen::545.12218.73] sorted minimizers [M::main::545.12218.73] loaded/built the index for 31 target sequence(s) [M::mm_mapopt_update::545.12218.73] mid_occ = 770 [M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 31 [M::mm_idx_stat::545.13918.73] distinct minimizers: 876617 (88.45% are singletons); average occurrences: 1.273; average spacing: 3650.923; total length: 4075269801 [M::worker_pipeline::552.27819.10] mapped 834111 sequences [M::worker_pipeline::552.64619.09] mapped 258720 sequences [M::mm_idx_gen::572.71418.48] collected minimizers [M::mm_idx_gen::572.73818.48] sorted minimizers [M::main::572.73818.48] loaded/built the index for 33 target sequence(s) 
[M::mm_mapopt_update::572.738*18.48] mid_occ = 770
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 33
[M::mm_idx_stat::572.749*18.48] distinct minimizers: 663392 (92.63% are singletons); average occurrences: 1.221; average spacing: 5005.300; total length: 4053747445
[M::worker_pipeline::578.368*18.73] mapped 834111 sequences
[M::worker_pipeline::578.695*18.72] mapped 258720 sequences
[M::mm_idx_gen::589.693*18.40] collected minimizers
[M::mm_idx_gen::589.726*18.40] sorted minimizers
[M::main::589.726*18.40] loaded/built the index for 185 target sequence(s)
[M::mm_mapopt_update::589.726*18.40] mid_occ = 770
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 185
[M::mm_idx_stat::589.766*18.40] distinct minimizers: 1792209 (66.14% are singletons); average occurrences: 2.181; average spacing: 555.888; total length: 2172634063
[M::worker_pipeline::614.863*19.61] mapped 834111 sequences
[M::worker_pipeline::624.639*19.32] mapped 258720 sequences
[M::main] Version: 2.17-r974-dirty
[M::main] CMD: minimap2 --cs -ax splice -t 70 /Drive1/human_index/Homo_sapiens.GRCh38.dna.toplevel.fa ./transcript/control_processed.fasta
[M::main] Real time: 624.886 sec; CPU: 12069.882 sec; Peak RSS: 34.236 GB
2021-05-30 13:32:19: Processing transcriptome alignment file: sam
2021-05-30 13:32:22: Processing genome alignment file: sam
Traceback (most recent call last):
  File "/home/aclab/apps/NanoSim/src/read_analysis.py", line 713, in <module>
    main()
  File "/home/aclab/apps/NanoSim/src/read_analysis.py", line 660, in main
    align_transcriptome(in_fasta, prefix, aligner, num_threads, t_alnm, ref_t, g_alnm, ref_g)
  File "/home/aclab/apps/NanoSim/src/read_analysis.py", line 131, in align_transcriptome
    get_primary_sam.primary_and_unaligned(g_alnm, prefix + "_genome")
  File "/home/aclab/apps/NanoSim/src/get_primary_sam.py", line 89, in primary_and_unaligned
    in_sam_file = pysam.AlignmentFile(sam_alnm_file, 'r')
  File "pysam/libcalignmentfile.pyx", line 737, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 986, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='r') - is it SAM/BAM format? Consider opening with check_sq=False

Thanks in Advance
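The genome alignment section of the log above contains the line "[WARNING] For a multi-part index, no @SQ lines will be outputted", followed by many repeated "loaded/built the index for N target sequence(s)" entries, one per index part. A long log like this is easier to triage with a small script. The sketch below is only an illustration: `summarize_minimap2_log` is a hypothetical helper, and it assumes the log text for a single minimap2 run has been saved or pasted as one string.

```python
import re

# Message text as it appears in the log quoted in this thread.
MULTIPART_WARNING = "For a multi-part index, no @SQ lines will be outputted"
INDEX_PART = re.compile(r"loaded/built the index for (\d+) target sequence\(s\)")


def summarize_minimap2_log(log_text):
    """Summarize index parts and the multi-part warning in minimap2 stderr text.

    Hypothetical helper for illustration; works on flattened or line-broken logs.
    """
    parts = [int(n) for n in INDEX_PART.findall(log_text)]
    return {
        "multipart_warning": MULTIPART_WARNING in log_text,
        "index_parts": len(parts),
        "target_sequences": sum(parts),
    }
```

More than one index part together with the warning indicates that the resulting SAM file has no @SQ header lines, which is the condition pysam rejects.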

aman21392 commented 3 years ago

Thanks, Chen. It is sorted out now. The error came from the genome file; I changed it and everything works fine now.
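The thread does not say what was changed in the genome file. One reading consistent with the warning in the log is that the reference was larger than minimap2's index batch size (the `-I` option, documented default 4G), so the index was split into parts and no @SQ lines were written. The sketch below checks a FASTA file against that threshold. The function names are hypothetical, and the comparison is only approximate because minimap2 fills each batch with whole sequences.

```python
# Assumption: 4G, the documented default of minimap2's -I option, read as 4e9 bases.
DEFAULT_BATCH_BASES = 4_000_000_000


def fasta_total_bases(path):
    """Sum the sequence lengths in an uncompressed FASTA file."""
    total = 0
    with open(path, "r") as handle:
        for line in handle:
            # Lines starting with ">" are sequence names, not bases.
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def needs_multipart_index(path, batch_bases=DEFAULT_BATCH_BASES):
    """Rough check: would this reference likely be indexed in more than one part?"""
    return fasta_total_bases(path) > batch_bases
```

If the check returns True, possible remedies include a smaller reference (for example a primary-assembly FASTA without alternate haplotypes), or running minimap2 with a larger `-I` value or with `--split-prefix`, as the warning suggests. Whether NanoSim exposes these minimap2 options is not covered in this thread.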