EichlerLab / smrtsv2

Structural variant caller
MIT License

Failed assembly step #15

bioysu closed this issue 5 years ago.

bioysu commented 5 years ago

I tried to run the smrtsv2 assembly step as follows:

/data/suyao/tools/pacbio/smrtsv2/smrtsv assemble --asm-cpu 8 --asm-polish arrow

It failed. Here is the output:

Running local assemblies
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count  jobs
    20     asm_assemble_group
    1      asm_assemble_group_bam_list
    1      asm_merge_groups
    22

[Fri Mar 1 18:06:26 2019]
rule asm_assemble_group:
    input: align/alignments.fofn, detect/candidate_groups.bed, detect/candidates.bed, reference/ref.fasta, reference/ref.fasta.fai, reference/ref.fasta.sa
    output: assemble/group/gr-chrV-0-576874/contig.bam, assemble/group/gr-chrV-0-576874/contig.bam.bai
    log: assemble/group/gr-chrV-0-576874/contig_group.log
    jobid: 2
    benchmark: assemble/group/gr-chrV-0-576874/contig_group_bm.log
    wildcards: group_id=gr-chrV-0-576874

Job counts:
    count  jobs
    1      asm_assemble_group
    1
[Fri Mar 1 18:08:14 2019]
Error in rule asm_assemble_group:
    jobid: 0
    output: assemble/group/gr-chrV-0-576874/contig.bam, assemble/group/gr-chrV-0-576874/contig.bam.bai
    log: assemble/group/gr-chrV-0-576874/contig_group.log

RuleException:
RuntimeError in line 162 of /data/suyao/tools/pacbio/smrtsv2/rules/assemble.snakefile:
Failed to assemble group gr-chrV-0-576874: See log assemble/group/gr-chrV-0-576874/contig_group.log
  File "/data/suyao/tools/pacbio/smrtsv2/rules/assemble.snakefile", line 162, in __rule_asm_assemble_group
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/python3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /data/suyao/tools/pacbio/smrtsv2/test/.snakemake/log/2019-03-01T180626.588554.snakemake.log
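The output above already names the failing rule (`asm_assemble_group`) and the per-group log to read next. When a run launches many jobs, picking those pointers out of a saved log by hand gets tedious, so here is a minimal Python sketch that does it, for example on the file named on the `Complete log:` line. `failed_jobs` and `report` are hypothetical helpers, not part of smrtsv2 or Snakemake, and the parser assumes the multi-line layout in which Snakemake typically prints an error entry: an `Error in rule NAME:` line followed by indented `key: value` fields such as `jobid`, `output` and `log`.

```python
import re

# Hypothetical parser (not part of smrtsv2 or Snakemake). Assumed layout:
#   Error in rule <rule_name>:
#       jobid: <n>
#       output: <file>, <file>
#       log: <file>
# An entry ends at the first line that is not an indented "key: value" field.
ERROR_RE = re.compile(r"^\s*Error in rule (\S+):\s*$")
FIELD_RE = re.compile(r"^\s+(\w+):\s*(.*)$")


def failed_jobs(text):
    """Return one dict per 'Error in rule' entry: the rule name plus its fields."""
    jobs = []
    current = None
    for line in text.splitlines():
        start = ERROR_RE.match(line)
        if start:
            current = {"rule": start.group(1)}
            jobs.append(current)
            continue
        field = FIELD_RE.match(line) if current is not None else None
        if field:
            current[field.group(1)] = field.group(2).strip()
        else:
            current = None  # any other line closes the open entry
    return jobs


def report(log_path):
    """Print each failed rule together with the log file it declared, if any."""
    with open(log_path) as handle:
        for job in failed_jobs(handle.read()):
            print(job["rule"], job.get("log", "(no log declared)"))
```

If the complete log uses the assumed layout, `report()` on it should list `asm_assemble_group` alongside `assemble/group/gr-chrV-0-576874/contig_group.log`, which is the file whose contents are the next thing to inspect.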

And here is the content of assemble/group/gr-chrV-0-576874/contig_group.log:

Assembling group: gr-chrV-0-576874
Assemble temp: /tmp/asm_group_gr-chrV-0-576874
Hostname: localhost.localdomain
Threads: 8
Memory: 1G
Polish: arrow
MAPQ: 30

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Provided resources: threads=8
Job counts:
    count  jobs
    1      asm_group_get_reads
    31     asm_group_get_region_bam
    31     asm_group_reads_to_fasta
    31     assemble_align_fixup
    31     assemble_align_org
    31     assemble_align_ref_region
    31     assemble_get_ref_region
    31     assemble_polish
    31     assemble_reads
    31     assemble_set_pb_seq_name
    1      merge_group_contigs
    281

[Fri Mar 1 18:06:28 2019] rule asm_group_get_reads: input: /data/suyao/tools/pacbio/smrtsv2/test/align/alignments.fofn output: group/reads.bam, group/reads.bam.bai jobid: 32 resources: threads=1

Job counts: count jobs 1 asm_group_get_reads 1
Extracting reads from batch 0...
Extracting reads from batch 1...
Extracting reads from batch 2...
Extracting reads from batch 3...
Extracting reads from batch 4...
Extracting reads from batch 5...
Extracting reads from batch 6...
Extracting reads from batch 7...
Extracting reads from batch 8...
Extracting reads from batch 9...
Merging batches...
Extracting over region: chrV:1-576874
[Fri Mar 1 18:06:38 2019] Finished job 32. 1 of 281 steps (0.36%) done

[Fri Mar 1 18:06:38 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-360000-60000/reads/reads.bam, region/chrV-360000-60000/reads/reads.bam.bai jobid: 273 wildcards: region_id=chrV-360000-60000 resources: threads=1

[Fri Mar 1 18:06:38 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-420000-60000/reads/reads.bam, region/chrV-420000-60000/reads/reads.bam.bai jobid: 265 wildcards: region_id=chrV-420000-60000 resources: threads=1

[Fri Mar 1 18:06:38 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-0-60000/reads/reads.bam, region/chrV-0-60000/reads/reads.bam.bai jobid: 264 wildcards: region_id=chrV-0-60000 resources: threads=1

[Fri Mar 1 18:06:38 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-180000-60000/reads/reads.bam, region/chrV-180000-60000/reads/reads.bam.bai jobid: 267 wildcards: region_id=chrV-180000-60000 resources: threads=1

[Fri Mar 1 18:06:38 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-140000-60000/reads/reads.bam, region/chrV-140000-60000/reads/reads.bam.bai jobid: 251 wildcards: region_id=chrV-140000-60000 resources: threads=1

[Fri Mar 1 18:06:38 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-240000-60000/reads/reads.bam, region/chrV-240000-60000/reads/reads.bam.bai jobid: 272 wildcards: region_id=chrV-240000-60000 resources: threads=1

[Fri Mar 1 18:06:38 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-380000-60000/reads/reads.bam, region/chrV-380000-60000/reads/reads.bam.bai jobid: 260 wildcards: region_id=chrV-380000-60000 resources: threads=1

[Fri Mar 1 18:06:38 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-40000-60000/reads/reads.bam, region/chrV-40000-60000/reads/reads.bam.bai jobid: 276 wildcards: region_id=chrV-40000-60000 resources: threads=1

Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Extracting over region: chrV:380001-440000
Extracting over region: chrV:420001-480000
Extracting over region: chrV:360001-420000
Extracting over region: chrV:1-60000
[Fri Mar 1 18:06:41 2019] Finished job 265. 2 of 281 steps (0.71%) done
Extracting over region: chrV:240001-300000
Extracting over region: chrV:180001-240000
Extracting over region: chrV:40001-100000
Extracting over region: chrV:140001-200000
[Fri Mar 1 18:06:41 2019] Finished job 260. 3 of 281 steps (1%) done

[Fri Mar 1 18:06:41 2019] rule asm_group_reads_to_fasta: input: region/chrV-380000-60000/reads/reads.bam, region/chrV-380000-60000/reads/reads.bam.bai output: region/chrV-380000-60000/reads/reads.fasta, region/chrV-380000-60000/reads/reads.fastq jobid: 209 wildcards: region_id=chrV-380000-60000 resources: threads=1

[Fri Mar 1 18:06:41 2019] Finished job 273. 4 of 281 steps (1%) done

[Fri Mar 1 18:06:41 2019] rule asm_group_reads_to_fasta: input: region/chrV-420000-60000/reads/reads.bam, region/chrV-420000-60000/reads/reads.bam.bai output: region/chrV-420000-60000/reads/reads.fasta, region/chrV-420000-60000/reads/reads.fastq jobid: 218 wildcards: region_id=chrV-420000-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-380000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-380000-60000/reads/reads.fastq

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-420000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-420000-60000/reads/reads.fastq
[Fri Mar 1 18:06:41 2019] Finished job 264. 5 of 281 steps (2%) done

[Fri Mar 1 18:06:41 2019] rule asm_group_reads_to_fasta: input: region/chrV-360000-60000/reads/reads.bam, region/chrV-360000-60000/reads/reads.bam.bai output: region/chrV-360000-60000/reads/reads.fasta, region/chrV-360000-60000/reads/reads.fastq jobid: 234 wildcards: region_id=chrV-360000-60000 resources: threads=1

[Fri Mar 1 18:06:41 2019] rule asm_group_reads_to_fasta: input: region/chrV-0-60000/reads/reads.bam, region/chrV-0-60000/reads/reads.bam.bai output: region/chrV-0-60000/reads/reads.fasta, region/chrV-0-60000/reads/reads.fastq jobid: 216 wildcards: region_id=chrV-0-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-360000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-360000-60000/reads/reads.fastq

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-0-60000/reads/reads.fasta
Writing FASTQ: region/chrV-0-60000/reads/reads.fastq
[Fri Mar 1 18:06:41 2019] Finished job 272. 6 of 281 steps (2%) done

[Fri Mar 1 18:06:41 2019] rule asm_group_reads_to_fasta: input: region/chrV-240000-60000/reads/reads.bam, region/chrV-240000-60000/reads/reads.bam.bai output: region/chrV-240000-60000/reads/reads.fasta, region/chrV-240000-60000/reads/reads.fastq jobid: 233 wildcards: region_id=chrV-240000-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-240000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-240000-60000/reads/reads.fastq
[Fri Mar 1 18:06:41 2019] Finished job 267. 7 of 281 steps (2%) done

[Fri Mar 1 18:06:41 2019] rule asm_group_reads_to_fasta: input: region/chrV-180000-60000/reads/reads.bam, region/chrV-180000-60000/reads/reads.bam.bai output: region/chrV-180000-60000/reads/reads.fasta, region/chrV-180000-60000/reads/reads.fastq jobid: 222 wildcards: region_id=chrV-180000-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-180000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-180000-60000/reads/reads.fastq
[Fri Mar 1 18:06:41 2019] Finished job 276. 8 of 281 steps (3%) done
[Fri Mar 1 18:06:41 2019] Finished job 251. 9 of 281 steps (3%) done

[Fri Mar 1 18:06:41 2019] rule asm_group_reads_to_fasta: input: region/chrV-40000-60000/reads/reads.bam, region/chrV-40000-60000/reads/reads.bam.bai output: region/chrV-40000-60000/reads/reads.fasta, region/chrV-40000-60000/reads/reads.fastq jobid: 240 wildcards: region_id=chrV-40000-60000 resources: threads=1

[Fri Mar 1 18:06:41 2019] rule asm_group_reads_to_fasta: input: region/chrV-140000-60000/reads/reads.bam, region/chrV-140000-60000/reads/reads.bam.bai output: region/chrV-140000-60000/reads/reads.fasta, region/chrV-140000-60000/reads/reads.fastq jobid: 191 wildcards: region_id=chrV-140000-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-40000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-40000-60000/reads/reads.fastq

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-140000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-140000-60000/reads/reads.fastq
Removing temporary output file region/chrV-380000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-380000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:41 2019] Finished job 209. 10 of 281 steps (4%) done
Removing temporary output file region/chrV-420000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-420000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:41 2019] Finished job 218. 11 of 281 steps (4%) done

[Fri Mar 1 18:06:41 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-60000-60000/reads/reads.bam, region/chrV-60000-60000/reads/reads.bam.bai jobid: 254 wildcards: region_id=chrV-60000-60000 resources: threads=1

[Fri Mar 1 18:06:41 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-520000-56874/reads/reads.bam, region/chrV-520000-56874/reads/reads.bam.bai jobid: 252 wildcards: region_id=chrV-520000-56874 resources: threads=1

Removing temporary output file region/chrV-360000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-360000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:41 2019] Finished job 234. 12 of 281 steps (4%) done
Removing temporary output file region/chrV-0-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-0-60000/reads/reads.fastq.
[Fri Mar 1 18:06:41 2019] Finished job 216. 13 of 281 steps (5%) done
Removing temporary output file region/chrV-180000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-180000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:41 2019] Finished job 222. 14 of 281 steps (5%) done
Removing temporary output file region/chrV-140000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-140000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:41 2019] Finished job 191. 15 of 281 steps (5%) done
Removing temporary output file region/chrV-40000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-40000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:41 2019] Finished job 240. 16 of 281 steps (6%) done
Removing temporary output file region/chrV-240000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-240000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:41 2019] Finished job 233. 17 of 281 steps (6%) done

[Fri Mar 1 18:06:42 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-80000-60000/reads/reads.bam, region/chrV-80000-60000/reads/reads.bam.bai jobid: 270 wildcards: region_id=chrV-80000-60000 resources: threads=1

[Fri Mar 1 18:06:42 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-460000-60000/reads/reads.bam, region/chrV-460000-60000/reads/reads.bam.bai jobid: 253 wildcards: region_id=chrV-460000-60000 resources: threads=1

[Fri Mar 1 18:06:42 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-160000-60000/reads/reads.bam, region/chrV-160000-60000/reads/reads.bam.bai jobid: 258 wildcards: region_id=chrV-160000-60000 resources: threads=1

[Fri Mar 1 18:06:42 2019]
Job counts: count jobs 1 asm_group_get_region_bam 1
rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-340000-60000/reads/reads.bam, region/chrV-340000-60000/reads/reads.bam.bai jobid: 274 wildcards: region_id=chrV-340000-60000 resources: threads=1

Job counts: count jobs 1 asm_group_get_region_bam 1
[Fri Mar 1 18:06:42 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-560000-16874/reads/reads.bam, region/chrV-560000-16874/reads/reads.bam.bai jobid: 250 wildcards: region_id=chrV-560000-16874 resources: threads=1

[Fri Mar 1 18:06:42 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-20000-60000/reads/reads.bam, region/chrV-20000-60000/reads/reads.bam.bai jobid: 268 wildcards: region_id=chrV-20000-60000 resources: threads=1

Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Extracting over region: chrV:60001-120000
Extracting over region: chrV:520001-576874
[Fri Mar 1 18:06:43 2019] Finished job 252. 18 of 281 steps (6%) done

[Fri Mar 1 18:06:43 2019] rule asm_group_reads_to_fasta: input: region/chrV-520000-56874/reads/reads.bam, region/chrV-520000-56874/reads/reads.bam.bai output: region/chrV-520000-56874/reads/reads.fasta, region/chrV-520000-56874/reads/reads.fastq jobid: 193 wildcards: region_id=chrV-520000-56874 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-520000-56874/reads/reads.fasta
Writing FASTQ: region/chrV-520000-56874/reads/reads.fastq
[Fri Mar 1 18:06:43 2019] Finished job 254. 19 of 281 steps (7%) done

[Fri Mar 1 18:06:43 2019] rule asm_group_reads_to_fasta: input: region/chrV-60000-60000/reads/reads.bam, region/chrV-60000-60000/reads/reads.bam.bai output: region/chrV-60000-60000/reads/reads.fasta, region/chrV-60000-60000/reads/reads.fastq jobid: 197 wildcards: region_id=chrV-60000-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-60000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-60000-60000/reads/reads.fastq
Extracting over region: chrV:560001-576874
[Fri Mar 1 18:06:43 2019] Finished job 250. 20 of 281 steps (7%) done

[Fri Mar 1 18:06:43 2019] rule asm_group_reads_to_fasta: input: region/chrV-560000-16874/reads/reads.bam, region/chrV-560000-16874/reads/reads.bam.bai output: region/chrV-560000-16874/reads/reads.fasta, region/chrV-560000-16874/reads/reads.fastq jobid: 189 wildcards: region_id=chrV-560000-16874 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-560000-16874/reads/reads.fasta
Writing FASTQ: region/chrV-560000-16874/reads/reads.fastq
Extracting over region: chrV:460001-520000
[Fri Mar 1 18:06:44 2019] Finished job 253. 21 of 281 steps (7%) done

[Fri Mar 1 18:06:44 2019] rule asm_group_reads_to_fasta: input: region/chrV-460000-60000/reads/reads.bam, region/chrV-460000-60000/reads/reads.bam.bai output: region/chrV-460000-60000/reads/reads.fasta, region/chrV-460000-60000/reads/reads.fastq jobid: 194 wildcards: region_id=chrV-460000-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-460000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-460000-60000/reads/reads.fastq
Extracting over region: chrV:20001-80000
Extracting over region: chrV:340001-400000
Extracting over region: chrV:160001-220000
Extracting over region: chrV:80001-140000
Removing temporary output file region/chrV-560000-16874/reads/reads.bam.bai.
Removing temporary output file region/chrV-560000-16874/reads/reads.fastq.
[Fri Mar 1 18:06:44 2019] Finished job 189. 22 of 281 steps (8%) done
[Fri Mar 1 18:06:44 2019] Finished job 274. 23 of 281 steps (8%) done
[Fri Mar 1 18:06:44 2019] Finished job 270. 24 of 281 steps (9%) done
[Fri Mar 1 18:06:44 2019] Finished job 258. 25 of 281 steps (9%) done

[Fri Mar 1 18:06:44 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-280000-60000/reads/reads.bam, region/chrV-280000-60000/reads/reads.bam.bai jobid: 271 wildcards: region_id=chrV-280000-60000 resources: threads=1

[Fri Mar 1 18:06:44 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-120000-60000/reads/reads.bam, region/chrV-120000-60000/reads/reads.bam.bai jobid: 269 wildcards: region_id=chrV-120000-60000 resources: threads=1

[Fri Mar 1 18:06:44 2019] rule asm_group_reads_to_fasta: input: region/chrV-340000-60000/reads/reads.bam, region/chrV-340000-60000/reads/reads.bam.bai output: region/chrV-340000-60000/reads/reads.fasta, region/chrV-340000-60000/reads/reads.fastq jobid: 236 wildcards: region_id=chrV-340000-60000 resources: threads=1

[Fri Mar 1 18:06:44 2019] Finished job 268. 26 of 281 steps (9%) done

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-340000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-340000-60000/reads/reads.fastq

[Fri Mar 1 18:06:44 2019] rule asm_group_reads_to_fasta: input: region/chrV-80000-60000/reads/reads.bam, region/chrV-80000-60000/reads/reads.bam.bai output: region/chrV-80000-60000/reads/reads.fasta, region/chrV-80000-60000/reads/reads.fastq jobid: 229 wildcards: region_id=chrV-80000-60000 resources: threads=1

[Fri Mar 1 18:06:44 2019] rule asm_group_reads_to_fasta: input: region/chrV-160000-60000/reads/reads.bam, region/chrV-160000-60000/reads/reads.bam.bai output: region/chrV-160000-60000/reads/reads.fasta, region/chrV-160000-60000/reads/reads.fastq jobid: 205 wildcards: region_id=chrV-160000-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-80000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-80000-60000/reads/reads.fastq

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-160000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-160000-60000/reads/reads.fastq
Removing temporary output file region/chrV-520000-56874/reads/reads.bam.bai.
Removing temporary output file region/chrV-520000-56874/reads/reads.fastq.
[Fri Mar 1 18:06:44 2019] Finished job 193. 27 of 281 steps (10%) done

[Fri Mar 1 18:06:44 2019] rule asm_group_reads_to_fasta: input: region/chrV-20000-60000/reads/reads.bam, region/chrV-20000-60000/reads/reads.bam.bai output: region/chrV-20000-60000/reads/reads.fasta, region/chrV-20000-60000/reads/reads.fastq jobid: 225 wildcards: region_id=chrV-20000-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-20000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-20000-60000/reads/reads.fastq
Removing temporary output file region/chrV-60000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-60000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:44 2019] Finished job 197. 28 of 281 steps (10%) done

[Fri Mar 1 18:06:44 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-260000-60000/reads/reads.bam, region/chrV-260000-60000/reads/reads.bam.bai jobid: 255 wildcards: region_id=chrV-260000-60000 resources: threads=1

Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Removing temporary output file region/chrV-460000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-460000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:44 2019] Finished job 194. 29 of 281 steps (10%) done

[Fri Mar 1 18:06:44 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-500000-60000/reads/reads.bam, region/chrV-500000-60000/reads/reads.bam.bai jobid: 278 wildcards: region_id=chrV-500000-60000 resources: threads=1

Removing temporary output file region/chrV-340000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-340000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:44 2019] Finished job 236. 30 of 281 steps (11%) done

[Fri Mar 1 18:06:44 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-200000-60000/reads/reads.bam, region/chrV-200000-60000/reads/reads.bam.bai jobid: 275 wildcards: region_id=chrV-200000-60000 resources: threads=1

Removing temporary output file region/chrV-80000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-80000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:44 2019] Finished job 229. 31 of 281 steps (11%) done
Removing temporary output file region/chrV-160000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-160000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:44 2019] Finished job 205. 32 of 281 steps (11%) done
Removing temporary output file region/chrV-20000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-20000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:44 2019] Finished job 225. 33 of 281 steps (12%) done
Job counts: count jobs 1 asm_group_get_region_bam 1

[Fri Mar 1 18:06:45 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-320000-60000/reads/reads.bam, region/chrV-320000-60000/reads/reads.bam.bai jobid: 277 wildcards: region_id=chrV-320000-60000 resources: threads=1

[Fri Mar 1 18:06:45 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-300000-60000/reads/reads.bam, region/chrV-300000-60000/reads/reads.bam.bai jobid: 257 wildcards: region_id=chrV-300000-60000 resources: threads=1

[Fri Mar 1 18:06:45 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-400000-60000/reads/reads.bam, region/chrV-400000-60000/reads/reads.bam.bai jobid: 279 wildcards: region_id=chrV-400000-60000 resources: threads=1

Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Extracting over region: chrV:280001-340000
Extracting over region: chrV:120001-180000
Extracting over region: chrV:260001-320000
[Fri Mar 1 18:06:46 2019] Finished job 271. 34 of 281 steps (12%) done

[Fri Mar 1 18:06:46 2019] rule asm_group_reads_to_fasta: input: region/chrV-280000-60000/reads/reads.bam, region/chrV-280000-60000/reads/reads.bam.bai output: region/chrV-280000-60000/reads/reads.fasta, region/chrV-280000-60000/reads/reads.fastq jobid: 231 wildcards: region_id=chrV-280000-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-280000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-280000-60000/reads/reads.fastq
[Fri Mar 1 18:06:46 2019] Finished job 255. 35 of 281 steps (12%) done
[Fri Mar 1 18:06:46 2019] Finished job 269. 36 of 281 steps (13%) done

[Fri Mar 1 18:06:46 2019] rule asm_group_reads_to_fasta: input: region/chrV-260000-60000/reads/reads.bam, region/chrV-260000-60000/reads/reads.bam.bai output: region/chrV-260000-60000/reads/reads.fasta, region/chrV-260000-60000/reads/reads.fastq jobid: 199 wildcards: region_id=chrV-260000-60000 resources: threads=1

[Fri Mar 1 18:06:46 2019] rule asm_group_reads_to_fasta: input: region/chrV-120000-60000/reads/reads.bam, region/chrV-120000-60000/reads/reads.bam.bai output: region/chrV-120000-60000/reads/reads.fasta, region/chrV-120000-60000/reads/reads.fastq jobid: 227 wildcards: region_id=chrV-120000-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-260000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-260000-60000/reads/reads.fastq

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-120000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-120000-60000/reads/reads.fastq
Removing temporary output file region/chrV-280000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-280000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:47 2019] Finished job 231. 37 of 281 steps (13%) done
Removing temporary output file region/chrV-120000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-120000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:47 2019] Finished job 227. 38 of 281 steps (14%) done
Removing temporary output file region/chrV-260000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-260000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:47 2019] Finished job 199. 39 of 281 steps (14%) done

[Fri Mar 1 18:06:47 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-220000-60000/reads/reads.bam, region/chrV-220000-60000/reads/reads.bam.bai jobid: 266 wildcards: region_id=chrV-220000-60000 resources: threads=1

[Fri Mar 1 18:06:47 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-100000-60000/reads/reads.bam, region/chrV-100000-60000/reads/reads.bam.bai jobid: 259 wildcards: region_id=chrV-100000-60000 resources: threads=1

[Fri Mar 1 18:06:47 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-0-10500/reads/reads.bam, region/chrV-0-10500/reads/reads.bam.bai jobid: 261 wildcards: region_id=chrV-0-10500 resources: threads=1

Extracting over region: chrV:400001-460000
Extracting over region: chrV:300001-360000
Extracting over region: chrV:320001-380000
Extracting over region: chrV:500001-560000
Extracting over region: chrV:200001-260000
[Fri Mar 1 18:06:47 2019] Finished job 277. 40 of 281 steps (14%) done

[Fri Mar 1 18:06:47 2019] rule asm_group_reads_to_fasta: input: region/chrV-320000-60000/reads/reads.bam, region/chrV-320000-60000/reads/reads.bam.bai output: region/chrV-320000-60000/reads/reads.fasta, region/chrV-320000-60000/reads/reads.fastq jobid: 242 wildcards: region_id=chrV-320000-60000 resources: threads=1

[Fri Mar 1 18:06:47 2019] Finished job 279. 41 of 281 steps (15%) done

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-320000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-320000-60000/reads/reads.fastq

[Fri Mar 1 18:06:47 2019] rule asm_group_reads_to_fasta: input: region/chrV-400000-60000/reads/reads.bam, region/chrV-400000-60000/reads/reads.bam.bai output: region/chrV-400000-60000/reads/reads.fasta, region/chrV-400000-60000/reads/reads.fastq jobid: 247 wildcards: region_id=chrV-400000-60000 resources: threads=1

[Fri Mar 1 18:06:47 2019] Finished job 278. 42 of 281 steps (15%) done

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-400000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-400000-60000/reads/reads.fastq
[Fri Mar 1 18:06:47 2019] Finished job 257. 43 of 281 steps (15%) done

[Fri Mar 1 18:06:47 2019] rule asm_group_reads_to_fasta: input: region/chrV-300000-60000/reads/reads.bam, region/chrV-300000-60000/reads/reads.bam.bai output: region/chrV-300000-60000/reads/reads.fasta, region/chrV-300000-60000/reads/reads.fastq jobid: 202 wildcards: region_id=chrV-300000-60000 resources: threads=1

[Fri Mar 1 18:06:47 2019] rule asm_group_reads_to_fasta: input: region/chrV-500000-60000/reads/reads.bam, region/chrV-500000-60000/reads/reads.bam.bai output: region/chrV-500000-60000/reads/reads.fasta, region/chrV-500000-60000/reads/reads.fastq jobid: 245 wildcards: region_id=chrV-500000-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-300000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-300000-60000/reads/reads.fastq

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-500000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-500000-60000/reads/reads.fastq
[Fri Mar 1 18:06:47 2019] Finished job 275. 44 of 281 steps (16%) done

[Fri Mar 1 18:06:47 2019] rule asm_group_reads_to_fasta: input: region/chrV-200000-60000/reads/reads.bam, region/chrV-200000-60000/reads/reads.bam.bai output: region/chrV-200000-60000/reads/reads.fasta, region/chrV-200000-60000/reads/reads.fastq jobid: 238 wildcards: region_id=chrV-200000-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-200000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-200000-60000/reads/reads.fastq
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Removing temporary output file region/chrV-400000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-400000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:48 2019] Finished job 247. 45 of 281 steps (16%) done
Extracting over region: chrV:1-10500
Removing temporary output file region/chrV-500000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-500000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:48 2019] Finished job 245. 46 of 281 steps (16%) done

[Fri Mar 1 18:06:48 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-440000-60000/reads/reads.bam, region/chrV-440000-60000/reads/reads.bam.bai jobid: 262 wildcards: region_id=chrV-440000-60000 resources: threads=1

[Fri Mar 1 18:06:48 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-540000-36874/reads/reads.bam, region/chrV-540000-36874/reads/reads.bam.bai jobid: 263 wildcards: region_id=chrV-540000-36874 resources: threads=1

Removing temporary output file region/chrV-320000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-320000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:48 2019] Finished job 242. 47 of 281 steps (17%) done
Removing temporary output file region/chrV-300000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-300000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:48 2019] Finished job 202. 48 of 281 steps (17%) done
[Fri Mar 1 18:06:48 2019] Finished job 261. 49 of 281 steps (17%) done
Removing temporary output file region/chrV-200000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-200000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:48 2019] Finished job 238. 50 of 281 steps (18%) done

[Fri Mar 1 18:06:48 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-480000-60000/reads/reads.bam, region/chrV-480000-60000/reads/reads.bam.bai jobid: 256 wildcards: region_id=chrV-480000-60000 resources: threads=1

[Fri Mar 1 18:06:48 2019] rule asm_group_reads_to_fasta: input: region/chrV-0-10500/reads/reads.bam, region/chrV-0-10500/reads/reads.bam.bai output: region/chrV-0-10500/reads/reads.fasta, region/chrV-0-10500/reads/reads.fastq jobid: 210 wildcards: region_id=chrV-0-10500 resources: threads=1

[Fri Mar 1 18:06:48 2019] rule asm_group_get_region_bam: input: group/reads.bam, group/reads.bam.bai output: region/chrV-433000-20500/reads/reads.bam, region/chrV-433000-20500/reads/reads.bam.bai jobid: 280 wildcards: region_id=chrV-433000-20500 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-0-10500/reads/reads.fasta
Writing FASTQ: region/chrV-0-10500/reads/reads.fastq
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
Removing temporary output file region/chrV-0-10500/reads/reads.bam.bai.
Removing temporary output file region/chrV-0-10500/reads/reads.fastq.
[Fri Mar 1 18:06:48 2019] Finished job 210.
51 of 281 steps (18%) done
Extracting over region: chrV:100001-160000
[Fri Mar 1 18:06:48 2019] Finished job 259.
52 of 281 steps (19%) done
Extracting over region: chrV:220001-280000
Job counts: count jobs 1 asm_group_get_region_bam 1
Job counts: count jobs 1 asm_group_get_region_bam 1
[Fri Mar 1 18:06:48 2019] Finished job 266.
53 of 281 steps (19%) done

[Fri Mar 1 18:06:48 2019] rule asm_group_reads_to_fasta: input: region/chrV-100000-60000/reads/reads.bam, region/chrV-100000-60000/reads/reads.bam.bai output: region/chrV-100000-60000/reads/reads.fasta, region/chrV-100000-60000/reads/reads.fastq jobid: 207 wildcards: region_id=chrV-100000-60000 resources: threads=1

[Fri Mar 1 18:06:48 2019] rule asm_group_reads_to_fasta: input: region/chrV-220000-60000/reads/reads.bam, region/chrV-220000-60000/reads/reads.bam.bai output: region/chrV-220000-60000/reads/reads.fasta, region/chrV-220000-60000/reads/reads.fastq jobid: 221 wildcards: region_id=chrV-220000-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-100000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-100000-60000/reads/reads.fastq

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-220000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-220000-60000/reads/reads.fastq
Extracting over region: chrV:433001-453500
[Fri Mar 1 18:06:49 2019] Finished job 280.
54 of 281 steps (19%) done

[Fri Mar 1 18:06:49 2019] rule asm_group_reads_to_fasta: input: region/chrV-433000-20500/reads/reads.bam, region/chrV-433000-20500/reads/reads.bam.bai output: region/chrV-433000-20500/reads/reads.fasta, region/chrV-433000-20500/reads/reads.fastq jobid: 248 wildcards: region_id=chrV-433000-20500 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-433000-20500/reads/reads.fasta
Writing FASTQ: region/chrV-433000-20500/reads/reads.fastq
Extracting over region: chrV:440001-500000
Extracting over region: chrV:540001-576874
[Fri Mar 1 18:06:49 2019] Finished job 262.
55 of 281 steps (20%) done
[Fri Mar 1 18:06:49 2019] Finished job 263.
56 of 281 steps (20%) done

[Fri Mar 1 18:06:49 2019] rule asm_group_reads_to_fasta: input: region/chrV-540000-36874/reads/reads.bam, region/chrV-540000-36874/reads/reads.bam.bai output: region/chrV-540000-36874/reads/reads.fasta, region/chrV-540000-36874/reads/reads.fastq jobid: 215 wildcards: region_id=chrV-540000-36874 resources: threads=1

[Fri Mar 1 18:06:49 2019] rule asm_group_reads_to_fasta: input: region/chrV-440000-60000/reads/reads.bam, region/chrV-440000-60000/reads/reads.bam.bai output: region/chrV-440000-60000/reads/reads.fasta, region/chrV-440000-60000/reads/reads.fastq jobid: 213 wildcards: region_id=chrV-440000-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-540000-36874/reads/reads.fasta
Writing FASTQ: region/chrV-540000-36874/reads/reads.fastq

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-440000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-440000-60000/reads/reads.fastq
Removing temporary output file region/chrV-100000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-100000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:49 2019] Finished job 207.
57 of 281 steps (20%) done
Removing temporary output file region/chrV-220000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-220000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:49 2019] Finished job 221.
58 of 281 steps (21%) done
Extracting over region: chrV:480001-540000
[Fri Mar 1 18:06:49 2019] Finished job 256.
59 of 281 steps (21%) done

[Fri Mar 1 18:06:49 2019] rule asm_group_reads_to_fasta: input: region/chrV-480000-60000/reads/reads.bam, region/chrV-480000-60000/reads/reads.bam.bai output: region/chrV-480000-60000/reads/reads.fasta, region/chrV-480000-60000/reads/reads.fastq jobid: 201 wildcards: region_id=chrV-480000-60000 resources: threads=1

Entering: asm_group_get_region_bam

Writing FASTA: region/chrV-480000-60000/reads/reads.fasta
Writing FASTQ: region/chrV-480000-60000/reads/reads.fastq
Removing temporary output file region/chrV-433000-20500/reads/reads.bam.bai.
Removing temporary output file region/chrV-433000-20500/reads/reads.fastq.
[Fri Mar 1 18:06:49 2019] Finished job 248.
60 of 281 steps (21%) done
Removing temporary output file region/chrV-540000-36874/reads/reads.bam.bai.
Removing temporary output file region/chrV-540000-36874/reads/reads.fastq.
[Fri Mar 1 18:06:49 2019] Finished job 215.
61 of 281 steps (22%) done
Removing temporary output file region/chrV-440000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-440000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:50 2019] Finished job 213.
62 of 281 steps (22%) done
Removing temporary output file region/chrV-480000-60000/reads/reads.bam.bai.
Removing temporary output file region/chrV-480000-60000/reads/reads.fastq.
[Fri Mar 1 18:06:50 2019] Finished job 201.
63 of 281 steps (22%) done

[Fri Mar 1 18:06:50 2019] rule assemble_reads: input: region/chrV-220000-60000/reads/reads.fasta output: region/chrV-220000-60000/asm/contigs.fasta, region/chrV-220000-60000/asm/contigs.fasta.fai, region/chrV-220000-60000/asm/corrected_reads.fastq.gz log: /data/suyao/tools/pacbio/smrtsv2/test/assemble/group/gr-chrV-0-576874/log/chrV-220000-60000.log jobid: 159 wildcards: region_id=chrV-220000-60000 resources: threads=8

Job counts: count jobs 1 assemble_reads 1
Removing temporary output file region/chrV-220000-60000/reads/reads.fasta.
Removing temporary output file region/chrV-220000-60000/asm/corrected_reads.fastq.gz.
[Fri Mar 1 18:08:10 2019] Finished job 159.
64 of 281 steps (23%) done

[Fri Mar 1 18:08:10 2019] rule assemble_align_org: input: region/chrV-220000-60000/asm/contigs.fasta, region/chrV-220000-60000/reads/reads.bam output: region/chrV-220000-60000/asm/contig_aligned_reads.bam, region/chrV-220000-60000/asm/contig_aligned_reads.bam.pbi jobid: 220 wildcards: region_id=chrV-220000-60000 resources: threads=8

Job counts: count jobs 1 assemble_align_org 1
[INFO] 2019-03-01T18:08:10 [blasr] started.
[INFO] 2019-03-01T18:08:12 [blasr] ended.
Removing temporary output file region/chrV-220000-60000/reads/reads.bam.
[Fri Mar 1 18:08:13 2019] Finished job 220.
65 of 281 steps (23%) done

[Fri Mar 1 18:08:13 2019] rule assemble_polish: input: region/chrV-220000-60000/asm/contigs.fasta, region/chrV-220000-60000/asm/contigs.fasta.fai, region/chrV-220000-60000/asm/contig_aligned_reads.bam, region/chrV-220000-60000/asm/contig_aligned_reads.bam.pbi output: region/chrV-220000-60000/asm/contigs_polished.fasta jobid: 158 wildcards: region_id=chrV-220000-60000 resources: threads=8

Job counts: count jobs 1 assemble_polish 1
This does not appear to be a valid PacBio BAM file. Only datasets from RS II and Sequel instruments are supported by this program.
Traceback (most recent call last):
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcommand/cli/core.py", line 138, in _pacbio_main_runner
    return_code = exe_main_func(*args, **kwargs)
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/GenomicConsensus/main.py", line 340, in args_runner
    return tr.main()
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/GenomicConsensus/main.py", line 259, in main
    if options.algorithm == "arrow" and peekFile.isCmpH5:
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2409, in isCmpH5
    res = self._pollResources(lambda x: isinstance(x, CmpH5Reader))
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1863, in _pollResources
    return [func(resource) for resource in self.resourceReaders()]
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2923, in resourceReaders
    self._openFiles()
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2075, in _openFiles
    resource = IndexedBamReader(location)
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 374, in __init__
    super(IndexedBamReader, self).__init__(fname, referenceFastaFname)
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 183, in __init__
    self._loadReadGroupInfo()
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 94, in _loadReadGroupInfo
    raise IOError("This does not appear to be a valid PacBio BAM file. Only datasets from RS II and Sequel instruments are supported by this program.")
IOError: This does not appear to be a valid PacBio BAM file. Only datasets from RS II and Sequel instruments are supported by this program.
[ERROR] This does not appear to be a valid PacBio BAM file. Only datasets from RS II and Sequel instruments are supported by this program.
Traceback (most recent call last):
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcommand/cli/core.py", line 138, in _pacbio_main_runner
    return_code = exe_main_func(*args, **kwargs)
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/GenomicConsensus/main.py", line 340, in args_runner
    return tr.main()
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/GenomicConsensus/main.py", line 259, in main
    if options.algorithm == "arrow" and peekFile.isCmpH5:
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2409, in isCmpH5
    res = self._pollResources(lambda x: isinstance(x, CmpH5Reader))
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1863, in _pollResources
    return [func(resource) for resource in self.resourceReaders()]
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2923, in resourceReaders
    self._openFiles()
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2075, in _openFiles
    resource = IndexedBamReader(location)
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 374, in __init__
    super(IndexedBamReader, self).__init__(fname, referenceFastaFname)
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 183, in __init__
    self._loadReadGroupInfo()
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 94, in _loadReadGroupInfo
    raise IOError("This does not appear to be a valid PacBio BAM file. Only datasets from RS II and Sequel instruments are supported by this program.")
IOError: This does not appear to be a valid PacBio BAM file. Only datasets from RS II and Sequel instruments are supported by this program.
[Fri Mar 1 18:08:14 2019] Error in rule assemble_polish:
    jobid: 0
    output: region/chrV-220000-60000/asm/contigs_polished.fasta

RuleException:
CalledProcessError in line 250 of /data/suyao/tools/pacbio/smrtsv2/rules/assemble_group.snakefile:
Command ' set -euo pipefail; variantCaller --referenceFilename region/chrV-220000-60000/asm/contigs.fasta region/chrV-220000-60000/asm/contig_aligned_reads.bam -o region/chrV-220000-60000/asm/contigs_polished.fasta -j 8 --algorithm=arrow; ' returned non-zero exit status 1.
  File "/data/suyao/tools/pacbio/smrtsv2/rules/assemble_group.snakefile", line 250, in __rule_assemble_polish
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/python3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /tmp/asm_group_gr-chrV-0-576874/.snakemake/log/2019-03-01T180627.513361.snakemake.log

Could anyone help me?

paudano commented 5 years ago

It's failing while polishing the assemblies. The error is:

This does not appear to be a valid PacBio BAM file. Only datasets from RS II and Sequel instruments are supported by this program.

What is your input data? Is it PacBio or ONT reads? What chemistry? Arrow (part of the PacBio GenomicConsensus package) is very picky about its input because it uses a model trained on data with certain expectations.
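
Arrow's complaint comes from the read-group (@RG) information in the BAM header. The thread does not say exactly which fields pbcore checks, so the following is only a rough pre-flight sketch, not pbcore's logic: it assumes the PacBio BAM specification's conventions of `PL:PACBIO` and a `DS` tag holding `key=value` pairs, and the `REQUIRED_DS_KEYS` subset is an assumption. It reads header text such as the output of `samtools view -H`.

```python
# Hypothetical pre-flight check (not part of SMRT-SV or pbcore): scan the @RG
# lines of a SAM header for conventions described in the PacBio BAM spec.

# Assumed minimal subset of keys expected inside the @RG DS tag.
REQUIRED_DS_KEYS = ("READTYPE", "BINDINGKIT", "SEQUENCINGKIT", "BASECALLERVERSION")


def parse_rg_line(line):
    """Split one tab-delimited @RG header line into a {tag: value} dict."""
    fields = line.rstrip("\n").split("\t")[1:]
    return dict(field.split(":", 1) for field in fields if ":" in field)


def pacbio_rg_problems(header_text):
    """Return human-readable problems found in the @RG lines of a SAM header."""
    rg_lines = [line for line in header_text.splitlines() if line.startswith("@RG")]
    if not rg_lines:
        return ["no @RG lines in header"]
    problems = []
    for line in rg_lines:
        rg = parse_rg_line(line)
        rg_id = rg.get("ID", "?")
        if rg.get("PL") != "PACBIO":
            problems.append(f"{rg_id}: PL is {rg.get('PL')!r}, expected 'PACBIO'")
        # DS is a ';'-separated list of key=value pairs in PacBio BAMs.
        ds = dict(kv.split("=", 1) for kv in rg.get("DS", "").split(";") if "=" in kv)
        missing = [key for key in REQUIRED_DS_KEYS if key not in ds]
        if missing:
            problems.append(f"{rg_id}: DS missing {', '.join(missing)}")
    return problems
```

An empty list only means the header passed these few checks; Arrow may still reject the data for chemistry reasons.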

bioysu commented 5 years ago

I download PacBio reads of yeast as following:

List of AWS-hosted files from PacBio including raw reads and an HGAP assembly.

wget https://gist.githubusercontent.com/pb-jchin/6359919/raw/9c172c7ff7cbc0193ce89e715215ce912f3f30e6/gistfile1.txt

Keep only .xml, .bas.h5, and .bax.h5 files.

sed '/fasta/d;/fastq/d;/celera/d;/HGAP/d' gistfile1.txt > gistfile1.keep.txt

Download data into a raw reads directory.

mkdir -p raw_reads
cd raw_reads
for f in $(cat ../gistfile1.keep.txt); do wget --force-directories $f; done

Create a list of reads for analysis.

cd ..
find ./raw_reads -name "*.bax.h5" -exec readlink -f {} \; > reads.fofn

paudano commented 5 years ago

I think one of two things is happening. Either the sequence data is so old (it was sequenced in 2013) that current versions of Arrow are not trained on it, or some information is lost when aligning directly from the BAX files.

The first thing I would try is setting the polishing algorithm to quiver instead of arrow (--asm-polish quiver). Arrow should work on RS II data, but this is worth a try.

The other thing you could try is converting the bax files to bam files. You would need bax2bam. For each cell, run all three .bax.h5 files through bax2bam. It will generate one .subreads.bam and one .scraps.bam file for the whole cell. Place the .subreads.bam files in an FOFN and run SMRT-SV from that.
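
The per-cell grouping can be scripted. This is a hypothetical helper, not part of SMRT-SV: it assumes the usual RS II naming `<movie>.1.bax.h5` through `<movie>.3.bax.h5`, and it only builds the `bax2bam` command lines (output naming is left to bax2bam's defaults here; check `bax2bam --help`). The optional flag is the one mentioned later in this thread.

```python
# Hypothetical helper: group RS II .bax.h5 files by SMRT cell (movie name) and
# build one bax2bam command per cell. Assumes <movie>.[1-3].bax.h5 naming.
import re
from collections import defaultdict
from pathlib import PurePosixPath

BAX_RE = re.compile(r"^(?P<movie>.+)\.[1-3]\.bax\.h5$")


def group_bax_by_cell(paths):
    """Map movie name -> sorted list of that cell's .bax.h5 paths."""
    cells = defaultdict(list)
    for path in paths:
        match = BAX_RE.match(PurePosixPath(path).name)
        if match is None:
            raise ValueError(f"not an RS II .bax.h5 part file: {path}")
        cells[match.group("movie")].append(path)
    return {movie: sorted(files) for movie, files in cells.items()}


def bax2bam_commands(paths, allow_unrecognized_chemistry=False):
    """Return one argv list per cell; each run should yield one .subreads.bam."""
    commands = []
    for movie, files in sorted(group_bax_by_cell(paths).items()):
        if len(files) != 3:
            raise ValueError(f"{movie}: expected 3 .bax.h5 files, got {len(files)}")
        argv = ["bax2bam"]
        if allow_unrecognized_chemistry:
            argv.append("--allowUnrecognizedChemistryTriple")
        commands.append(argv + files)
    return commands
```

Each argv list can be run with `subprocess.run(cmd, check=True)`, after which the resulting `.subreads.bam` paths go into the FOFN.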

Can you try either of these things and see if it makes a difference?

jsirott commented 5 years ago

I ran into the same problem as @bioysu a few days ago and was able to get the pipeline to work with a subset of the same yeast data with the following changes to the previous message:

1) Run bax2bam with the --allowUnrecognizedChemistryTriple option.
2) Use quiver instead of arrow (arrow fails with an incorrect chemistry exception).
3) If there are fewer than 8 .bam files, set the batch size to the number of .bam files; otherwise an exception is thrown.
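
The third workaround amounts to never creating more batches than there are BAM files. A minimal sketch of that idea, not SMRT-SV's batching code; reading "batch size" as the number of batches is an assumption based on the maintainer's later reply about "more batches than BAMs":

```python
# Hypothetical illustration (not SMRT-SV's code): split BAM paths into at most
# `requested_batches` batches so that no batch is ever empty.
def split_into_batches(bam_paths, requested_batches):
    if requested_batches < 1:
        raise ValueError("requested_batches must be >= 1")
    n_batches = min(requested_batches, len(bam_paths))  # cap at the BAM count
    batches = [[] for _ in range(n_batches)]
    for index, path in enumerate(bam_paths):
        batches[index % n_batches].append(path)  # round-robin assignment
    return batches
```

With three BAMs and eight requested batches, this yields three single-file batches instead of five empty ones.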

paudano commented 5 years ago

Sounds like that should fix the problem with the data. Thanks @spammy123 for verifying this for me.

SMRT-SV should not be crashing if there are more batches than BAMs. I am running a dataset that will test this, so I’ll fix it if I see it.

I am going to close this out for now.

paudano commented 5 years ago

The batching problem is fixed. Empty batch FOFN files in align/batches/ were getting a single newline character. For runs where the batches were already created, delete the single newline from those 1-byte files.
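
For an existing run directory, the cleanup can be done with a few lines of Python. This is a hypothetical sketch: it assumes the batch FOFNs sit directly in `align/batches/` and truncates any file there whose entire content is one newline.

```python
# Hypothetical cleanup: empty every file in the batch directory whose whole
# content is a single newline character (the 1-byte files described above).
from pathlib import Path


def clear_newline_only_files(batch_dir="align/batches"):
    """Truncate newline-only files; return the list of paths that changed."""
    fixed = []
    for path in sorted(Path(batch_dir).iterdir()):
        if path.is_file() and path.read_bytes() == b"\n":
            path.write_bytes(b"")  # leave a truly empty FOFN behind
            fixed.append(path)
    return fixed
```

Files with real BAM paths in them are left untouched.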

jsirott commented 5 years ago

Thanks! One other small problem I ran into: variants_bed_to_vcf.py failed when the BED file was empty (no inversions in my small test case). Here's a patch that just creates an empty output file:

--- a/scripts/call/variants_bed_to_vcf.py
+++ b/scripts/call/variants_bed_to_vcf.py
@@ -30,6 +30,11 @@ def convert_bed_to_vcf(bed_filename, reference_filename, vcf_filename, sample, v
         raise Exception("Unsupported variant type: %s" % variant_type)

     calls = pd.read_table(bed_filename, header=None, usecols=columns, names=names)
+    if len(calls) == 0:
+        # Just create an empty output file
+        open(vcf_filename,'w').close()
+        return
+
     calls["sample_name"] = sample
     calls["call_id"] = "."
     calls["quality"] = calls.apply(calculate_variant_quality, axis=1)
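
A variant of the same guard, sketched here with the standard library only and not part of SMRT-SV, checks the BED file before pandas parses it, so the early return does not depend on how `pd.read_table` treats an empty input. Treating `#` lines as comments is an assumption.

```python
# Hypothetical guard: decide whether a BED file has any data rows before it is
# parsed, and create an empty output file if it does not.
def bed_has_records(bed_filename):
    """True if the file contains at least one non-blank, non-'#' line."""
    with open(bed_filename) as handle:
        return any(line.strip() and not line.startswith("#") for line in handle)


def write_empty_if_no_records(bed_filename, out_filename):
    """Create an empty out_filename and return True when the BED has no rows."""
    if bed_has_records(bed_filename):
        return False
    open(out_filename, "w").close()
    return True
```

Inside `convert_bed_to_vcf`, this would be called first, returning early when it reports `True`.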
bioysu commented 5 years ago

I tried to run smrtsv2 as suggested by @spammy123, but it failed at the assembly step again:

Here is the error:

Error in rule asm_assemble_group:
    jobid: 0
    output: assemble/group/gr-chrM-0-85779/contig.bam, assemble/group/gr-chrM-0-85779/contig.bam.bai
    log: assemble/group/gr-chrM-0-85779/contig_group.log

RuleException:
RuntimeError in line 162 of /data/suyao/tools/pacbio/smrtsv2/rules/assemble.snakefile:
Failed to assemble group gr-chrM-0-85779: See log assemble/group/gr-chrM-0-85779/contig_group.log
  File "/data/suyao/tools/pacbio/smrtsv2/rules/assemble.snakefile", line 162, in __rule_asm_assemble_group
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/python3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /data/suyao/tools/pacbio/smrtsv2/test4/.snakemake/log/2019-03-08T051111.896694.snakemake.log

Here is error in assemble/group/gr-chrM-0-85779/contig_group.log:

  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 81, in _loadReadGroupInfo
    rgID = rgAsInt(rg["ID"])
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/pacbio/lib/python2.7/site-packages/pbcore/io/align/_BamSupport.py", line 58, in rgAsInt
    return np.int32(int(rgIdString, 16))
ValueError: invalid literal for int() with base 16: 'fdc5820d-2C95FF94'
[Fri Mar 8 06:33:13 2019]
Error in rule assemble_polish:
    jobid: 0
    output: region/chrM-20000-60000/asm/contigs_polished.fasta

RuleException:
CalledProcessError in line 250 of /data/suyao/tools/pacbio/smrtsv2/rules/assemble_group.snakefile:
Command ' set -euo pipefail; variantCaller --referenceFilename region/chrM-20000-60000/asm/contigs.fasta region/chrM-20000-60000/asm/contig_aligned_reads.bam -o region/chrM-20000-60000/asm/contigs_polished.fasta -j 3 --algorithm=quiver; ' returned non-zero exit status 2.
  File "/data/suyao/tools/pacbio/smrtsv2/rules/assemble_group.snakefile", line 250, in __rule_assemble_polish
  File "/data/suyao/tools/pacbio/smrtsv2/dep/conda/build/envs/python3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /tmp/asm_group_gr-chrM-0-85779/.snakemake/log/2019-03-08T051409.499125.snakemake.log
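
The traceback shows pbcore converting each read-group ID with `int(rgIdString, 16)`, so any ID that is not a plain hexadecimal string fails. The sketch below, which is illustrative and not part of pbcore or SMRT-SV, mimics that conversion to list the offending IDs from header text (for example, the output of `samtools view -H`). Where the `-2C95FF94` suffix came from is not established in the thread.

```python
# Illustration of the failure: mimic the int(..., 16) conversion from the
# traceback and report @RG IDs that it would reject.
def rg_id_parses_as_hex(rg_id):
    """True if int(rg_id, 16) succeeds, as pbcore's rgAsInt requires."""
    try:
        int(rg_id, 16)
    except ValueError:
        return False
    return True


def unparseable_rg_ids(header_text):
    """Return @RG IDs from SAM header text that int(..., 16) would reject."""
    ids = []
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            for field in line.split("\t")[1:]:
                if field.startswith("ID:"):
                    ids.append(field[3:])
    return [rg_id for rg_id in ids if not rg_id_parses_as_hex(rg_id)]
```

For the ID in this log, `rg_id_parses_as_hex("fdc5820d-2C95FF94")` is `False`, because `-` is not a hexadecimal digit.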

paudano commented 5 years ago

I have seen variantCaller fail with a similar error before, and it was because the variantCaller version was too far out of step with the sequence data.

ValueError: invalid literal for int() with base 16: 'fdc5820d-2C95FF94'

In my case, I needed to update variantCaller, but in your case, I think it needs to be downgraded, and probably to a version that's not in Conda. Fixing this is likely going to take more time than running a full sample.

An alternative would be to align a full sample, and after "detect" runs, edit detect/candidate_groups.bed and remove all but a small number of groups. That will push it through assemblies pretty quickly.
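
Trimming the group list can be scripted too. This is a hypothetical helper that ASSUMES one group per line in `detect/candidate_groups.bed` (check the file's layout before relying on it); it keeps `#` header lines plus the first `keep` records, and writes a `.full` backup first so the complete list can be restored.

```python
# Hypothetical helper for the workaround above: keep '#' lines and only the
# first `keep` records of candidate_groups.bed, after backing up the original.
import shutil
from pathlib import Path


def keep_first_groups(bed_path="detect/candidate_groups.bed", keep=5):
    path = Path(bed_path)
    backup = path.with_name(path.name + ".full")
    shutil.copyfile(path, backup)  # restore later by copying this back
    kept, records = [], 0
    for line in backup.read_text().splitlines(keepends=True):
        if line.startswith("#"):
            kept.append(line)
        elif line.strip() and records < keep:
            kept.append(line)
            records += 1
    path.write_text("".join(kept))
    return records  # number of group records retained
```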

When I can spend some time on variants_bed_to_vcf.py, I'll test your solution. Thanks!

paudano commented 5 years ago

Also, it might be necessary to run them through bax2bam. I think we are having a similar problem in ticket #17.

paudano commented 5 years ago

It is definitely necessary to run from subread BAM files. I just pushed an update for this (code and documentation).
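
Because the reads.fofn built earlier in this thread listed `.bax.h5` files, a quick check of the FOFN before launching can save a failed run. This hypothetical sketch looks only at file-name suffixes, not at BAM contents, and the `.subreads.bam` suffix is assumed from the bax2bam output naming described above.

```python
# Hypothetical pre-flight check: flag FOFN entries that are not subread BAMs.
from pathlib import Path


def non_subread_entries(fofn_text):
    """Return FOFN entries whose names do not end in .subreads.bam."""
    entries = [line.strip() for line in fofn_text.splitlines() if line.strip()]
    return [entry for entry in entries if not entry.endswith(".subreads.bam")]


def check_fofn(fofn_path):
    """Raise ValueError listing any non-subread entries in the FOFN file."""
    bad = non_subread_entries(Path(fofn_path).read_text())
    if bad:
        raise ValueError("not subread BAMs: " + ", ".join(bad))
```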