Hi Jan,
Can you check whether samtools is installed? Running samtools --help is enough.
If it isn’t installed, please install it with conda install samtools (if you are using conda).
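A minimal sketch of that check, assuming a POSIX shell. The helper name have_tool is made up for illustration and is not part of plassembler; the install hint simply repeats the conda command above.

```shell
# Succeed if the named command-line tool can be found on PATH.
# (have_tool is a hypothetical helper, not something plassembler provides.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool samtools; then
    echo "samtools found: $(command -v samtools)"
else
    echo "samtools not found; if you use conda, install it into the active environment:"
    echo "  conda install samtools"
fi
```

Run this inside the same environment that plassembler runs in, since a samtools installed in a different conda environment will not be visible.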
I’m pretty sure this is the issue: I have just checked the bioconda recipe and realised that I forgot to add samtools, which would explain this error if you installed via bioconda. Thanks for raising it. Everything should work fine once samtools is installed.
I will fix the bioconda recipe now for future versions (v0.1.5). And thanks for trying out plassembler!
George
Hi, George,
Thank you! I followed your suggestion and checked whether samtools was installed. Unfortunately, it was not. After installing samtools in the plassembler env, everything works fine.
Bests, Jan
No problem @gaworj, thanks again for raising the issue. It should be fixed automatically in v0.1.5 (awaiting approval on bioconda, already available from GitHub). I also added --kmer_mode, intended for high-quality Nanopore reads (R10.4) without short reads, as a bit of an experiment, so feel free to try it if you have such data (I don't yet!).
George
Sounds great!
Can you also add nano-raw and nanohq options for the Flye input? This would help people who are using older ONT datasets. Another useful option would be the ability to analyze (copy number + PLSDB search) user-provided plasmid sequences that are already assembled.
Jan
Hi Jan,
I have added the functionality you suggested in the 0.2.0 branch if you want to try it - I'm still doing some tests before I merge it into the main branch. Great idea!
It takes an -a flag to activate what I have called "assembled mode" and an -i input FASTA file. The file must contain the chromosome and plasmids, and the chromosome contig header needs to be named "chromosome". It then calculates the depth and runs PLSDB. It is compatible with long-read-only input or with both long- and short-read FASTQs.
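Before running assembled mode, it can help to confirm that the input FASTA really has a contig named "chromosome". The sketch below is only an illustration: the helper name has_chromosome_header and the demo file are hypothetical, and it assumes the header ID must be exactly "chromosome" (optionally followed by a whitespace-separated description), which may be stricter or looser than what plassembler actually checks.

```shell
# Succeed if the FASTA file contains a record whose header ID is exactly
# "chromosome" (assumption: a description after whitespace is tolerated).
has_chromosome_header() {
    grep -Eq '^>chromosome([[:space:]]|$)' "$1"
}

# Hypothetical input, created here only so the example is self-contained.
demo_fasta=$(mktemp)
printf '>chromosome\nACGT\n>plasmid_1\nTTGA\n' > "$demo_fasta"

if has_chromosome_header "$demo_fasta"; then
    echo "found a contig named chromosome"
else
    echo "no contig named chromosome; rename the chromosome header first"
fi
rm -f "$demo_fasta"
```

For a real run, replace the demo file with the FASTA you intend to pass via -i.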
Also, by default plassembler uses nanohq - if you want nano-raw use the -r flag.
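As a rough illustration of that default, the choice between the two Flye read-type options could be sketched as below. The function name flye_read_option and the 0/1 convention are assumptions made for this example; this is not plassembler's actual code, only the --nano-hq and --nano-raw option names come from Flye itself.

```shell
# Map a raw-reads switch (1 = raw reads requested, anything else = default)
# to the corresponding Flye read-type option.
flye_read_option() {
    if [ "$1" = "1" ]; then
        echo "--nano-raw"
    else
        echo "--nano-hq"
    fi
}

echo "default: $(flye_read_option 0)"
echo "with -r: $(flye_read_option 1)"
```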
George
@gaworj this is now properly available in v1.0.0, which has been updated with many other changes too. You can calculate copy number based on long and/or short reads if you specify -a, along with an existing chromosome assembly (--input_chromosome) and plasmids (--input_plasmids).
Closing this issue now - but let me know what you think if you give it a go.
George
Hello,
Thanks for a very useful tool.
I have successfully installed plassembler, but in my case the pipeline does not finish as expected.
When I try to run:
plassembler.py -d /home/data_HDD2/plassembler_db/ -l 4-LPC100_ont_1kb_q12.fastq.gz -1 4-LPC100_trim_R1.fastq.gz -2 4-LPC100_trim_R2.fastq.gz -c 3100000 --threads 40 -o 4-LPC100_plassembler -p 4-LPC100
Starting plassembler v0.1.4
Checking dependencies.
Flye version found is v2.9.1-b1780.
Flye version is ok.
Unicycler version found is v0.5.0.
Unicycler version is ok.
Checking database installation.
Database successfully checked.
Checking input fastqs.
FASTQ 4-LPC100_ont_1kb_q12.fastq.gz checked
FASTQ 4-LPC100_trim_R1.fastq.gz checked
FASTQ 4-LPC100_trim_R2.fastq.gz checked
Filtering long reads.
Running Flye.
Counting Contigs.
Flye assembled 3 contigs.
More than one contig was assembled with Flye.
Extracting Chromosome.
Chromosome Identified. Plassembler will now use long and short reads to assemble plasmids accurately.
Trimming short reads.
Mapping Long Reads to Putative Plasmid Contigs.
Mapping Long Reads to Chromosome.
Mapping Short Reads to Putative Plasmid Contigs
Mapping Short Reads to Chromosome Contig
Processing Bams.
Error with samtools view.
Here is the log file output:
2023-01-14 22:29:28,120 - INFO - Starting plassembler v0.1.4
2023-01-14 22:29:28,120 - INFO - Input args: Namespace(database='/home/data_HDD2/plassembler_db/', longreads='4-LPC100_ont_1kb_q12.fastq.gz', short_one='4-LPC100_trim_R1.fastq.gz', short_two='4-LPC100_trim_R2.fastq.gz', chromosome='3100000', outdir='4-LPC100_plassembler', min_length='500', threads='40', force=False, raw_flag=False, prefix='4-LPC100', min_quality='9')
2023-01-14 22:29:28,120 - INFO - Checking dependencies.
2023-01-14 22:29:28,200 - INFO - Flye version found is v2.9.1-b1780.
2023-01-14 22:29:28,200 - INFO - Flye version is ok.
2023-01-14 22:29:28,260 - INFO - Unicycler version found is v0.5.0.
2023-01-14 22:29:28,260 - INFO - Unicycler version is ok.
2023-01-14 22:29:28,260 - INFO - Checking database installation.
2023-01-14 22:29:28,260 - INFO - Database successfully checked.
2023-01-14 22:29:28,260 - INFO - Checking input fastqs
2023-01-14 22:29:28,266 - INFO - Filtering long reads.
2023-01-14 22:31:45,781 - INFO - Running Flye
2023-01-14 22:54:14,877 - INFO - Counting Contigs
2023-01-14 22:54:14,880 - INFO - More than one contig was assembled with Flye.
2023-01-14 22:54:14,880 - INFO - Extracting Chromosome.
2023-01-14 22:54:14,942 - INFO - Chromosome Identified. Plassembler will now use both long and short reads to assemble plasmids accurately.
2023-01-14 22:54:14,942 - INFO - Trimming short reads.
2023-01-14 22:54:21,360 - INFO - Read1 before filtering:
2023-01-14 22:54:21,360 - INFO - total reads: 485740
2023-01-14 22:54:21,360 - INFO - total bases: 104249485
2023-01-14 22:54:21,360 - INFO - Q20 bases: 102792034(98.602%)
2023-01-14 22:54:21,360 - INFO - Q30 bases: 98751562(94.7262%)
2023-01-14 22:54:21,360 - INFO -
2023-01-14 22:54:21,360 - INFO - Read2 before filtering:
2023-01-14 22:54:21,360 - INFO - total reads: 485740
2023-01-14 22:54:21,360 - INFO - total bases: 98536380
2023-01-14 22:54:21,360 - INFO - Q20 bases: 92691283(94.0681%)
2023-01-14 22:54:21,360 - INFO - Q30 bases: 83616899(84.8589%)
2023-01-14 22:54:21,360 - INFO -
2023-01-14 22:54:21,360 - INFO - Read1 after filtering:
2023-01-14 22:54:21,360 - INFO - total reads: 485739
2023-01-14 22:54:21,361 - INFO - total bases: 104249194
2023-01-14 22:54:21,361 - INFO - Q20 bases: 102791772(98.602%)
2023-01-14 22:54:21,361 - INFO - Q30 bases: 98751353(94.7263%)
2023-01-14 22:54:21,361 - INFO -
2023-01-14 22:54:21,361 - INFO - Read2 after filtering:
2023-01-14 22:54:21,361 - INFO - total reads: 485739
2023-01-14 22:54:21,361 - INFO - total bases: 98535962
2023-01-14 22:54:21,361 - INFO - Q20 bases: 92690991(94.0682%)
2023-01-14 22:54:21,361 - INFO - Q30 bases: 83616711(84.8591%)
2023-01-14 22:54:21,361 - INFO -
2023-01-14 22:54:21,361 - INFO - Filtering result:
2023-01-14 22:54:21,361 - INFO - reads passed filter: 971478
2023-01-14 22:54:21,361 - INFO - reads failed due to low quality: 2
2023-01-14 22:54:21,361 - INFO - reads failed due to too many N: 0
2023-01-14 22:54:21,361 - INFO - reads failed due to too short: 0
2023-01-14 22:54:21,361 - INFO - reads with adapter trimmed: 54
2023-01-14 22:54:21,361 - INFO - bases trimmed due to adapters: 555
2023-01-14 22:54:21,361 - INFO -
2023-01-14 22:54:21,361 - INFO - Duplication rate: 0.0430271%
2023-01-14 22:54:21,361 - INFO -
2023-01-14 22:54:21,361 - INFO - Insert size peak (evaluated by paired-end reads): 152
2023-01-14 22:54:21,463 - INFO -
2023-01-14 22:54:21,463 - INFO - JSON report: fastp.json
2023-01-14 22:54:21,463 - INFO - HTML report: fastp.html
2023-01-14 22:54:21,463 - INFO -
2023-01-14 22:54:21,463 - INFO - fastp --in1 4-LPC100_trim_R1.fastq.gz --in2 4-LPC100_trim_R2.fastq.gz --out1 4-LPC100_plassembler/trimmed_R1.fastq --out2 4-LPC100_plassembler/trimmed_R2.fastq
2023-01-14 22:54:21,463 - INFO - fastp v0.23.2, time used: 7 seconds
2023-01-14 22:54:22,473 - INFO - Mapping Long Reads to Putative Plasmid Contigs.
2023-01-14 22:54:22,478 - INFO - [M::mm_idx_gen::0.004*1.34] collected minimizers
2023-01-14 22:54:22,480 - INFO - [M::mm_idx_gen::0.006*3.85] sorted minimizers
2023-01-14 22:54:22,480 - INFO - [M::main::0.006*3.83] loaded/built the index for 2 target sequence(s)
2023-01-14 22:54:22,481 - INFO - [M::mm_mapopt_update::0.007*3.56] mid_occ = 10
2023-01-14 22:54:22,481 - INFO - [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2
2023-01-14 22:54:22,481 - INFO - [M::mm_idx_stat::0.007*3.37] distinct minimizers: 13345 (81.22% are singletons); average occurrences: 1.199; average spacing: 5.321; total length: 85103
2023-01-14 22:54:35,391 - INFO - [M::worker_pipeline::12.917*21.07] mapped 28247 sequences
2023-01-14 22:54:35,397 - INFO - [M::main] Version: 2.24-r1122
2023-01-14 22:54:35,397 - INFO - [M::main] CMD: minimap2 -ax map-ont -t 40 4-LPC100_plassembler/non_chromosome.fasta 4-LPC100_plassembler/filtered_long_reads.fastq.gz
2023-01-14 22:54:35,397 - INFO - [M::main] Real time: 12.923 sec; CPU: 272.115 sec; Peak RSS: 6.733 GB
2023-01-14 22:54:35,438 - INFO - Mapping Long Reads to Chromosome.
2023-01-14 22:54:35,543 - INFO - [M::mm_idx_gen::0.103*1.01] collected minimizers
2023-01-14 22:54:35,553 - INFO - [M::mm_idx_gen::0.114*2.34] sorted minimizers
2023-01-14 22:54:35,553 - INFO - [M::main::0.114*2.34] loaded/built the index for 1 target sequence(s)
2023-01-14 22:54:35,563 - INFO - [M::mm_mapopt_update::0.124*2.23] mid_occ = 14
2023-01-14 22:54:35,563 - INFO - [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
2023-01-14 22:54:35,569 - INFO - [M::mm_idx_stat::0.129*2.18] distinct minimizers: 552383 (98.32% are singletons); average occurrences: 1.042; average spacing: 5.341; total length: 3075531
2023-01-14 22:54:53,428 - INFO - [M::worker_pipeline::17.989*25.96] mapped 28247 sequences
2023-01-14 22:54:53,443 - INFO - [M::main] Version: 2.24-r1122
2023-01-14 22:54:53,443 - INFO - [M::main] CMD: minimap2 -ax map-ont -t 40 4-LPC100_plassembler/chromosome.fasta 4-LPC100_plassembler/filtered_long_reads.fastq.gz
2023-01-14 22:54:53,443 - INFO - [M::main] Real time: 18.004 sec; CPU: 466.944 sec; Peak RSS: 1.955 GB
2023-01-14 22:54:53,547 - INFO - Mapping Short Reads to Putative Plasmid Contigs
2023-01-14 22:54:53,549 - INFO - [M::bwa_idx_load_from_disk] read 0 ALT contigs
2023-01-14 22:54:54,740 - INFO - [M::process] read 971478 sequences (202785156 bp)...
2023-01-14 22:55:00,134 - INFO - [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (24, 32818, 20, 34)
2023-01-14 22:55:00,134 - INFO - [M::mem_pestat] analyzing insert size distribution for orientation FF...
2023-01-14 22:55:00,134 - INFO - [M::mem_pestat] (25, 50, 75) percentile: (1736, 2854, 7435)
2023-01-14 22:55:00,134 - INFO - [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 18833)
2023-01-14 22:55:00,135 - INFO - [M::mem_pestat] mean and std.dev: (3654.58, 2785.73)
2023-01-14 22:55:00,135 - INFO - [M::mem_pestat] low and high boundaries for proper pairs: (1, 24532)
2023-01-14 22:55:00,135 - INFO - [M::mem_pestat] analyzing insert size distribution for orientation FR...
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] (25, 50, 75) percentile: (147, 225, 353)
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 765)
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] mean and std.dev: (261.55, 152.01)
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] low and high boundaries for proper pairs: (1, 971)
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] analyzing insert size distribution for orientation RF...
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] (25, 50, 75) percentile: (2315, 4318, 8360)
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 20450)
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] mean and std.dev: (4504.20, 3212.77)
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] low and high boundaries for proper pairs: (1, 26495)
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] analyzing insert size distribution for orientation RR...
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] (25, 50, 75) percentile: (2714, 3258, 5732)
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 11768)
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] mean and std.dev: (3952.65, 2152.64)
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] low and high boundaries for proper pairs: (1, 14786)
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] skip orientation FF
2023-01-14 22:55:00,136 - INFO - [M::mem_pestat] skip orientation RF
2023-01-14 22:55:00,137 - INFO - [M::mem_pestat] skip orientation RR
2023-01-14 22:55:00,761 - INFO - [M::mem_process_seqs] Processed 971478 reads in 224.370 CPU sec, 6.021 real sec
2023-01-14 22:55:02,356 - INFO - [main] Version: 0.7.17-r1188
2023-01-14 22:55:02,356 - INFO - [main] CMD: bwa mem -t 40 4-LPC100_plassembler/non_chromosome.fasta 4-LPC100_plassembler/trimmed_R1.fastq 4-LPC100_plassembler/trimmed_R2.fastq
2023-01-14 22:55:02,357 - INFO - [main] Real time: 8.807 sec; CPU: 226.545 sec
2023-01-14 22:55:02,428 - INFO - Mapping Short Reads to Chromosome Contig
2023-01-14 22:55:02,432 - INFO - [M::bwa_idx_load_from_disk] read 0 ALT contigs
2023-01-14 22:55:03,633 - INFO - [M::process] read 971478 sequences (202785156 bp)...
2023-01-14 22:55:09,273 - INFO - [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (81, 370991, 73, 75)
2023-01-14 22:55:09,273 - INFO - [M::mem_pestat] analyzing insert size distribution for orientation FF...
2023-01-14 22:55:09,273 - INFO - [M::mem_pestat] (25, 50, 75) percentile: (706, 2055, 6071)
2023-01-14 22:55:09,273 - INFO - [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 16801)
2023-01-14 22:55:09,273 - INFO - [M::mem_pestat] mean and std.dev: (3331.37, 2969.51)
2023-01-14 22:55:09,273 - INFO - [M::mem_pestat] low and high boundaries for proper pairs: (1, 22166)
2023-01-14 22:55:09,273 - INFO - [M::mem_pestat] analyzing insert size distribution for orientation FR...
2023-01-14 22:55:09,289 - INFO - [M::mem_pestat] (25, 50, 75) percentile: (154, 231, 363)
2023-01-14 22:55:09,289 - INFO - [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 781)
2023-01-14 22:55:09,290 - INFO - [M::mem_pestat] mean and std.dev: (270.88, 154.43)
2023-01-14 22:55:09,290 - INFO - [M::mem_pestat] low and high boundaries for proper pairs: (1, 990)
2023-01-14 22:55:09,290 - INFO - [M::mem_pestat] analyzing insert size distribution for orientation RF...
2023-01-14 22:55:09,290 - INFO - [M::mem_pestat] (25, 50, 75) percentile: (977, 4053, 6587)
2023-01-14 22:55:09,290 - INFO - [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 17807)
2023-01-14 22:55:09,290 - INFO - [M::mem_pestat] mean and std.dev: (4142.62, 3190.58)
2023-01-14 22:55:09,290 - INFO - [M::mem_pestat] low and high boundaries for proper pairs: (1, 23417)
2023-01-14 22:55:09,290 - INFO - [M::mem_pestat] analyzing insert size distribution for orientation RR...
2023-01-14 22:55:09,291 - INFO - [M::mem_pestat] (25, 50, 75) percentile: (1898, 4776, 7990)
2023-01-14 22:55:09,291 - INFO - [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 20174)
2023-01-14 22:55:09,291 - INFO - [M::mem_pestat] mean and std.dev: (4663.32, 3244.67)
2023-01-14 22:55:09,291 - INFO - [M::mem_pestat] low and high boundaries for proper pairs: (1, 26266)
2023-01-14 22:55:09,291 - INFO - [M::mem_pestat] skip orientation FF
2023-01-14 22:55:09,291 - INFO - [M::mem_pestat] skip orientation RF
2023-01-14 22:55:09,291 - INFO - [M::mem_pestat] skip orientation RR
2023-01-14 22:55:10,613 - INFO - [M::mem_process_seqs] Processed 971478 reads in 262.535 CPU sec, 6.980 real sec
2023-01-14 22:55:12,467 - INFO - [main] Version: 0.7.17-r1188
2023-01-14 22:55:12,468 - INFO - [main] CMD: bwa mem -t 40 4-LPC100_plassembler/chromosome.fasta 4-LPC100_plassembler/trimmed_R1.fastq 4-LPC100_plassembler/trimmed_R2.fastq
2023-01-14 22:55:12,468 - INFO - [main] Real time: 10.038 sec; CPU: 264.910 sec
2023-01-14 22:55:12,542 - INFO - Processing Bams.
I have tried to use it on my recent projects, where small plasmids were identified using various assemblers, and plassembler stops at this stage every time.
Any hints?
Bests, Jan