af8 closed this issue 3 years ago.
It seems something is wrong with r2c.bam. Could you check whether any alignments are missing their read sequence, e.g. with:
samtools view LIB00002320_S14-r2c.bam | awk '$10=="*"'
I suspect that the soft-clipped alignments do not keep their read sequences in the BAM file. If that is the case, adding -Y to the minimap2 r2c step should solve the problem.
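A slightly more detailed version of the awk check above could also show which kind of alignment lost its sequence. Here is a minimal sketch. The function name count_missing_seq is hypothetical, and it assumes SAM text on stdin, for example piped from samtools view. It splits records whose SEQ column is "*" by the FLAG bits 0x800 (supplementary) and 0x100 (secondary).

```bash
# Hypothetical helper: count SAM records whose SEQ (column 10) is "*",
# grouped by alignment type derived from the FLAG (column 2).
count_missing_seq() {
  awk -F'\t' '
    /^@/ { next }                                  # skip header lines, if any
    $10 == "*" {
      flag = $2 + 0
      if (int(flag / 2048) % 2)     kind = "supplementary"   # bit 0x800
      else if (int(flag / 256) % 2) kind = "secondary"       # bit 0x100
      else                          kind = "primary"
      n[kind]++
    }
    END { for (k in n) print k "\t" n[k] }
  ' | sort
}
```

Usage: `samtools view LIB00002320_S14-r2c.bam | count_missing_seq`. An empty output means every record kept its sequence.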
Well spotted. The problem is indeed that the r2c.bam file is empty. I did not see the error earlier because the step still returned a successful exit code (0) even though minimap2 failed. I broke the Makefile down to turn it into a Nextflow pipeline and forgot to add set -o pipefail before the piped command, so that is my mistake.
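The effect of pipefail can be reproduced in a few lines. This is a minimal sketch: false stands in for a failing minimap2 and cat for the downstream samtools command, and pipeline_status is a hypothetical helper. Without pipefail, bash reports the exit status of the last command in the pipeline, so the upstream failure is hidden.

```bash
# Hypothetical helper: print the exit status of a pipeline whose first
# command fails, with pipefail switched "on" or "off" (in a subshell, so
# the caller's shell options are left untouched).
pipeline_status() {
  (
    if [ "$1" = on ]; then set -o pipefail; else set +o pipefail; fi
    status=0
    false | cat > /dev/null || status=$?
    echo "$status"
  )
}

echo "pipefail off: $(pipeline_status off)"   # -> pipefail off: 0
echo "pipefail on:  $(pipeline_status on)"    # -> pipefail on:  1
```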
The underlying problem was the same as the one reported in issue #12 (even though -f was supplied to the samtools command):
[M::mm_idx_gen::77.962*1.58] collected minimizers
[M::mm_idx_gen::82.369*2.48] sorted minimizers
[WARNING] For a multi-part index, no @SQ lines will be outputted. Please use --split-prefix.
[M::main::82.369*2.48] loaded/built the index for 1820985 target sequence(s)
[M::mm_mapopt_update::82.369*2.48] mid_occ = 1000
[M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 1820985
[M::mm_idx_stat::84.495*2.44] distinct minimizers: 142973084 (47.36% are singletons); average occurrences: 4.628; average spacing: 6.045
[E::sam_parse1] no SQ lines present in the header
[W::sam_read1] Parse error at line 2
samtools view: error reading file "-"
According to this issue in the minimap2 repository, giving minimap2 more memory through the -I option (which caps how many target bases are loaded into RAM per index batch) can fix the problem, because minimap2 can then index all the sequences in memory in one pass.
I have almost 5M sequences in the file "transcripts.filtered.fa", and with the default of 4GB minimap2 is only able to load/index approximately 1M of them at a time. I have now relaunched with 80GB (certainly too much ;-)): minimap2 -ax sr -t20 -I80g ...
This should solve the problem. I will keep you updated in this thread.
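Rather than guessing a value such as 80g, one option is to count the target bases first and pick an -I value at least that large, so that minimap2 builds a single-part index. Below is a minimal sketch; count_fasta_bases is a hypothetical helper that handles multi-line FASTA and reads from files or stdin.

```bash
# Hypothetical helper: total number of sequence characters in a FASTA file
# (header lines starting with ">" are skipped, whitespace is ignored).
count_fasta_bases() {
  awk '!/^>/ { gsub(/[ \t\r]/, ""); total += length($0) }
       END   { print total + 0 }' "$@"
}
```

Usage: `count_fasta_bases transcripts.filtered.fa` prints a base count. Any -I value above that number should keep the whole reference in one index batch.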
Maybe you could let users customise the memory given to minimap2, as you already do for samtools sort.
Great, let me know how it goes. I can add a memory variable for the r2c step if -I takes care of the issue.
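One way such a memory variable could be exposed is sketched below. It is a hypothetical wrapper, not part of the existing pipeline. R2C_INDEX_SIZE, R2C_THREADS and MINIMAP2 are made-up environment variable names, and the 4G fallback simply mirrors the default quoted in this thread. All other arguments (reference, reads, ...) are passed through unchanged.

```bash
# Hypothetical r2c wrapper: minimap2's -I and -t are taken from environment
# variables when set, otherwise from the fallbacks below.
run_r2c_minimap2() {
  local index_size="${R2C_INDEX_SIZE:-4G}"   # max target bases per index batch
  local threads="${R2C_THREADS:-20}"
  # MINIMAP2 lets the binary (or a stub, for testing) be overridden.
  "${MINIMAP2:-minimap2}" -ax sr -t "$threads" -I "$index_size" "$@"
}
```

Usage (file names are placeholders): `R2C_INDEX_SIZE=80G run_r2c_minimap2 ref.fa reads_1.fq reads_2.fq`.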
I will close the issue for now unless there is more feedback.
Yes, adding the -I option to minimap2 with enough memory solved the problem. For a FASTA file with a large number of sequences, minimap2 then no longer needs --split-prefix to process the whole sample.
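The failure in the log above shows up as a SAM header with no @SQ lines. A cheap guard could check for this before the BAM is used downstream. The sketch below assumes SAM text on stdin, and has_sq_header is a hypothetical name. It returns success only if at least one @SQ line appears before the first alignment record.

```bash
# Hypothetical guard: exit 0 if the SAM header contains an @SQ line,
# exit 1 otherwise (reading stops at the first non-header line).
has_sq_header() {
  awk 'BEGIN    { found = 0 }
       /^@SQ\t/ { found = 1; exit }
       !/^@/    { exit }            # header is over; stop reading
       END      { exit (found ? 0 : 1) }'
}
```

Usage: `samtools view -H r2c.bam | has_sq_header || echo "r2c.bam has no @SQ lines" >&2`.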
Hi @readmanchiu
I launched fusionbloom on a large sample (~400M 2x151 bp read pairs) and got an error at the last step (pavfinder).
The error message was:
I do not really understand this message. Do you have any hints, and is there an easy workaround?
Also, the logs contain many complaints about missing BAM indexes. Should something be done about that? (This has not been an issue with other samples.)
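To see which BAM files actually lack an index, a quick check like the following could help. It is a minimal sketch. list_unindexed_bams is a hypothetical helper that looks only at the common naming conventions (foo.bam.bai, foo.bai, foo.bam.csi) in a single directory.

```bash
# Hypothetical helper: print BAM files in a directory (default: current)
# that have no .bai or .csi index next to them.
list_unindexed_bams() {
  local dir="${1:-.}" bam
  for bam in "$dir"/*.bam; do
    [ -e "$bam" ] || continue            # glob matched nothing
    if [ ! -e "$bam.bai" ] && [ ! -e "${bam%.bam}.bai" ] && [ ! -e "$bam.csi" ]; then
      printf '%s\n' "$bam"
    fi
  done
}
```

Usage: `list_unindexed_bams results/ | while IFS= read -r bam; do samtools index "$bam"; done`. Note that samtools index expects coordinate-sorted BAM files.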
The only difference I can see from the other samples (which completed successfully) is that this one is very large (3 times as many reads).
Many thanks for your help, Anthony