Open timkahlke opened 6 years ago
Hi, sorry you are having problems with SMIS. Could you send me a bit more information on the kind of data you are trying to run? And could you send me the whole output of the pipeline?
Thank you! Francesca
Hi Francesca,
I'm trying to run ~200k nanopore reads with average length ~6000 nucleotides to scaffold a 30MB draft genome.
Unfortunately, there is no further output about the error, and because smis_shred does not produce the two artificial fastq files, the rest of the pipeline complains about not having those files (see below).
I also tried running smis_shred stand-alone on multiple files, always with the same result. I thought it might be a compiler version problem, but I had the same problem with gcc 4.2.1 (Mac), 4.4.7 and 4.9.4 (CentOS 6).
./mysmissv.sh: line 94: 134729 Segmentation fault $bindir/smis_shred -rlength $fakelen -step $step -minlen $minlen $fqfile fakemates_1.fastq fakemates_2.fastq >> $outp
[bwa_index] Pack FASTA... 0.22 sec
[bwa_index] Construct BWT for the packed sequence...
[BWTIncCreate] textLength=64874730, availableWord=16564580
[BWTIncConstructFromPacked] 10 iterations done. 27323322 characters processed.
[BWTIncConstructFromPacked] 20 iterations done. 50475786 characters processed.
[bwt_gen] Finished constructing BWT in 27 iterations.
[bwa_index] 14.19 seconds elapse.
[bwa_index] Update BWT... 0.15 sec
[bwa_index] Pack forward-only FASTA... 0.13 sec
[bwa_index] Construct SA from BWT and Occ... 5.00 sec
[main] Version: 0.7.12-r1039
[main] CMD: /BWA_DIR/current/bwa index genome.fasta
[main] Real time: 20.695 sec; CPU: 19.696 sec
open: No such file or directory
[bam_sort_core] fail to open file bwa_sorted.bam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[gzclose] buffer error
[samopen] SAM header is present: 64 sequences.
[sam_read1] reference 'ID:bwa PN:bwa VN:0.7.12-r1039 CL:/BWA_DIR/current/bwa mem -t 25 -T 50 -A 2 -O -1 -E -1 -B -1 genome.fasta fakemates_1.fastq fakemates_2.fastq
' is recognized as '*'.
[main_samview] truncated file.
Starting at 1518640273
Could not open input BAM files.
wc: WORKING/DIR/smis_scaffolding/tempWork/genome-matepair-*: No such file or directory
in files.txt: smalt (bwa) data line should be <filename> <insert size> <standard deviation> <weight> <read length> <orientation, must = "in" | "out">
mv: cannot stat `spinner-sp2b.fasta': No such file or directory
Scaffolds are in spinner_scaffolds.fasta
Summary of parameters used are in /WORKING_DIR/smis_scaffolding/logs/launchedas_1518640243.txt
Log is in /PATH/TO/LOG/output_1518640243.txt
Hi, thanks for the details. In order to debug this problem, could you please reply to the questions below? I will try to fix the problem as quickly as possible.
Also, if you have not done this, download and compile the version in the Sanger organization: https://github.com/wtsi-hpag/smis . We can continue the discussion here though.
Thank you, Francesca
Yep, all default parameters
I tried it with two files: one with read lengths of 300-120,000 and another with read lengths of 5,000-120,000
Read names are like this: 0b847c36-b4eb-46a3-a703-c5e44a7b75da
It created the files but both are empty.
I initially tried the version you pointed to but had the same problem. Also, I couldn't add an issue on the other repo, which is why I came here :)
Hi, sorry, I don't know why, but GitHub deleted my message from yesterday; maybe you received it? Anyway, I just mentioned yesterday that I added a test example with E. coli data to the https://github.com/wtsi-hpag/smis repository. Can you please update your repo and try the test? Hopefully this will tell us whether it is a system issue or a data issue.
By the way, thanks for letting me know about the missing issues option on the organization repo, I think I fixed that.
Thank you
Hi Francesca
I am working with the latest version and I am encountering the same issue. Could you please help? My command is:
smis_pipeline -nodes 55 sample.fq sample.contigs.fasta sample_scaffolds.fasta
sample.contigs.fasta was generated with the Canu genome assembler. I should mention that I have been working with a couple of similar samples (from the same run), for which it worked without any issue. Any pointers would be helpful.
Here are the answers to your questions:
Ques I am assuming you are running with standard parameters, or are you manually setting any ?
Yes
Ques Could you tell me which is your shortest read and your longest read?
file | format | type | num_seqs | sum_len | min_len | avg_len | max_len |
---|---|---|---|---|---|---|---|
sample.fq | FASTQ | DNA | 232986 | 700531577 | 51 | 3006.8 | 46662 |
Ques Which is the typical read name ? (Just to check if the needed string length is longer than allowed now)
Here are those
m54079_180523_193451/23462106/29326_32162
m54079_180523_193451/23593208/35167_38003
m54079_180523_193451/24707440/74529_77365
m54079_180523_193451/25625293/61082_63918
m54079_180523_193451/26542980/37738_40574
m54079_180523_193451/28312032/37228_40064
m54079_180523_193451/30147290/16309_19145
m54079_180523_193451/30408936/11588_14424
m54079_180523_193451/30737165/23904_26740
m54079_180523_193451/31457850/23431_26267
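For anyone else answering the two questions above (shortest/longest read and typical read-name length), here is a minimal Python sketch that reports both in one pass. It assumes a plain, uncompressed FASTQ with exactly four lines per record; the function name `fastq_summary` and the returned keys are made up for illustration and are not part of SMIS.

```python
def fastq_summary(path):
    """Summarise read lengths and read-name lengths of a 4-line-per-record FASTQ.

    Assumption: sequences are not wrapped over several lines and the file
    is not compressed.
    """
    num_reads = 0
    min_len = None
    max_len = 0
    longest_name = ""
    with open(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # skip stray blank lines
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line
            handle.readline()  # quality line
            # Read name: header without the leading '@', up to the first whitespace
            fields = header[1:].split()
            name = fields[0] if fields else ""
            num_reads += 1
            if min_len is None or len(seq) < min_len:
                min_len = len(seq)
            max_len = max(max_len, len(seq))
            if len(name) > len(longest_name):
                longest_name = name
    return {
        "num_reads": num_reads,
        "min_len": min_len,
        "max_len": max_len,
        "longest_name": longest_name,
        "longest_name_len": len(longest_name),
    }
```

Calling `fastq_summary("sample.fq")` and printing the result gives the numbers asked for. Whether a given name length is actually a problem for smis_shred is not established in this thread; the sketch only measures it.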
Ques I understand smis_shred crashed, but did it start writing the fastq files fakemates_1.fastq and fakemates_2.fastq, or do they not exist at all, or are they empty?
I could not find those files. Where can I find that?
Looking forward to hearing from you soon.
Regards, Vijay Lakhujani
Trying to run the new version but get
Segmentation fault $bindir/smis_shred -rlength $fakelen -step $step -minlen $minlen $fqfile fakemates_1.fastq fakemates_2.fastq >> $outp