tseemann opened 8 years ago
I think the bug is that you do not support read files with a path in them? So ecoli.fastq.gz works, but not /path/to/the/reads/ecoli.fastq.gz?
As you can see, LightAssembler supports paths in read file names. Thanks!
The path suggestion was just one idea I had.
Can you suggest any other reasons why we are unable to get any results with your software?
Can you give me the exact command line that you are using for your dataset?
I believe @jfroula was having the same problem running the software here at the JGI. Jeff, perhaps you could outline the problem you were having, if you have your code samples at hand?
In my experience, this appears to be related to the -G flag when its value is not set accurately. I've found that using 10x the anticipated value appears to make this error go away, assuming we're describing the same error. LightAssembler appears to generate the same error message regardless of the cause.
Sorry for the late reply, @michaelbarton. The value of the -G flag (the genome size) should be relatively accurate because it plays a key role in determining the size of the Bloom filter and its false positive rate, which in turn affects LightAssembler's trusted/untrusted k-mer filtering step (and therefore the assembly results).
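To illustrate why an underestimated genome size hurts, here is a minimal sketch using the textbook Bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and h = (m/n) ln 2 hash functions, with false positive rate approximately (1 - e^(-h n / m))^h. This is not LightAssembler's code: the function names, the 1% target false positive rate, and the assumption that the number of inserted k-mers roughly equals the true genome size are all hypothetical. Real reads also contribute many sequencing-error k-mers, which would push the false positive rate even higher.

```python
import math


def bloom_parameters(n_expected: int, target_fpr: float) -> tuple[int, int]:
    """Size a standard Bloom filter for n_expected items at target_fpr.

    Returns (m_bits, n_hashes).
    """
    ln2 = math.log(2)
    m_bits = math.ceil(-n_expected * math.log(target_fpr) / (ln2 ** 2))
    n_hashes = max(1, round(m_bits / n_expected * ln2))
    return m_bits, n_hashes


def expected_fpr(m_bits: int, n_hashes: int, n_inserted: int) -> float:
    """Approximate false positive rate after inserting n_inserted distinct items."""
    return (1.0 - math.exp(-n_hashes * n_inserted / m_bits)) ** n_hashes


TRUE_SIZE = 2_903_081  # GAGE S. aureus genome size quoted in this thread
TARGET_FPR = 0.01      # hypothetical design target, not LightAssembler's setting

if __name__ == "__main__":
    # Filter sized from the -G estimate, but filled with ~TRUE_SIZE k-mers.
    for estimate in (10 * TRUE_SIZE, TRUE_SIZE, 1_803_081, 1_103_081):
        m_bits, n_hashes = bloom_parameters(estimate, TARGET_FPR)
        fpr = expected_fpr(m_bits, n_hashes, TRUE_SIZE)
        mib = m_bits / 8 / 2**20
        print(f"-G {estimate:>10,}: {mib:6.1f} MiB, {n_hashes} hashes, FPR ~ {fpr:.4f}")
```

Under these assumptions, an accurate estimate gives roughly the 1% target, while an estimate of 1,103,081 bp pushes the rate to roughly one in three, so many untrusted k-mers would slip through as trusted. Overestimating by 10x drives the rate toward zero at the cost of about ten times the memory, which is consistent with the workaround reported above.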
I tried different genome size values for GAGE Staphylococcus_aureus (genome size: 2903081 bp) to see the effect of the genome size value on the assembly results.
[Assembly results: genome size 1803081 bp]
[Assembly results: genome size 1103081 bp]
If LightAssembler fails to assemble a data set, it prints a general message suggesting possible causes of the failure, such as read length, gap size, or k-mer size. I will add to this message that the genome size value should be relatively accurate. I have emailed @jfroula to learn about his issues with LightAssembler so I can fix them.
Thank you so much.
Thanks for following up. I believe it may not be possible to have an accurate estimate of the genome size ahead of time, for example when assembling a novel genome for the first time. It may be possible to approximate the size from the rate at which unique k-mers are observed when sampling from the reads; however, this could be error-prone if LightAssembler is particularly sensitive to this value.
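Estimating genome size from the reads themselves can be sketched with a k-mer histogram. Note that this is a swapped-in, commonly used approach (total occurrences of "solid" k-mers divided by the peak k-mer depth), not the sampling-rate idea described above and not something LightAssembler is known to do. The function names are hypothetical, the min_count error cutoff is an assumption, and the naive peak picking would need a proper valley/peak search on real, error-rich data.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_histogram(reads, k: int) -> Counter:
    """Return a Counter mapping multiplicity -> number of distinct canonical k-mers."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if _VALID.issuperset(kmer):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def estimate_genome_size(hist: Counter, min_count: int = 2) -> float:
    """Estimate genome size as solid k-mer occurrences divided by the peak depth.

    Multiplicities below min_count are treated as sequencing-error k-mers.
    """
    solid = {c: n for c, n in hist.items() if c >= min_count}
    if not solid:
        raise ValueError("no k-mers at or above min_count")
    peak = max(solid, key=solid.get)  # most common multiplicity ~ k-mer coverage
    total = sum(c * n for c, n in solid.items())
    return total / peak
```

On error-free synthetic reads tiled evenly across a random 5 kb sequence, this lands within a few percent of the true length. Any such estimate could then be supplied as the -G value, possibly with some headroom given how sensitive the Bloom filter sizing is to underestimates.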
I can't seem to assemble my data (a 5 Mbp bacterial genome, PE reads). I've tried various k and g values, etc.