TGAC / RAMPART

A configurable de novo assembly pipeline
http://www.tgac.ac.uk/rampart/
GNU General Public License v3.0

Failing on quake.py with 0.12.2 #26

Open jerowe opened 9 years ago

jerowe commented 9 years ago

Running RAMPART 0.12.2

quake.py 0.3.5, KAT 2.0.6, jellyfish 2.0.6

The command:

cd ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli; quake.py -f ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli/readsListFile.lst -k 17 -p 8 -q 33 2>&1; cd ${BASE_DIR}

Fails with this error:

terminate called after throwing an instance of 'jellyfish::fastq_seq_qual_parser::FastqSeqQualParserError'
  what():  Truncated input file
Error: Requires at least 2 arguments.
Usage: jellyfish merge [options] db:string+
Use --help for more information
Traceback (most recent call last):
  File "/share/apps/NYUAD/quake/gcc_4.9.1/0.3.5/bin/quake.py", line 324, in <module>
    main()
  File "/share/apps/NYUAD/quake/gcc_4.9.1/0.3.5/bin/quake.py", line 89, in main
    jellyfish(options.readsf, options.reads_listf, options.k, ctsf, quality_scale, options.hash_size, options.proc)
  File "/share/apps/NYUAD/quake/gcc_4.9.1/0.3.5/bin/quake.py", line 290, in jellyfish
    os.rename('%s.dbm_0' % output_pre, '%s.dbm' % output_pre)
OSError: [Errno 2] No such file or directory
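jellyfish's `Truncated input file` message suggests that one of the FASTQ inputs ends part-way through a record. One quick way to check is to walk the file record by record and stop at the first malformed one. Below is a Python sketch of that. It assumes the simple four-line FASTQ layout (header, sequence, `+`, quality) with no line wrapping. `check_fastq` is a hypothetical helper, not part of quake or RAMPART.

```python
import gzip
from pathlib import Path


def check_fastq(path):
    """Scan a FASTQ file and return (complete_records, problem).

    Assumes plain four-line records (header, sequence, '+', quality);
    wrapped multi-line FASTQ is not handled. `problem` is None when
    every record is complete, otherwise it describes the first bad one.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    count = 0
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                return count, None  # clean end of file
            seq = handle.readline().rstrip("\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\n")
            if not header.startswith("@"):
                return count, f"record {count + 1}: header does not start with '@'"
            if not plus.startswith("+"):
                return count, f"record {count + 1}: missing '+' separator line"
            if len(seq) != len(qual):
                return count, f"record {count + 1}: sequence/quality length mismatch (truncated?)"
            count += 1
```

For a complete file, `check_fastq("DRR015910_1.fastq")` should return the record count paired with `None`. If the second value is not `None`, it names the first broken record.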

The log file, with --verbose, looks like this:

2015-10-27 12:10:58 INFO  DefaultProcessService:146 - Running command in foreground [cd ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli; quake.py -f ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli/readsListFile.lst -k 17 -p 8 -q 33 2>&1; cd ${BASE_DIR}].
2015-10-27 12:12:28 DEBUG ProcessRunner:153 - Return code was '0' for [cd ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli; quake.py -f ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli/readsListFile.lst -k 17 -p 8 -q 33 2>&1; cd ${BASE_DIR}].  Redirecting stderr.

Why isn't the output from quake.py reflected in the log file?

I am running rampart in an unscheduled environment. Is this one of those errors that would be fixed by running with PBS?
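The log shows `Return code was '0'` even though quake.py crashed. One plausible explanation, assuming RAMPART hands the whole command string to a POSIX shell, is how the shell reports exit status. In a `;`-separated list, the shell returns the exit status of the last command only. Here that is the trailing `cd ${BASE_DIR}`, which succeeds. The Python sketch below uses `false` and `true` as stand-ins for the failing and succeeding steps. `shell_status` is a hypothetical helper.

```python
import subprocess


def shell_status(command):
    """Run `command` through sh and return the shell's exit status."""
    return subprocess.run(["sh", "-c", command]).returncode


# With ';' the status is that of the LAST command, so a failure in the
# middle is hidden when the trailing step succeeds.
print(shell_status("false; true"))          # prints 0
# With '&&' the chain stops at the first failure and reports it.
print(shell_status("false && true"))        # prints 1
# 'set -e' makes the shell exit as soon as a simple command fails.
print(shell_status("set -e; false; true"))  # prints 1
```

Joining the steps with `&&` lets a non-zero status from the middle command reach whatever invoked the shell.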

maplesond commented 9 years ago

Hi Jillian,

Not 100% sure about this one. It shouldn't have anything to do with running unscheduled or with PBS. Sometimes quake will fail if the input data doesn't have enough coverage; however, it should work fine on the example dataset, and I think the error messages for that case are different. A couple of things to try:

First, does quake.py run outside of RAMPART? If not, check the quake installation guide. Second, I have quake 0.3.4 installed on my system; maybe dropping to that version will help? If that does fix the issue, I should make a bug fix on my side.
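To check the coverage point, mean depth can be estimated as total sequenced bases divided by genome size. Below is a minimal Python sketch. It assumes four-line FASTQ records and a genome size that you supply yourself; the genome size is not read from RAMPART's configuration. `total_bases` and `estimated_coverage` are hypothetical names.

```python
import gzip
from pathlib import Path


def total_bases(fastq_paths):
    """Sum sequence lengths over four-line FASTQ files (plain or .gz)."""
    total = 0
    for path in map(Path, fastq_paths):
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt") as handle:
            for line_no, line in enumerate(handle):
                if line_no % 4 == 1:  # the second line of each record is the sequence
                    total += len(line.rstrip("\n"))
    return total


def estimated_coverage(fastq_paths, genome_size):
    """Mean depth = total sequenced bases / genome size (in bp).

    genome_size is an assumption supplied by the caller.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases(fastq_paths) / genome_size
```

Here is a rough worked example. The quake run later in this thread reports 1,545,193,600 bp scanned. Against an assumed genome size of about 4.6 Mb (E. coli K-12), that gives roughly 336x coverage.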

jerowe commented 9 years ago

Hi Dan,

I think it has something to do with not running jellyfish beforehand. I believe I was able to get past this before by running the command manually with --no_jelly, but I'm not sure. I'll give that a whirl and we'll see.

I am starting 100% fresh from the ecoli data, so if there is any preprocessing that should be done prior to running rampart, it would be great to know that. ;)

jerowe commented 9 years ago

I can confirm that when I run with --no_jelly it runs as expected.

cd ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli; quake.py --no_jelly -f ${BASE_DIR}/rampart_out/1_mecq/quake/ecoli/readsListFile.lst -k 17 -p 8 -q 33 2>&1; cd ${BASE_DIR}

Output:

Processing sequences...
...............15451936 sequences processed, 1545193600 bp scanned
WARNING: Input had 171844 non-DNA (ACGT) characters whose kmers were not counted
23137901 total distinct mers
23137901 mers occur at least 0 times
initial  value 82217.600310 
iter  10 value 73825.591030
iter  20 value 72472.587900
iter  30 value 72125.377214
iter  40 value 72114.277322
final  value 72113.140761 
converged
value: 72113.14 
$zp.copy
[1] 2.316599

$p.e
[1] 0.7837952

$shape.e
[1] 0.4637115

$scale.e
[1] 1.654814

$u.v
[1] 157.7489

$var.v
[1] 1345.886

Cutoff: 10.79
10119368 trusted kmers
AT% = 0.493509
/scratch/jillian/workflows/rampart-0.12.2/rampart_out/1_mecq/quake/ecoli/DRR015910_1.fastq
/scratch/jillian/workflows/rampart-0.12.2/rampart_out/1_mecq/quake/ecoli/DRR015910_2.fastq
Uneven number of reads in paired end read files .DRR015910_1.fastq/0 and .DRR015910_2.fastq/0
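quake stops here because the two mate files contain different numbers of reads. That is consistent with one of them being cut short. Below is a Python sketch for comparing two mate files. It assumes four-line records and that mates share a read name apart from an optional `/1` or `/2` suffix; that naming rule is an assumption, not something checked against this dataset. `check_pairs` and its helpers are hypothetical.

```python
import gzip
from itertools import zip_longest
from pathlib import Path


def headers(path):
    """Yield the header line of every record in a four-line FASTQ file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 0:
                yield line


def mate_name(header):
    """First header token, without '@' or a trailing /1 or /2."""
    tokens = header[1:].split(maxsplit=1)
    name = tokens[0] if tokens else ""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def check_pairs(path_1, path_2):
    """Return (reads_in_1, reads_in_2, first_bad_pair).

    first_bad_pair is the 1-based index of the first pair whose names
    differ or where one file has already run out; None if all match.
    """
    n1 = n2 = 0
    first_bad = None
    pairs = zip_longest(headers(path_1), headers(path_2))
    for index, (h1, h2) in enumerate(pairs, start=1):
        if h1 is not None:
            n1 += 1
        if h2 is not None:
            n2 += 1
        mismatch = h1 is None or h2 is None or mate_name(h1) != mate_name(h2)
        if first_bad is None and mismatch:
            first_bad = index
    return n1, n2, first_bad
```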
jerowe commented 9 years ago

Hi Dan,

As it turns out, the ecoli data didn't download completely. I redownloaded it, and now it runs up to the MASS stage, where it exits with exit code 2.

2015-11-01 13:31:17 INFO MassJob:178 - Finished MASS group: "spades"
2015-11-01 13:31:17 ERROR Mass:156 - MASS job "abyss-quake" for sample "rampart_out" did not produce any output files
2015-11-01 13:31:17 ERROR AbstractConanTask:255 - Process 'MASS' failed to execute, exit code: 2
2015-11-01 13:31:17 ERROR AbstractConanTask:257 - Execution exception follows
uk.ac.ebi.fgpt.conan.service.exception.ProcessExecutionException: java.io.IOException: Stage MASS failed to produce valid output.
    at uk.ac.tgac.rampart.stage.RampartProcess.execute(RampartProcess.java:187)

I'm going back now and running each step individually. Hopefully I will have more information for you soon!
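The root cause turned out to be an incomplete download, so it can be worth verifying the read files before launching RAMPART. If the data source publishes MD5 checksums for its files, a sketch like the one below compares a local file against them. `md5sum` and `verify_download` are hypothetical helper names. The expected checksum has to come from the data provider.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """True if the file's MD5 matches the provider's published checksum."""
    return md5sum(path) == expected_md5.strip().lower()
```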

jerowe commented 9 years ago

In the log file I can see that abyss-quake fails, but not which command it fails on.
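As a stopgap for finding which command a failing job ran, here is a heuristic sketch. It scans the RAMPART log and remembers the most recent `Running command ... [<command>]` line, using the format shown in the earlier verbose log. It then pairs that command with each ERROR line that follows. This assumes commands are logged before they run. When jobs run in parallel, the nearest preceding command may belong to a different job, so treat the result as a hint only. `commands_before_errors` is a hypothetical helper.

```python
import re

# Patterns based on the log lines quoted in this thread, e.g.
#   ... INFO  DefaultProcessService:146 - Running command in foreground [cd ...; cd ${BASE_DIR}].
#   ... ERROR Mass:156 - MASS job "abyss-quake" ... did not produce any output files
RUN_RE = re.compile(r"Running command.*?\[(.*)\]\.?\s*$")
ERROR_RE = re.compile(r"^\S+ \S+\s+ERROR\s")


def commands_before_errors(lines):
    """Pair each ERROR line with the last 'Running command' seen before it.

    Heuristic only: with parallel jobs the preceding command may come
    from a different job.
    """
    last_command = None
    pairs = []
    for line in lines:
        match = RUN_RE.search(line)
        if match:
            last_command = match.group(1)
        elif ERROR_RE.match(line):
            pairs.append((last_command, line.rstrip("\n")))
    return pairs
```

Pass it an open log file, for example `commands_before_errors(open(log_path))`, where `log_path` points at your RAMPART log.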