Closed: lexnederbragt closed this issue 9 years ago
If you use the latest check-in on the master branch, you should have a file called fc_run.log. Can you show me that file so I can figure out what exactly is going on?
I love the fc_run.log file! Here is the output with a FASTQ as the input file. You could also try not having the input.fofn file in the folder where you start the run; that also crashes cryptically. Note that I use job_type = local for all runs.
input.fofn:
$ cat input.fofn
data/temp.fastq
stdout:
$ fc_run.py fc_run_ecoli.cfg
No target specified, assuming "assembly" as target
fasta2DB: Cannot open data/temp.fastq.fasta for 'r'
DBsplit: Cannot open ./raw_reads.db for 'r'
cat: raw_reads.db: No such file or directory
HPCdaligner: Cannot open ./raw_reads.db for 'r'
Exception in thread Thread-5:
Traceback (most recent call last):
File "/cluster/software/VERSIONS/python2-2.7.9/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/cluster/software/VERSIONS/python2-2.7.9/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/node/work1/no_backup/lex/9-spine/bin/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/task.py", line 317, in __call__
runFlag = self._getRunFlag()
File "/node/work1/no_backup/lex/9-spine/bin/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/task.py", line 147, in _getRunFlag
runFlag = any( [ f(self.inputDataObjs, self.outputDataObjs, self.parameters) for f in self._compareFunctions] )
File "/node/work1/no_backup/lex/9-spine/bin/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/task.py", line 812, in timeStampCompare
if min(outputDataObjsTS) < max(inputDataObjsTS):
ValueError: max() arg is an empty sequence
Traceback (most recent call last):
File "/node/work1/no_backup/lex/9-spine/bin/fc_env/bin/fc_run.py", line 4, in <module>
__import__('pkg_resources').run_script('falcon-kit==0.2.1', 'fc_run.py')
File "/node/work1/no_backup/lex/9-spine/bin/fc_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 723, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/node/work1/no_backup/lex/9-spine/bin/fc_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1636, in run_script
exec(code, namespace, namespace)
File "/node/work1/no_backup/lex/9-spine/bin/fc_env/lib/python2.7/site-packages/falcon_kit-0.2.1-py2.7-linux-x86_64.egg/EGG-INFO/scripts/fc_run.py", line 643, in <module>
wf.refreshTargets(updateFreq = wait_time) # larger number better for more jobs
File "/node/work1/no_backup/lex/9-spine/bin/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/controller.py", line 519, in refreshTargets
rtn = self._refreshTargets(objs = objs, callback = callback, updateFreq = updateFreq, exitOnFailure = exitOnFailure)
File "/node/work1/no_backup/lex/9-spine/bin/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/controller.py", line 617, in _refreshTargets
assert self.jobStatusMap[str(URL)] in ("done", "continue", "fail")
AssertionError
fc_run.log:
$ cat fc_run.log
2015-05-15 14:33:47,501 - fc_run - INFO - fc_run started with configuration fc_run_ecoli.cfg
2015-05-15 14:33:48,143 - fc_run - INFO - executing /node/work1/no_backup/lex/9-spine/bin/ecoli_test/0-rawreads/prepare_db.sh locally, start job: build_rdb-1c3c9478
2015-05-15 14:33:53,263 - fc_run - INFO - /node/work1/no_backup/lex/9-spine/bin/ecoli_test/0-rawreads/rdb_build_done generated. job: build_rdb-1c3c9478 finished.
Please also post the contents of /node/work1/no_backup/lex/9-spine/bin/ecoli_test/0-rawreads/prepare_db.sh.
Also, which commit of DAZZ_DB are you using? Use
git submodule status
The basic problem is that fasta2DB expects FASTA, not FASTQ. However, I'd like to see where the .fasta extension is appended.
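As a workaround until FALCON handles this more gracefully, the FASTQ could be converted to FASTA before building the Dazzler database. This is a hypothetical sketch, not part of FALCON itself; it assumes plain 4-line FASTQ records with no wrapped sequence lines (a tool like seqtk with `seq -a` would also work):

```shell
# Hypothetical workaround, not part of FALCON: convert FASTQ to FASTA
# before handing the file to fasta2DB. The awk one-liner assumes plain
# 4-line FASTQ records with no wrapped sequence lines.
# Tiny sample record standing in for data/temp.fastq:
printf '@read/1\nACGT\n+\nIIII\n' > temp.fastq
awk 'NR % 4 == 1 { sub(/^@/, ">"); print } NR % 4 == 2 { print }' \
    temp.fastq > temp.fasta
cat temp.fasta
```

Note that fasta2DB is also particular about PacBio-style sequence headers, so renaming the headers alone may not be enough for arbitrary FASTQ input.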
@lexnederbragt, will you be able to show a snippet of the FASTQ file? Any particular reason for not starting with subreads.fasta files?
cat /node/work1/no_backup/lex/9-spine/bin/ecoli_test/0-rawreads/prepare_db.sh
source /node/work1/no_backup/lex/9-spine/bin/fc_env/bin/activate
cd /node/work1/no_backup/lex/9-spine/bin/ecoli_test/0-rawreads
hostname >> db_build.log
date >> db_build.log
for f in `cat /node/work1/no_backup/lex/9-spine/bin/ecoli_test/input.fofn`; do fasta2DB raw_reads $f; done >> db_build.log
DBsplit -x500 -s50 raw_reads
LB=$(cat raw_reads.db | awk '$1 == "blocks" {print $3}')
HPCdaligner -v -dal4 -t16 -e.70 -l1000 -s1000 -H12000 raw_reads 1-$LB > run_jobs.sh
touch /node/work1/no_backup/lex/9-spine/bin/ecoli_test/0-rawreads/rdb_build_done
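The generated prepare_db.sh above loops over input.fofn without checking the file type, which is why a stray FASTQ surfaces only as fasta2DB's cryptic "Cannot open ... for 'r'". A hypothetical guard (not something FALCON generates) could reject non-FASTA entries up front; here the check is just "first byte is '>'", with a stand-in FASTQ file to make the sketch self-contained:

```shell
# Hypothetical guard, not generated by FALCON: verify every file listed in
# input.fofn looks like FASTA (first byte '>') before calling fasta2DB,
# so a stray FASTQ fails with a clear message instead of a cryptic one.
printf '@read/1\nACGT\n+\nIIII\n' > temp.fastq   # stand-in FASTQ input
echo temp.fastq > input.fofn
check_fofn() {
    while read -r f; do
        if [ "$(head -c 1 "$f")" != ">" ]; then
            echo "ERROR: $f does not look like FASTA" >&2
            return 1
        fi
    done < "$1"
}
check_fofn input.fofn || status=$?
echo "check exited with status ${status:-0}"
```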
git submodule status
-aea1a1dfbdac10a50a4bfbd81292842c7a7b4828 DALIGNER
-454ae5fe2ff4de6e03343480ae80f03f665d5992 DAZZ_DB
-23a0a9da3ab4584a59dafc35243987ff74a52b05 pypeFLOW
I collected all the subreads into a FASTQ file for PBcR/MHAP, as it expects that (or at least can work with it), and its documentation asks for a single input file, so I wanted to reuse that file. If you want, I can give you a snippet. In the meantime, I am happily correcting and assembling separate subread fasta files...
ValueError: max() arg is an empty sequence
That sometimes happens when a previous step fails, so the task-input calculation never occurs. pypeFLOW is not happy when it thinks there are no inputs to a task.
We should detect bad input (e.g. FASTQ), and we should produce a better message when inputs from the previous stage's outputs are missing. We should also specify the full set of inputs and outputs for each task. At any rate, a task failure now usually ends the run, so this is less of an issue. Please re-open if you see this in a more specific context, using code from the master branch.
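The fail-fast behavior can be sketched in plain shell. If a generated step script runs under `set -e`, the first failing command aborts the whole step immediately, so a bad input stops the run there instead of cascading into downstream errors like the DBsplit and HPCdaligner failures above. The `false` below is only a stand-in for a failing fasta2DB call:

```shell
# Sketch of fail-fast step execution: under `set -e`, the first failing
# command aborts the script, so later commands never run on broken state.
cat > step.sh <<'EOF'
set -eu
echo "building raw_reads DB"
false                      # stand-in for fasta2DB failing on FASTQ input
echo "splitting DB"        # never reached under set -e
EOF
if sh step.sh; then
    echo "step succeeded"
else
    echo "step failed; run stops here"
fi
```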
Hi,
I accidentally provided a FASTQ file, which caused an error in the first step of fc_run.py. But this did not stop the run; it continued a few more steps before it died. It would be nice if fc_run.py did not try to continue after such an input-file error.