PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

Incomplete FALCON runs (using PBS) #190

Open yingzhang121 opened 9 years ago

yingzhang121 commented 9 years ago

Hi, admin,

I downloaded and installed the latest FALCON-integrate release (v0.3.0). The test dry run seemed OK, so I followed the wiki instructions (https://github.com/PacificBiosciences/FALCON/wiki/Setup%3A-Running and https://github.com/PacificBiosciences/FALCON/wiki/Setup%3A-Complete-example) to try FALCON on the E. coli sample data.

Before starting the E. coli runs, I modified the cfg file so that the pipeline runs only locally:

==== modified part of the run.cfg ====

job_type = local
jobqueue = 
sge_option_da =
sge_option_la =
sge_option_pda =
sge_option_pla =
sge_option_fc =
sge_option_cns =
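
In local mode everything runs on the one allocated node, so it can also help to cap FALCON's per-stage concurrency to match the cores requested in the PBS script below. A sketch of the relevant settings (the option names follow example configs from that era and should be verified against the installed version):

==== possible additions to run.cfg (option names assumed) ====

pa_concurrent_jobs = 24
ovlp_concurrent_jobs = 24
cns_concurrent_jobs = 24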

And the following is my PBS script:

#!/bin/bash -l

# request one node with 24 cores and a 12-hour wall time; mail on abort/end
#PBS -l nodes=1:ppn=24,walltime=12:00:00
#PBS -m ae

cd $PBS_O_WORKDIR                    # run from the directory the job was submitted from
export WORK=/soft/pacificbiosciences-falcon/FALCON-integrate
source $WORK/fc_env/bin/activate     # activate FALCON's virtualenv
module load python
fc_run.py fc_run_local.cfg           # launch the pipeline with the local-mode config
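
The job was then submitted in the usual way (the script file name below is hypothetical):

qsub falcon_run.pbs    # submit the job to PBS
qstat -u $USER         # check the job's state while it runs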

So far, I have tried 4 FALCON runs. None of them returned an error message, but none of them completed either.

The first run "finished" after running for 43 minutes, however, when I checked the results, it seemed the pipeline just finished the processing of raw reads (with 1-preads_ovl and 2-asm-falcon directories empty). The second run ran for 1 hour, and stopped after the daligner step, which means only the 2-asm-falcon directory was empty. After the two failed runs, I switched to another linux cluster for a test drive using the same sample data and scripts. However, the 3rd and 4th run couldn't be done within 24 hour wall time, When I checked the pbs standard error and standard output files, it seemed both runs were still in the stage of processing of raw reads when hit the wall time.

I don't think this is correct, especially given what happened when I tried to resume the failed runs (by submitting the same PBS scripts a second time). The first run actually continued for another 10 minutes or so, and I got the p_contig.fa and a_contig.fa files (even though a_contig.fa is really small). But the 2nd, 3rd, and 4th runs could not be "resumed".
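
Before resubmitting, a quick way to see how far a run got is to count the outputs in each stage directory (a minimal sketch, assuming the standard FALCON working-directory layout of 0-rawreads, 1-preads_ovl, and 2-asm-falcon):

for d in 0-rawreads 1-preads_ovl 2-asm-falcon; do
    # count the files each pipeline stage has produced so far
    printf '%-15s %6d files\n' "$d" "$(find "$d" -type f 2>/dev/null | wc -l)"
done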

Please advise on what I can do now. I am also planning to look into the run.py code, because I suspect there are error-check points in the scripts that cause the workflow to stop without printing an error message.

By the way, for the 4 trials, the fc_run.log and pypeflow.log files were always empty.

Best,

pb-jchin commented 9 years ago

Hi, Ying: We don't have PBS or the same configuration as your cluster, so it is hard to know exactly what happened. I personally have not tested the code on PBS. I am wondering whether anyone else has been able to port the code to PBS and can share their setup with you.

yilunhuangyue commented 8 years ago

Hey, I have tested FALCON on PBS and ran into problems too. I tested it using my own data (*p0.1.subreads.fasta, *p0.2.subreads.fasta, and *p0.3.subreads.fasta). After the pipeline finished, the 1-preads_ovl and 2-asm-falcon directories were empty, 0-rawreads was not, and prepare_rdb.sh.log seemed OK. But the PBS stderr file shows the following:

Traceback (most recent call last):
  File "/home02/huangyue/software/falcon/FALCON-integrate/fc_env/bin/fc_run.py", line 4, in <module>
    __import__('pkg_resources').run_script('falcon-kit==0.4.0', 'fc_run.py')
  File "/home02/huangyue/software/falcon/FALCON-integrate/fc_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 729, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home02/huangyue/software/falcon/FALCON-integrate/fc_env/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1642, in run_script
    exec(code, namespace, namespace)
  File "/home02/huangyue/software/falcon/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/EGG-INFO/scripts/fc_run.py", line 5, in <module>
    main(*sys.argv)
  File "/home02/huangyue/software/falcon/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 566, in main
    main1(*argv)
  File "/home02/huangyue/software/falcon/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 447, in main1
    wf.refreshTargets(updateFreq = wait_time) # larger number better for more jobs, need to call to run jobs here or the # of concurrency is changed
  File "/home02/huangyue/software/falcon/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/controller.py", line 531, in refreshTargets
    rtn = self._refreshTargets(task2thread, objs = objs, callback = callback, updateFreq = updateFreq, exitOnFailure = exitOnFailure)
  File "/home02/huangyue/software/falcon/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/controller.py", line 706, in _refreshTargets
    raise TaskFailureError("Counted %d failures." %failedJobCount)
pypeflow.controller.TaskFailureError: 'Counted 1 failures.'

Do you have any idea?

By the way, the fc_run.log and pypeflow.log files were empty too. I think the output information was written to stderr and stdout; if you don't use PBS, fc_run.log contains the same output that goes to stderr.
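
Since the useful output ends up on stdout/stderr under PBS, one workaround is to capture it explicitly in the PBS script and then search the stage directories for the sub-job whose failure pypeflow counted (a minimal sketch; the redirect file names are made up, and the directory names assume the standard FALCON run layout):

# capture the driver's output explicitly instead of relying on PBS spool files
fc_run.py fc_run_local.cfg > fc_run.stdout 2> fc_run.stderr

# afterwards, list any stage files that mention an error (case-insensitive)
grep -ril error 0-rawreads 1-preads_ovl 2-asm-falcon 2>/dev/null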