problem with master branch Falcon

dgordon562 commented 9 years ago

These are problems I'm having with the downloaded master version (I haven't made any changes yet):

1) If one of the jobs fails, fc_run.py crashes. Is this expected behavior? The crash produces this error:

2015-07-13 17:40:24,834 - pypeflow.controller - ERROR - Any exception caught in RefreshTargets() indicates an unrecoverable error. Shutting down...

2) I've only tried assembling E. coli so far, but it fails after the ct_ jobs have completed. fc_run.py doesn't crash, but it stops using CPU time (it sat idle all night) and doesn't move on to the 1-preads daligner stage. If I restart it, it crashes within a minute with the error above.

Digging into this a little, the problem (at this point) is:

fasta2DB -v preads -f/net/eichler/vol24/projects/whole_genome_assembly/nobackups/ecoli/150713/1-preads_ovl/input_preads.fofn
Adding 'out.00001' ...
File out.00001.fasta, Line 1: Pacbio header line format error

However, 1-preads_ovl/input_preads.fofn does not exist yet, so of course it will get this error.

3) What is supposed to create input_preads.fofn? Any idea why it might not be created?
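Before rerunning, it can help to confirm that the fofn exists and that every file it lists is present. Below is a minimal Python 3 sketch under stated assumptions: `missing_fofn_entries` is a hypothetical helper, not part of FALCON, and relative entries are checked against the current working directory.

```python
from pathlib import Path


def missing_fofn_entries(fofn_path):
    """Return the entries listed in a fofn that do not exist on disk.

    A fofn ("file of file names") holds one path per line. Relative entries
    are checked against the current working directory, which is an
    assumption: run this from the directory the consuming tool runs in.
    Raises FileNotFoundError if the fofn itself is missing.
    """
    missing = []
    for line in Path(fofn_path).read_text().splitlines():
        entry = line.strip()
        if entry and not Path(entry).exists():  # skip blank lines
            missing.append(entry)
    return missing
```

An empty list means fasta2DB should at least be able to open every listed file; a FileNotFoundError means the fofn was never written.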

pb-jchin commented 9 years ago

@dgordon562 , please check, https://github.com/PacificBiosciences/FALCON/wiki, you need the --output_dformat option.
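A pre-flight check along these lines could catch the missing flag before fasta2DB fails. This is a hedged Python 3 sketch, assuming the cfg is INI-style with `falcon_sense_option` under a `[General]` section; `lacks_output_dformat` is a hypothetical name, not a FALCON function.

```python
import configparser


def lacks_output_dformat(cfg_text, section="General"):
    """Return True if falcon_sense_option does not include --output_dformat.

    Assumes an INI-style cfg with the option under [General]; a missing
    section or option counts as lacking the flag.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    value = parser.get(section, "falcon_sense_option", fallback="")
    return "--output_dformat" not in value.split()
```

A run script could call this before launching and stop with a clear message instead of failing later at the preads stage.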

dgordon562 commented 9 years ago

Thanks, Jason.

Could you also answer #1 (above), please? That's a serious issue for us.

pb-cdunn commented 9 years ago

1) If a job fails, what would you expect? Do you want to be able to restart, keeping successful results? We could prioritize that, if it's really helpful.

2) We should find a way to indicate the missing --output_dformat option. But I don't understand. The message seems to indicate that the file was in fact read. It would contain something like 0-rawreads/preads/out.00001.fasta, and probably more. And if the .fasta does not exist, I would not expect a message about a header; I would expect a failure to open the file. Maybe you are looking in the wrong directory?

3) If you configure logging.ini to record DEBUG logs for pypeflow, you can look at pypeflow.log for something like this:

2015-06-29 02:50:01,889 - pypeflow.controller - DEBUG -  Details: {'__class__.__name__': 'PypeThreadTaskBase',
 '_status': 'TaskInitialized',
 'inputDataObjs': {'cjob_1': PypeLocalFile('file://localhost/lustre/hpcprod/cdunn/repo/gh/FALCON-integrate/FALCON-examples/run/synth0/0-rawreads/preads/c_00001_done', '/lustre/hpcprod/cdunn/repo/gh/FALCON-integrate/FALCON-examples/run/synth0/0-rawreads/preads/c_00001_done')},
 'mutableDataObjs': {},
 'outputDataObjs': {'cns_done': PypeLocalFile('file://localhost/lustre/hpcprod/cdunn/repo/gh/FALCON-integrate/FALCON-examples/run/synth0/0-rawreads/cns_done', '/lustre/hpcprod/cdunn/repo/gh/FALCON-integrate/FALCON-examples/run/synth0/0-rawreads/cns_done'),
                    'pread_fofn': PypeLocalFile('file://localhost/lustre/hpcprod/cdunn/repo/gh/FALCON-integrate/FALCON-examples/run/synth0/1-preads_ovl/input_preads.fofn', '/lustre/hpcprod/cdunn/repo/gh/FALCON-integrate/FALCON-examples/run/synth0/1-preads_ovl/input_preads.fofn')},
 'parameters': {}}

That doesn't tell the name of the task, unfortunately, but it names the known inputs and outputs for some task. That file is created in run.py.
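To pull the named input and output files out of such a dump without reading it by eye, a regex is enough. A minimal sketch, assuming the repr layout shown in the log above (`'name': PypeLocalFile('url', 'path')`); `data_objs` is a hypothetical helper, not a pypeflow API.

```python
import re

# One named file in a pypeflow dump, e.g.
# 'pread_fofn': PypeLocalFile('file://localhost/...', '/path/to/input_preads.fofn')
ENTRY = re.compile(r"'(\w+)': PypeLocalFile\('([^']*)', '([^']*)'\)")


def data_objs(details, section):
    """Return {name: local_path} for one section of a pypeflow 'Details:' dump.

    section is e.g. 'inputDataObjs' or 'outputDataObjs'. The section is taken
    to end at the first '}' after its key, which assumes no braces in paths.
    """
    start = details.find("'%s':" % section)
    if start == -1:
        return {}
    end = details.find("}", start)
    chunk = details[start:] if end == -1 else details[start:end]
    return {name: path for name, _url, path in ENTRY.findall(chunk)}
```

Running it over each "Details:" block in pypeflow.log and looking for input_preads.fofn among the outputs points to the task that should produce it.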

I highly recommend trying FALCON-integrate and FALCON-examples at least once, according to their wikis. We know that synth0 works because we re-run that integration test on every update of the submodules. It's a good reference, even if you plan to use your own integration system for real genomes.

dgordon562 commented 9 years ago

I finally got FALCON-integrate to succeed. It was very painful, so to save others that pain, here is what was needed. You might want to put this on the wiki, not as what must be done, but as a note on what finally worked for one user who was having problems. (I included such notes in the consed documentation when different sites had problems that I didn't have myself.)

git clone git://github.com/PacificBiosciences/FALCON-integrate.git
cd FALCON-integrate
git checkout 0.3.0
make init
make virtualenv
make check

This is where things get complicated. Any deviation from the following didn't work for me:

export PYTHONPATH=
module purge
module load modules modules-init modules-gs/prod modules-eichler/prod
module load git/latest
module load mpc/latest
module load mpfr/3.1.0
module load gmp/5.0.2
module load gcc/latest

source /net/gs/vol1/home/dgordon/falcon150712/FALCON-integrate/fc_env/bin/activate

module load anaconda/2.1.0

source /net/gs/vol1/home/dgordon/falcon150712/FALCON-integrate/fc_env/bin/activate

module load mpc/latest
module load mpfr/3.1.0
module load gmp/5.0.2
module load gcc/latest
export LD_LIBRARY_PATH=/net/gs/vol3/software/modules-sw/python/2.7.3/Linux/RHEL6/x86_64/lib:$LD_LIBRARY_PATH

cd FALCON-integrate/

make -C FALCON-make install

Regarding #1, I've modified run.py so that it doesn't crash when a single job crashes, so I'm good. Regarding #2, the crash disappeared when I used the cfg file suggested for the master branch. I haven't bothered to track down exactly what made the difference, but I'm good now.

pb-cdunn commented 9 years ago

Python virtualenv is causing you problems. I'm not surprised, and I don't have a solution. You don't have to use 'virtualenv', but if you don't, then you need to figure out how to use the Python eggs from FALCON.

Did you source fc_env/bin/activate twice? There is odd formatting on the second source line that you posted.
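When it is unclear which environment is active (for example after sourcing activate twice, or with PYTHONPATH set), a small diagnostic can report it. A sketch for Python 3; the package name `falcon_kit` is an assumption about what FALCON installs, so adjust it if your build differs.

```python
import importlib.util
import os
import sys


def in_virtualenv():
    """True when running inside a virtualenv or venv.

    Old-style virtualenv sets sys.real_prefix; the stdlib venv makes
    sys.prefix differ from sys.base_prefix.
    """
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("in virtualenv:", in_virtualenv())
    print("PYTHONPATH:", os.environ.get("PYTHONPATH"))
    # falcon_kit is assumed to be FALCON's package name; adjust if needed.
    print("falcon_kit from:", module_origin("falcon_kit"))
```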

re: (1), why is the job crashing? Is it restartable?

re: (2), could you post the diff between the 2 cfg files that you used? The bad and the good? That could be invaluable. And where did they come from? One might be out-of-date somewhere.
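For the restart question, one common approach is to skip any job whose completion sentinel already exists. A minimal Python 3 sketch with a hypothetical `run_pending` helper; it borrows the idea of the `*_done` files (such as `c_00001_done`) visible in the pypeflow log above, but it is not how pypeflow itself decides what to rerun.

```python
import subprocess
from pathlib import Path


def run_pending(tasks):
    """Run each (command, done_file) task unless its done_file already exists.

    Hypothetical helper, not part of FALCON or pypeflow: it only mimics the
    *_done sentinel files (such as c_00001_done) that pypeflow tracks.
    Returns the commands that were actually executed on this call.
    """
    ran = []
    for cmd, done_file in tasks:
        done = Path(done_file)
        if done.exists():
            continue  # finished in an earlier run; keep that result
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        done.touch()  # only mark success after a zero exit status
        ran.append(cmd)
    return ran
```

Because the sentinel is touched only after a zero exit status, a failed job leaves no sentinel and is retried on the next call, while completed jobs are skipped.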

dgordon562 commented 9 years ago

Yes, I sourced activate twice. I don't know where the odd formatting came from.

In one case, daligner crashed because it ran out of RAM. I've also simulated crashes by killing daligner or fc_consensus.py. But, as I said, I've modified run.py so that it doesn't crash when a single qsub'd job crashes.
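A run loop that tolerates single-job failures can look roughly like this. It is a hypothetical Python 3 sketch, not the actual run.py modification described above, which was not posted.

```python
import subprocess


def run_all(commands):
    """Run every command, collecting failures instead of stopping at the first.

    Hypothetical sketch, not the actual run.py change: a job that exits
    nonzero (or is killed by a signal, which gives a negative return code)
    is recorded, and the remaining jobs still run.
    Returns (command, returncode) pairs for the jobs that failed.
    """
    failures = []
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failures.append((cmd, result.returncode))
    return failures
```

The caller can report the collected failures at the end and resubmit only those jobs.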

Here is the diff (with commented and extraneous lines removed). Left is good, right is bad. This is just for your benefit--I don't need further help with this:

< pa_HPCdaligner_option =  -v -dal4 -t16 -e.70 -l1000 -s1000
---
> pa_HPCdaligner_option =  -v -dal128 -t16 -e.70 -l1000 -s1000
61,65c58,59
< pa_DBsplit_option = -x500 -s50
< ovlp_DBsplit_option = -x500 -s50
---
> pa_DBsplit_option = -x500 -s400
> ovlp_DBsplit_option = -x500 -s400

< falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 4 --local_match_count_threshold 2 --max_n_read 200 --n_core 6 --output_dformat
---
> falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 4 --local_match_count_threshold 2 --max_n_read 200 --n_core 6

< overlap_filtering_setting = --max_diff 100 --max_cov 100 --min_cov 20 --bestn 10 --n_core 24
---
> overlap_filtering_setting = --max_diff 60 --max_cov 60 --min_cov 2
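Comparing the two files option by option avoids the line-wrapping noise of a textual diff. A minimal Python 3 sketch using the standard `configparser` module, assuming both files are INI-style with their options under `[General]`; `load_options` and `cfg_diff` are hypothetical helpers.

```python
import configparser


def load_options(text, section="General"):
    """Parse INI-style cfg text and return {option: value} for one section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names' original case
    parser.read_string(text)
    return dict(parser[section]) if parser.has_section(section) else {}


def cfg_diff(good_text, bad_text, section="General"):
    """Return {option: (good_value, bad_value)} for options that differ.

    None marks an option present in only one of the two files. Full-line
    comments, blank lines, and option order are ignored.
    """
    good = load_options(good_text, section)
    bad = load_options(bad_text, section)
    return {
        key: (good.get(key), bad.get(key))
        for key in sorted(set(good) | set(bad))
        if good.get(key) != bad.get(key)
    }
```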

dgordon562 commented 9 years ago

(more odd formatting--this was just a text file)

pb-cdunn commented 9 years ago

< is good, > is bad?

dgordon562 commented 9 years ago

correct.

pb-cdunn commented 9 years ago

Thanks. That's all helpful info.

PacificBiosciences / FALCON

problem with master branch Falcon #137