PacificBiosciences / FALCON

FALCON: experimental PacBio diploid assembler -- Out-of-date -- Please use a binary release: https://github.com/PacificBiosciences/FALCON_unzip/wiki/Binaries

Assembly keeps crashing at 1-preads_ovl step #392

Open dcopetti opened 8 years ago

dcopetti commented 8 years ago

Hello,

I am running Falcon on a ~800 Mb genome with ~60x coverage. It keeps dying at the 1-preads_ovl step with the following error:

...
[INFO]'/gsfs1/rsgrps/rwing/tania/falcon_90_subset/1-preads_ovl/job_0d47/job_0d47_done.exit' found.
[INFO]_refreshTargets() finished with no thread running and no new job to submit
[ERROR]Any exception caught in RefreshTargets() indicates an unrecoverable error. Shutting down...
Traceback (most recent call last):
  File "/gsfs1/rsgrps/FALCON/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/controller.py", line 522, in refreshTargets
    rtn = self._refreshTargets(task2thread, objs = objs, callback = callback, updateFreq = updateFreq, exitOnFailure = exitOnFailure)
  File "/gsfs1/rsgrps/FALCON/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/controller.py", line 747, in _refreshTargets
    failedJobCount, succeededJobCount))
LateTaskFailureError: 'Counted a total of 817 failure(s) and 4921 success(es).'
['/gsfs1/rsgrps/FALCON/FALCON-integrate/fc_env/bin/fc_run.py', 'fc_run_six90.cfg']

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! Please wait for all threads / processes to terminate !
! Also, maybe use 'ps' or 'qstat' to check all threads,!
! processes and/or jobs are terminated cleanly.        !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[WARNING]#tasks=5738, #alive=0
Traceback (most recent call last):
  File "/gsfs1/rsgrps/FALCON/FALCON-integrate/fc_env/bin/fc_run.py", line 4, in <module>

This has already happened a few times, with the same error but a different number of failures and successes each time. Other runs were successful, and the memory and settings for each step work with this dataset (even with more data). Is it just a matter of deleting folder 1 and restarting the job with input_type = preads? I have seen that this works, but that way I just waste CPU hours. Thanks

pb-cdunn commented 8 years ago

Is it just a matter of deleting folder 1 and restart the job with input_type = preads?

That's one way. And you don't even need to change input_type; already finished jobs will be skipped. But you can also delete just the bad directory, 1-preads_ovl/job_0d47, and restart.

Remember to rm -rf mypwatcher/ before restarting. (I need to document that, or remove it by default.)

To learn more about the failure, look in 1-preads_ovl/job_0d47/pwatcher.dir/stderr. (That's a symlink into mypwatcher/, so look before you delete.)
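
Concretely, the whole sequence looks something like this (a sketch only: the paths are the ones from your log, and the final command assumes you restart with the same config file as before):

cat 1-preads_ovl/job_0d47/pwatcher.dir/stderr   # read the failure first (symlink into mypwatcher/)
rm -rf 1-preads_ovl/job_0d47                    # drop only the failed job directory
rm -rf mypwatcher/                              # clear the process-watcher state, if the directory exists
fc_run.py fc_run_six90.cfg                      # on restart, already-finished jobs are skipped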

dcopetti commented 8 years ago

Thanks, I will delete it. But according to the error, will I have another 816 folders to delete? LateTaskFailureError: 'Counted a total of 817 failure(s) and 4921 success(es).'

I have never seen a mypwatcher folder, and I can't find it inside the assembly folder (find . -name "mypwatcher" gives nothing). Is it a matter of the Falcon version? I could not find the version number; does this line help you work out which version we have? It is the first line of the stderr: [INFO]Queued 'task:///gsfs1/rsgrps/FALCON/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py/task_make_fofn_abs_raw' ... thanks.

pb-cdunn commented 8 years ago

I never saw a mypwatcher folder yet ... Is it a matter of Falcon version?

Probably. What version are you running?

site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg

I think that's a pretty old version.

I could not find the version number.

You're using git, so the commit SHA1 is unique.

cd FALCON-integrate; git rev-parse HEAD
cd FALCON; git rev-parse HEAD

dcopetti commented 8 years ago

Matt and I installed a newer version of Falcon (-NEW), but then went back to the one that was working:

[dcopetti@service0 FALCON-integrate]$ git rev-parse HEAD
cd9e9373a9f897bc429ecf820809c6d773ee5c44

[dcopetti@service0 FALCON-integrate-NEW]$ git rev-parse HEAD
4e655db1d06d99301b1d75bf1c13efc94a8d66c5

Shall I ask to have Falcon updated to the latest version?

In the meantime, we will remove the folder that had the failure and finish the assembly. Thanks,

pb-cdunn commented 8 years ago

With the latest FALCON-integrate, you can still run the older workflow engine by calling fc_run0. But I think fc_run will work for you. Simply update to 0.7.1.
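
A rough sketch of moving an existing FALCON-integrate checkout onto that release (assuming 0.7.1 exists as a git tag in your clone; the install step itself depends on how you set things up originally):

cd FALCON-integrate
git fetch --tags
git checkout 0.7.1              # assumes the release is tagged 0.7.1
git submodule update --init     # FALCON-integrate pulls its components in as submodules
# then re-run whatever install step you used before (Makefile / virtualenv); that part varies by setup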

mseetin commented 8 years ago

No, it won't work for them. Their PBS Pro setup has the cap on job-name length that I wrote that hashing strategy for; I hadn't gotten around to submitting a pull request with it.
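
The idea is just to hash the long pypeflow job name down to something the scheduler will accept, along these lines (a sketch of the concept, not the actual patch; the 15-character cap and the qsub call are assumptions about their PBS Pro setup):

long_name="task_run_daligner_job_0d47"                              # illustrative long job name
short_name="J$(printf '%s' "$long_name" | md5sum | cut -c1-14)"     # 1 + 14 = 15 characters total
qsub -N "$short_name" job_script.sh                                 # submit under the shortened name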

pb-cdunn commented 8 years ago

fc_run0 won't work because of a long job-name? Then how could the old one be any better? I need to see evidence of the problem.

The job-name should be plenty short in fc_run1 (which is the default fc_run in 0.7.1).

Every step is logged. You should be able to repeat the PBS command line yourself. (And post it, along with the SHA1 for that run.) And if that fails, you should be able to debug it independently of everything else. We cannot do much remotely.

dcopetti commented 8 years ago

Sorry, but I got lost in these technical details: do I need to do something once Falcon is updated? Repeat the edits with the script that Matt made some time ago? Thanks,

pb-cdunn commented 8 years ago

Is that the only issue? The job-name is too long? If so, I can push out a quick fix for that.

But you need to prove it first. Use fc_run with the latest code. Use a logging-configuration file so that you get debug logging. (If you need help with that, ask. It's standard Python, but I can help. Maybe we should google-chat.)
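
For example, a standard-Python logging configuration in fileConfig format that turns on DEBUG output could look like this (the logging.ini filename is arbitrary, and whether fc_run accepts such a file as a second argument depends on your version, so check before relying on it):

cat > logging.ini <<'EOF'
[loggers]
keys=root

[handlers]
keys=console

[formatters]
keys=plain

[logger_root]
level=DEBUG
handlers=console

[handler_console]
class=StreamHandler
level=DEBUG
formatter=plain
args=(sys.stderr,)

[formatter_plain]
format=%(asctime)s %(name)s %(levelname)s %(message)s
EOF

fc_run.py fc_run_six90.cfg logging.ini   # only if your fc_run takes a logging-config path; check your version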

Then, when it fails, you should see the actual command line sent to PBS. I need to see that. And you should have an error message from PBS somewhere. I need to see that too.

Or, give me a password so I can log in to your system.