dcopetti opened 8 years ago
Is it just a matter of deleting folder 1 and restarting the job with input_type = preads?
That's one way. And you don't even need to change input_type; already-finished jobs will be skipped. But you can also delete just the bad directory, 1-preads_ovl/job_0d47, and restart.
Remember to rm -rf mypwatcher/ before restarting. (I need to document that, or remove it by default.)
To learn more about the failure, look in 1-preads_ovl/job_0d47/pwatcher.dir/stderr. (That's a symlink into mypwatcher/, so look before you delete.)
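Putting those steps together, a minimal sketch (the job directory name is the one from this thread; substitute your own failed job id, and run from the assembly's top-level directory):

```shell
# Read the failure log BEFORE deleting anything: the stderr file is
# a symlink into mypwatcher/, so it dangles once mypwatcher/ is gone.
cat 1-preads_ovl/job_0d47/pwatcher.dir/stderr || true

# Remove only the failed job directory and the watcher state.
rm -rf 1-preads_ovl/job_0d47
rm -rf mypwatcher/

# Then restart with your original config; already-finished jobs are
# skipped, so only the failed work is redone:
#   fc_run fc_run.cfg
```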
Thanks, I will delete it. But according to the error, will I have another 816 folders to delete?
LateTaskFailureError: 'Counted a total of 817 failure(s) and 4921 success(es).'
I never saw a mypwatcher folder yet, and I can't find it inside the assembly folder (find . -name "mypwatcher" gives nothing). Is it a matter of Falcon version? I could not find the version number; does this help you identify the version we have? It is the first line of the stderr:
[INFO]Queued 'task:///gsfs1/rsgrps/FALCON/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py/task_make_fofn_abs_raw' ...
Thanks.
I never saw a mypwatcher folder yet ... Is it a matter of Falcon version?
Probably. What version are you running?
site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg
I think that's a pretty old version.
I could not find the version number.
You're using git, so the commit SHA1 is unique.
cd FALCON-integrate; git rev-parse HEAD
cd FALCON; git rev-parse HEAD
With Matt we installed a newer version of Falcon (-NEW), but then went back to the one that was working:

[dcopetti@service0 FALCON-integrate]$ git rev-parse HEAD
cd9e9373a9f897bc429ecf820809c6d773ee5c44

and

[dcopetti@service0 FALCON-integrate-NEW]$ git rev-parse HEAD
4e655db1d06d99301b1d75bf1c13efc94a8d66c5
Shall I ask to update Falcon to the latest version?
In the meantime we will remove the folder that had the failure and complete the assembly. Thanks,
With the latest FALCON-integrate, you can still run the older workflow engine by calling fc_run0. But I think fc_run will work for you. Simply update to 0.7.1.
No, it won't work for them. They have that cap on job name length in PBS Pro that I wrote that hashing strategy for. I hadn't gotten around to submitting a pull request with it.
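The hashing workaround itself isn't shown in this thread, but the idea is straightforward: PBS Pro (notably older versions) caps the job-name length, so a long auto-generated task name can be replaced by a truncated prefix plus a short digest. A hypothetical sketch in Python (the 15-character cap and the naming scheme are assumptions for illustration; the actual patch may differ):

```python
import hashlib

# Assumption: older PBS Pro limits job names to 15 characters.
MAX_JOBNAME = 15

def short_job_name(long_name, limit=MAX_JOBNAME):
    """Return long_name unchanged if it fits; otherwise build a
    deterministic short name from a prefix plus an 8-char hash,
    so reruns map to the same job name."""
    if len(long_name) <= limit:
        return long_name
    digest = hashlib.md5(long_name.encode()).hexdigest()
    keep = limit - 8  # leave room for the 8-char hash suffix
    return long_name[:keep] + digest[:8]

# Hypothetical auto-generated FALCON task name:
name = short_job_name("P_task_d_0d47_preads_ovl_chunk_017")
```

Determinism matters here: the same long name must always map to the same short name, or restarts would lose track of submitted jobs.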
fc_run0 won't work because of a long job-name? Then how could the old one be any better? I need to see evidence of the problem.
The job-name should be plenty short in fc_run1 (which is the default fc_run in 0.7.1).
Every step is logged. You should be able to repeat the PBS command line yourself. (And post it, along with the SHA1 for that run.) And if that fails, you should be able to debug it independently of everything else. We cannot do much remotely.
Sorry, but I got lost in these technical details: do I need to do something once Falcon is updated? Repeat the edits with the script that Matt made some time ago? Thanks,
Is that the only issue? The job-name is too long? If so, I can push out a quick fix for that.
But you need to prove it first. Use fc_run with the latest code. Use a logging-configuration file so that you get debug logging. (If you need help with that, ask. It's standard Python, but I can help. Maybe we should google-chat.)
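A minimal debug-logging configuration in the standard Python `logging.config.fileConfig` format might look like the sketch below (the file name and the assumption that root-level DEBUG is enough are mine; your FALCON version's logger hierarchy may differ):

```ini
; logging.ini -- hypothetical debug-logging config
; (standard Python logging.config.fileConfig format)
[loggers]
keys=root

[handlers]
keys=stream

[formatters]
keys=plain

[logger_root]
level=DEBUG
handlers=stream

[handler_stream]
class=StreamHandler
level=DEBUG
formatter=plain
args=(sys.stderr,)

[formatter_plain]
format=%(asctime)s %(levelname)s %(name)s: %(message)s
```

Some FALCON versions accept such a file as a second argument (e.g. fc_run fc_run.cfg logging.ini); verify against fc_run --help on your install.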
Then, when it fails, you should see the actual command line sent to PBS. I need to see that. And you should have an error message from PBS somewhere. I need to see that too.
Or, give me a password so I can login to your system.
Hello,
I am running Falcon on a ~800 Mb genome with ~60x coverage. It keeps dying at step 1 with the following error:
This has happened a few times already, with the same error but different numbers of failures and successes. Other runs were successful, and the memory and settings for each step work with this dataset (even with more data). Is it just a matter of deleting folder 1 and restarting the job with input_type = preads? I have seen that this works, but that way I just waste CPU hours. Thanks
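For reference, the restart-from-preads route mentioned above needs only a small change in the job config; a hedged sketch (the input_type parameter is from this thread, while the FOFN file name is a placeholder for your own):

```ini
; fc_run.cfg (fragment) -- restart from the already-corrected reads,
; skipping the raw-read error-correction stage
[General]
input_type = preads
input_fofn = input_preads.fofn  ; placeholder: list your preads FASTA files here
```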