lamz138138 opened 7 years ago
I further ran "rm 0-rawreads/m && rm mypwatcher" and restarted fc_run.py; now all the new 0-rawreads/m jobs fail for lack of *.las files.
That's because the rm lines in the generated bash scripts for the merge step will delete the original .las files through the symbolic links.
What is the right way to restart my job?
With the default pwatcher, just rm -rf mypwatcher
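The restart advice above can be sketched as shell. This is a sandboxed demonstration with a stand-in mypwatcher/ directory (safe to run anywhere); in a real job root you would then re-invoke fc_run.py, which is left commented out here:

```shell
# Hedged sketch: with the default pwatcher, removing mypwatcher/ drops
# only the watcher state, not the computed results.
# Sandboxed: create a stand-in watcher directory in a temp dir first.
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p mypwatcher/jobs        # stand-in for the real watcher state
rm -rf mypwatcher               # the restart step from this thread
# fc_run.py fc_run.cfg >fc_run.log 2>fc_run.err   # real re-run goes here
```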
In fact, .las files such as L1.71.1.las can be found in 0-rawreads/job_*, so why does it complain that it cannot find them?
I don't know yet. You'll have to step into a directory like 0-rawreads/m_00047 and investigate at the command line in your shell.
Which version of FALCON-integrate are you using? Or which commits of FALCON and pypeFLOW?
Hi, thanks for your reply!
The version of FALCON-integrate is "a72f47c4600fb31c993151c1d03c1787ba42b161", FALCON is "7a6ac0d8e8492c64733a997d72a9359e1275bb57", and pypeFLOW is "f928267429f8cb456191b518a6d0de966b772ddc".
In fact, after "rm -rf mypwatcher" and restarting fc_run.py, it exited about a minute later. After checking m_00047 in the tmp directory, I think the job failed because it didn't make a link for "L1.47.100.las" (I had used L1.71.1 for m_00090 previously). The related information has been updated above. Any suggestions would be appreciated!
fc_run.py fc_run.cfg >fc_run.log 2>fc_run.err
grep "failed" fc_run.err
[ERROR]Task Node(0-rawreads/m_00047) failed with exit-code=256
[ERROR]Task Node(0-rawreads/m_00090) failed with exit-code=256
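The failed task directories can be pulled out of the log mechanically. This sandboxed sketch reproduces the two error lines above in a sample fc_run.err; the sed pattern is an assumption based on this log format and may need adjusting for other pypeFLOW versions:

```shell
# Hedged sketch: extract failed task directories from fc_run.err.
# The sample log content is copied from this thread.
tmp=$(mktemp -d)
cat > "$tmp/fc_run.err" <<'EOF'
[ERROR]Task Node(0-rawreads/m_00047) failed with exit-code=256
[ERROR]Task Node(0-rawreads/m_00090) failed with exit-code=256
EOF
# Keep only the directory inside Node(...) on each "failed" line.
failed_dirs=$(grep 'failed' "$tmp/fc_run.err" | sed 's/.*Node(\(.*\)).*/\1/')
echo "$failed_dirs"
rm -rf "$tmp"
```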
I've had many of these errors in the past.
LAmerge: Cannot open ./L1.47.100.las
Check whether L1.47.100.las exists. It should be there, as a link to one of the job_ directories. I'll bet it isn't there.
Then you need to try to figure out which job directory it should have pointed to. When you figure that out, then look in the job directory and see if L1.47.100.las exists. I'll bet it doesn't. (This is the way it always worked for me).
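The check described above (is the .las file a link, and does its target still exist?) can be scripted with standard shell tests. This is a sandboxed demonstration with a made-up target path, not the real merge directory:

```shell
# Hedged demo: detect a dangling symlink, the missing-.las case above.
# The file and target names are stand-ins created in a temp directory.
tmp=$(mktemp -d) && cd "$tmp"
ln -s ../job_0000/L1.47.100.las L1.47.100.las   # target does not exist
# -L: it is a symlink; ! -e: its target cannot be resolved.
if [ -L L1.47.100.las ] && [ ! -e L1.47.100.las ]; then
    echo "dangling link -> $(readlink L1.47.100.las)"
fi
```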
So you will need to re-run daligner on that job directory. If you don't want to do that manually, you can get Falcon to do it by deleting the done flag for that job directory. I think there may be another done flag to delete as well (which indicates all job directories have been processed). Chris can fill in the gaps in my instructions.
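Clearing a done flag, as suggested above, is just a file deletion. In this sandboxed sketch the flag filename "job_done" is a stand-in; check your own job_ directory for the actual flag file your FALCON version writes, and remember there may be a second, aggregate flag to remove as well:

```shell
# Hedged sketch: force a re-run by deleting a job's done flag.
# "job_done" is a stand-in name; the real flag filename varies by version.
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p 0-rawreads/job_0000
touch 0-rawreads/job_0000/job_done   # pretend the job had "finished"
rm -f 0-rawreads/job_0000/job_done   # clear it so the job is re-run
```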
The reason this occurs is bad error-handling in daligner. If daligner can't, for example, write a file, it will not crash but rather continue to run. Thus Falcon will have no signal that daligner has failed and will write the done flag. I would love it if Gene/Chris could add some error handling in daligner so it will raise an error if it can't write a file or a line.
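Until daligner reports such failures itself, one workaround is to validate its output from the wrapper script, since the exit code alone cannot be trusted. A sandboxed sketch, with `true` standing in for the real daligner invocation and a stand-in output filename:

```shell
# Hedged workaround sketch: daligner may exit 0 even when it failed to
# write a file, so check the expected output after the call.
tmp=$(mktemp -d) && cd "$tmp"
expected=L1.47.100.las       # stand-in for the expected output file
true                         # <-- real daligner command line goes here
# -s: file exists and is non-empty; here it was never written.
if [ ! -s "$expected" ]; then
    echo "ERROR: $expected missing or empty" >&2
    status=1
else
    status=0
fi
```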
Hi!
I ran Falcon as "fc_run.py fc_run.cfg >fc_run.log 2>fc_run.err &". After it exited, I looked for the reason with "grep 'failed' fc_run.err" and found two failed jobs: 0-rawreads/m_00047 and 0-rawreads/m_00090. After checking 0-rawreads/m_00047/pwatcher.dir/stderr, I found "LAmerge: Cannot open ./L1.47.100.las for 'r'", so I ran "rm 0-rawreads/m_00047/pwatcher.dir && rm 0-rawreads/m_00090/pwatcher.dir" and restarted fc_run.py, but it exited with the same error. I further ran "rm 0-rawreads/m && rm mypwatcher" and restarted fc_run.py; now all the new 0-rawreads/m jobs fail for lack of *.las files.
1) What is the right way to restart my job? I don't think I need to delete all the m_* jobs.
2) In fact, .las files such as L1.47.100.las can be found in 0-rawreads/job_*, so why does it complain that it cannot find them?
3) With "use_tmpdir = /Project/Genome_Assembly/PacBio/Stage_1/tmp", I found a lot of .las files in /Project/Genome_Assembly/PacBio/Stage_1/tmp/0-rawreads/m_00047, but not "L1.47.100.las"; the 0-rawreads/m_00047/pwatcher.dir/stderr was as follows. By the way, the relative path "../../../../" is right.
Any suggestions would be appreciated!
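One way to confirm that a relative link path like the "../../../../" mentioned above resolves where you expect is `readlink -f`, which prints the canonical target. A sandboxed demonstration with stand-in directories (GNU coreutils readlink assumed):

```shell
# Hedged demo: resolve a relative symlink target to an absolute path,
# useful for checking links under a use_tmpdir layout. Paths are stand-ins.
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p a/b/c/d real
touch real/L1.47.100.las
ln -s ../../../../real/L1.47.100.las a/b/c/d/L1.47.100.las
readlink -f a/b/c/d/L1.47.100.las   # prints the canonical target path
```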