Closed: yfu closed this issue 8 years ago
This happens when we run a script via its `#!` line immediately after modifying (or initially generating) that script. It's related to NFS caching.
I used to see it, but in our case I've found a way to force the cache to sync first, so I don't see it anymore myself.
I haven't found a truly reliable work-around yet. If we run via `bash script.sh` instead of `./script.sh` (with shebang), then top lists the executable as bash rather than as the script name, so it's hard to know which script corresponds to which process number.
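The forced-sync idea mentioned above can be sketched like this (hypothetical helper name, not the actual FALCON code): flush and fsync the generated script before closing it, so the NFS client writes it back to the server before anything tries to exec its `#!` line.

```python
import os

def write_script_synced(path, text):
    # Hypothetical sketch: flush and fsync before close so the NFS client
    # pushes the generated script to the server before any exec of it.
    with open(path, 'w') as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())
    # fsync-ing the containing directory as well covers the new directory
    # entry; on NFS, close() also triggers a flush (close-to-open
    # consistency), so this is belt-and-braces.
    dir_fd = os.open(os.path.dirname(path) or '.', os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```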
Could you tell me the commits you are using?
```
cd FALCON-integrate
git rev-parse HEAD
cd FALCON
git rev-parse HEAD
```
Thanks for the reply and for formatting the log!
Here are the versions:
```
(fc_env)$ cd FALCON-integrate/
(fc_env)$ git rev-parse HEAD
cd9e9373a9f897bc429ecf820809c6d773ee5c44
(fc_env)fuy2@ummsres17:$ cd FALCON
(fc_env)fuy2@ummsres17:$ git rev-parse HEAD
5942bc00d509e1b530c986a41eb13dc87b8fe5bb
```
And I noticed that if I set `-s` in `pa_DBsplit_option` to a large value, this error does not appear, but if I set it to a small value, the error will appear. Maybe it has something to do with the number of concurrent jobs?
Job concurrency will impact when NFS decides to sync. You're using NFS, right?
Another "solution" is to run in `/tmp`, with `job_type=local`. Then, filesystem caching is not an issue.
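For reference, that setting lives in the fc_run config. A minimal sketch of the relevant fragment (only `job_type=local` is taken from this thread; the `[General]` section name is the usual FALCON convention):

```ini
[General]
# run all jobs as local processes, with the working directory under /tmp,
# so NFS client caching never sees the generated scripts
job_type = local
```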
I can work on this tonight. If I come up with a possible solution, will you test it for us on your system?
Try the tip of `master` for FALCON. If this works for you, I'll update FALCON-integrate. Otherwise, I'll revert.
Also, please try with `avoid_text_file_busy=false`. It is possible that I've entirely fixed the problem for you. It should be quick and easy for you to test, and it would help me a bunch.
I pulled the latest commit by going to the FALCON dir and running `git pull origin master`, and I put the whole folder containing the data and config file in /tmp. When I ran the program, it still reported an error:
Traceback (most recent call last):
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/task.py", line 341, in __call__
return self.runInThisThread(*argv, **kwargv)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/task.py", line 362, in runInThisThread
self.run(*argv, **kwargv)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/task.py", line 276, in run
rtn = self._runTask(self, *argv, **kwargv)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/task.py", line 190, in _runTask
return self._taskFun(self)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 181, in task_build_rdb
support.build_rdb(**args)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/run_support.py", line 362, in build_rdb
bash.write_script_and_wrapper(script, script_fn, job_done)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/bash.py", line 47, in write_script_and_wrapper
make_executable(script)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/bash.py", line 17, in make_executable
mode = os.stat(path).st_mode
OSError: [Errno 2] No such file or directory: 'fasta2DB -v raw_reads -f/tmp/2016-01-05-falcon/0-rawreads/input.fofn\n\nLB=$(cat raw_reads.db | awk \'$1 == "blocks" {print $3}\')\nHPCdaligner -v -dal4 -t16 -h1 -e.70 -l1000 -s1000 -M8 -H500 raw_reads 22-$LB >| /tmp/2016-01-05-falcon/0-rawreads/run_jobs.sh\n'
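Note that the OSError's "filename" is the entire script body, i.e. `make_executable` was handed the script text instead of its path. Judging from the `os.stat` call in the traceback, the helper in bash.py looks something like this sketch (assumed, not the verbatim source):

```python
import os

def make_executable(path):
    # Copy each read permission bit to the matching execute bit, like
    # `chmod +x` restricted to whoever can already read the file.
    mode = os.stat(path).st_mode  # raises OSError if `path` is not a real file
    mode |= (mode & 0o444) >> 2   # r bits -> x bits
    os.chmod(path, mode)
```

Passing it a multi-line script body instead of a filename explains the `Errno 2` with the script text where a path should be.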
In the end, I changed line 47 of `src/py/bash.py` from `make_executable(script)` to `make_executable(os.path.join(wdir, sub_script_bfn))` to fix this, but then a new error occurred:
Exception in thread Thread-2:
Traceback (most recent call last):
File "/home/fuy2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/home/fuy2/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/task.py", line 341, in __call__
return self.runInThisThread(*argv, **kwargv)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/task.py", line 362, in runInThisThread
self.run(*argv, **kwargv)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/task.py", line 276, in run
rtn = self._runTask(self, *argv, **kwargv)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/task.py", line 190, in _runTask
return self._taskFun(self)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 183, in task_build_rdb
job_type=config['job_type'], sge_option=sge_option_da)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 144, in run_script_and_wait_and_rm_exit
job_type, sge_option)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 137, in run_script_and_wait
run_script(job_data, job_type)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 96, in run_script
rc = _run_script(job_data)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 66, in _run_script_local
system(cmd, check=True)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 23, in system
raise Exception(msg)
Exception: Call 'bash /tmp/2016-01-05-falcon/0-rawreads/prepare_rdb.sh 1> /tmp/2016-01-05-falcon/0-rawreads/prepare_rdb.sh.log 2>&1' returned 256.
And it seems that `prepare_rdb.sh` is trying to repeatedly create a `raw_reads.db` file, which causes fasta2DB to complain:
+ fasta2DB -v raw_reads -f/tmp/2016-01-05-falcon/0-rawreads/input.fofn
fasta2DB: File pb.test.fasta is already in database raw_reads.db
This error did not occur when I was using the default FALCON in FALCON-integrate. So... could this be a bug in the latest FALCON?
I forgot to mention that NFS is installed on the machine, but all the files (FALCON-integrate, data, and config files) are in local storage. Not sure if this affects the concurrency problem...
I was able to run FALCON-integrate successfully by editing the `bash.py` file in the default FALCON (`5942bc0`) in FALCON-integrate: changing line 48 from `time ./{sub_script_bfn}` to `time bash ./{sub_script_bfn}`. So... this confirms your thought: it is the difference between running `./a.sh` and `bash ./a.sh`. It is likely that the latest FALCON has problems with repeatedly running `prepare_rdb.sh`.
Sorry. I didn't have time to test before going home last night, but I'm glad you saw the basic idea. Yes, running `bash script` instead of `./script` always solves the problem, but that's not the solution I wanted, since it then makes "bash" the program name, which makes different processes hard to distinguish.
I had hoped you could try the program with chmod applied by the Python code rather than by the bash wrapper. My theory is that, even though the Python script closes the filehandle for the generated script (since `open()` is in a `with` block), NFS caching somehow causes the sub-process not to see its release. So when the wrapper script runs `chmod`, the underlying script is still considered "open". But I cannot test this theory unless the problem recurs reliably. So I was really hoping you could test that.
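That theory can be sketched like this (hypothetical helper; the point is only that the exec bits are set from Python after the `with` block has closed the handle, rather than by a `chmod` inside the bash wrapper script):

```python
import os
import stat

def write_script_and_chmod(path, body):
    # Write the generated script; the with-block closes the handle.
    with open(path, 'w') as f:
        f.write(body)
    # Then mark it executable from Python, instead of running `chmod +x`
    # in the wrapper, so no sub-process touches a file that NFS caching
    # may still consider open for writing.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```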
Your problem with `prepare_rdb.sh` is probably only because of the way the previous run failed. If you `rm -rf 0-*/ 1-*/ 2-*/`, that should not happen.
Could you please try the latest HEAD with `avoid_text_file_busy=false`? I'd like to make that the default.
I just tried the latest HEAD and I did delete the previous failed run, but the same error still appeared:
Traceback (most recent call last):
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/task.py", line 341, in __call__
return self.runInThisThread(*argv, **kwargv)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/task.py", line 362, in runInThisThread
self.run(*argv, **kwargv)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/task.py", line 276, in run
rtn = self._runTask(self, *argv, **kwargv)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/task.py", line 190, in _runTask
return self._taskFun(self)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 183, in task_build_rdb
job_type=config['job_type'], sge_option=sge_option_da)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 144, in run_script_and_wait_and_rm_exit
job_type, sge_option)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 137, in run_script_and_wait
run_script(job_data, job_type)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 96, in run_script
rc = _run_script(job_data)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 66, in _run_script_local
system(cmd, check=True)
File "/data/fuy2/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py", line 23, in system
raise Exception(msg)
Exception: Call 'bash /tmp/2016-01-05-falcon/0-rawreads/prepare_rdb.sh 1> /tmp/2016-01-05-falcon/0-rawreads/prepare_rdb.sh.log 2>&1' returned 256.
I also tried the latest HEAD with `avoid_text_file_busy=false`, and the same error with `prepare_rdb.sh` happened...
Oops... It is actually a different error in `0-rawreads/prepare_rdb.sh.log`:
HPCdaligner -v -dal4 -t16 -h1 -e.70 -l1000 -s1000 -M8 -H500 raw_reads 22-$LB >| /tmp/2016-01-05-falcon/0-rawreads/run_jobs.sh
+ HPCdaligner -v -dal4 -t16 -h1 -e.70 -l1000 -s1000 -M8 -H500 raw_reads 22-
HPCdaligner: range 22-0 is empty or out of bounds
Maybe I set the `-s` option in `pa_DBsplit_option` too small?
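For context, the wrapper's `$LB` comes from scraping the `blocks` line out of `raw_reads.db` (see the awk one-liner in the earlier OSError message). A hedged Python equivalent shows why a stale or truncated `raw_reads.db` leaves `$LB` empty and produces the empty `22-0` range:

```python
def read_block_count(db_path):
    # Equivalent of: awk '$1 == "blocks" {print $3}' raw_reads.db
    # Returns None when the "blocks = N" line is absent (e.g. DBsplit has
    # not run, or a leftover DB from a failed run), which is exactly what
    # makes $LB expand to nothing in the wrapper script.
    with open(db_path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 3 and fields[0] == 'blocks':
                return int(fields[2])
    return None
```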
Hmm. Could you run `DBstats 0-rawreads/raw_reads`? Also `cat 0-rawreads/raw_reads.db`.
(Btw, I currently have a core-dump from DBstats, which I'm investigating... Ah, divide-by-zero. Unrelated bug.)
Maybe I did something wrong last time: I found a `raw_reads.db` file in the same parent dir as `0-rawreads`, `1-preads_ovl`, and `2-asm-falcon`. After I deleted `raw_reads.db`, I tried again with `avoid_text_file_busy=true`, and now it works! You rock! :-)
However, with `avoid_text_file_busy=false`, FALCON still has the same "bad interpreter: Text file busy" problem. Here is the DBstats output when `avoid_text_file_busy=false`:
$ DBstats 0-rawreads/raw_reads
Statistics for all wells of length 500 bases or more
4,897 reads out of 5,000 ( 97.9%)
43,349,865 base pairs out of 43,379,531 ( 99.9%)
8,852 average read length
6,226 standard deviation
Base composition: 0.319(A) 0.175(C) 0.185(G) 0.320(T)
Distribution of Read Lengths (Bin size = 1,000)
Bin: Count % Reads % Bases Average
35,000: 2 0.0 0.2 35737
34,000: 2 0.1 0.3 35155
33,000: 1 0.1 0.4 34738
32,000: 2 0.1 0.6 34130
31,000: 8 0.3 1.1 32753
30,000: 5 0.4 1.5 32252
29,000: 8 0.6 2.0 31482
28,000: 5 0.7 2.4 31035
27,000: 8 0.8 2.9 30322
26,000: 14 1.1 3.7 29341
25,000: 16 1.4 4.7 28454
24,000: 26 2.0 6.1 27360
23,000: 24 2.5 7.4 26576
22,000: 41 3.3 9.5 25542
21,000: 38 4.1 11.4 24755
20,000: 40 4.9 13.3 24043
19,000: 57 6.1 15.9 23170
18,000: 82 7.7 19.4 22140
17,000: 98 9.7 23.3 21184
16,000: 156 12.9 29.2 20017
15,000: 181 16.6 35.7 19009
14,000: 243 21.6 43.8 17966
13,000: 230 26.3 51.0 17170
12,000: 238 31.1 57.8 16442
11,000: 218 35.6 63.6 15821
10,000: 213 39.9 68.8 15241
9,000: 214 44.3 73.5 14676
8,000: 235 49.1 78.1 14071
7,000: 274 54.7 82.8 13396
6,000: 266 60.1 86.8 12773
5,000: 279 65.8 90.3 12143
4,000: 288 71.7 93.3 11515
3,000: 331 78.5 95.9 10820
2,000: 366 86.0 98.0 10094
1,000: 485 95.9 99.6 9201
0: 203 100.0 100.0 8852
Ok, I'll let `true` be the default (sadly). Thanks for testing.
I have run fc_run.py with the same config file and the same input files several times, but every time it returned the same error: "/bin/bash: bad interpreter: Text file busy". And it seems to be non-deterministic: jobs with different numbers give the same error message.
I have attached part of the log below: