Closed GoogleCodeExporter closed 8 years ago
Debugging now. Having troubles frequent key errors:
Exception in thread Thread-1:
Traceback (most recent call last):
File
"/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/threadin
g.py", line
460, in __bootstrap
self.run()
File
"/Users/alexander/inc/pygr_test/pygr/build/lib.macosx-10.5-i386-2.5/pygr/blast.p
y",
line 56, in run
for seqID, seq in self.seqDB.iteritems():
File "seqdb.py", line 299, in iteritems
yield seqID, self[seqID]
File "seqdb.py", line 314, in __getitem__
self._weakValueDict[seqID] = s # save in cache.
File "classutil.py", line 392, in __setitem__
self.keep_this(v)
File "classutil.py", line 389, in keep_this
del self._keepDict[vmin]
Is RecentValueDictionary thread safe?
Original comment by alexande...@gmail.com
on 31 Mar 2009 at 12:51
Hi Alex,
I have no idea whether weak value dictionaries are thread safe, but more
importantly
I need to see the code you are running to generate this error message. Please
post
the script that generates this error. And what is the error class? I don't
see a
normal error class (e.g. "ValueError") in this message.
The tests in blast_test.py run fine with no threading problem as you reported,
so
what are you doing differently? Are you following that example code? Does
that test
(cd pygr/tests; python blast_test.py) run fine on your platform?
-- Chris
Original comment by cjlee...@gmail.com
on 31 Mar 2009 at 5:38
Code:
import blast,seqdb
seqDB = seqdb.BlastDB('1.TCA.454Reads.fna')
refDB = seqdb.BlastDB('pNL43.may13.fasta')
blastMap = blast.BlastMapping(refDB)
z=blastMap(None,al=None,queryDB=seqDB)
Where 1.TCA.454Reads.fna has LOTS of sequences and pNL43.may13.fasta has only
one.
Full error:
Exception in thread Thread-1:
Traceback (most recent call last):
File
"/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/threadin
g.py", line
460, in __bootstrap
self.run()
File
"/Users/alexander/inc/pygr_test/pygr/build/lib.macosx-10.5-i386-2.5/pygr/blast.p
y",
line 56, in run
for seqID, seq in self.seqDB.iteritems():
File "seqdb.py", line 299, in iteritems
yield seqID, self[seqID]
File "seqdb.py", line 314, in __getitem__
self._weakValueDict[seqID] = s # save in cache.
File "classutil.py", line 392, in __setitem__
self.keep_this(v)
File "classutil.py", line 389, in keep_this
del self._keepDict[vmin]
KeyError: 004294_0715_3536[0:94]
The exact sequence where the error occurs differs from run to run.
blast_test.py doesn't run with a strange error:
$ python blast_test.py
unable to import extension module: cannot import name cnestedlist
I presume that the main difference between blast_test.py and my test is that my
queryDB has many more sequences than the test.
Original comment by alexande...@gmail.com
on 31 Mar 2009 at 6:33
I added
print threading.currentThread().getName(), repr(vmin) just before
del self._keepDict[vmin]
the result confirms that vmin is being deleted twice. Eg. the following two
lines are
adjacent in the output
...
Thread-1 003988_0557_2026[0:104]
MainThread 003988_0557_2026[0:104]
...
The KeyError follows...
Original comment by alexande...@gmail.com
on 31 Mar 2009 at 6:41
Hi Alex,
I think you figured it out: RecentValueDictionary doesn't seem to be safe for
multiple threads. This also explains why the problem only arose with your
larger
dataset; it maintains a default cache size of 50 sequences. Unless you
process at
least 50 sequences, you won't encounter its cache-clearing mechanism.
Original comment by cjlee...@gmail.com
on 31 Mar 2009 at 4:09
Now this is really strange. I checkout the latest version from git, then run
python
setup.py build, but the tests won't run. Here's the error
%> python blast_test.py
unable to import extension module: cannot import name cnestedlist
Any ideas?
This is on Mac OS X 10.5.6.
Original comment by alexande...@gmail.com
on 31 Mar 2009 at 6:47
UPDATE:
When I go to build/lib.macosx-10.5-i386-2.5/pygr directly and do
import cnestedlist
everything is fine.
Original comment by alexande...@gmail.com
on 31 Mar 2009 at 6:49
UPDATE:
I was able to reproduce the error in the tests by making the test file even
longer.
Specifically, I replicated the first sequence of sp_all_hbb 1000 times with
different
ids. The errors occur while at id# 85--700.
Original comment by alexande...@gmail.com
on 31 Mar 2009 at 7:11
try:
del self._keepDict[vmin]
except KeyError:
pass
Seems to do the job...
Original comment by alexande...@gmail.com
on 31 Mar 2009 at 8:01
Hi Alex,
just trapping the KeyError doesn't address the broader issue of thread-safety.
There
are all sorts of ways multiple threads could misbehave in any cache lookup /
updating
of RecentValueDict or other cache classes. We should either really make them
thread-safe with locking, or declare explicitly that Pygr 0.8 doesn't support
multi-threading. See my discussion group posting about this:
http://groups.google.com/group/pygr-dev/msg/9edcfd676b255668
-- Chris
Original comment by cjlee...@gmail.com
on 31 Mar 2009 at 8:46
this clearly needs more work, starting with a decision about whether we want
Pygr 0.8
to support multi-threading or not.
Original comment by cjlee...@gmail.com
on 31 Mar 2009 at 8:48
OK, I replaced threading with a solution that uses temporary files for
communication
to / from the subprocess. This eliminates all deadlock issues with no need for
threading. For details, see
http://groups.google.com/group/pygr-dev/msg/e0b13465171a6d6d
Alex, please test this on your multi-sequence blast example that was crashing
before...
Original comment by cjlee...@gmail.com
on 2 Apr 2009 at 7:07
Slower than the threading solution, but works without error.
Original comment by alexande...@gmail.com
on 2 Apr 2009 at 7:59
Original issue reported on code.google.com by
cjlee...@gmail.com
on 27 Mar 2009 at 4:51