ckrintz / appscale

Automatically exported from code.google.com/p/appscale

Memcachedb fails after a certain number of requests #183

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run AppScale using memcachedb as the backend datastore.
2. Generate a few hundred requests to a given application (e.g. guestbook).
3. After a certain number of requests, the application starts returning this
error on puts: "InternalError: Put accepted 1 entities but returned 0
keys."
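
Step 2 can be driven with a short load generator. A minimal sketch; the URL and request count are placeholders, not part of the original report:

```python
import urllib.request

def generate_load(url, n=300):
    """Send n sequential GET requests to url; return the count of 200 responses."""
    ok = 0
    for _ in range(n):
        try:
            with urllib.request.urlopen(url, timeout=3) as resp:
                if resp.status == 200:
                    ok += 1
        except OSError:
            pass  # failures (like the InternalError above) simply don't count
    return ok
```

Pointing this at the guestbook app and watching for the first non-200 response narrows down roughly how many requests it takes to trigger the failure.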

After some investigation I found that these errors are the result of the
AppScale server throwing this error:

Exception happened during processing of request from ('128.111.55.210', 46319)
Traceback (most recent call last):
  File "/usr/lib/python2.6/SocketServer.py", line 558, in
process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.6/SocketServer.py", line 320, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.6/SocketServer.py", line 615, in __init__
    self.handle()
  File "/usr/lib/python2.6/BaseHTTPServer.py", line 329, in handle
    self.handle_one_request()
  File "/usr/lib/python2.6/BaseHTTPServer.py", line 323, in handle_one_request
    method()
  File "/root/appscale/AppDB/appscale_server.py", line 847, in do_POST
  File "/root/appscale/AppDB/appscale_server.py", line 251, in run_query
  File "/root/appscale/AppDB/dhash_datastore.py", line 106, in get_table
  File "/root/appscale/AppDB/dhash_datastore.py", line 225, in get_keys
  File "/root/appscale/AppDB/memcachedb/py_memcachedb.py", line 52, in get
  File "/root/appscale/AppDB/memcachedb/memcachedb.py", line 730, in get
  File "/root/appscale/AppDB/memcachedb/memcachedb.py", line 315, in
_get_server
  File "/root/appscale/AppDB/memcachedb/memcachedb.py", line 914, in connect
  File "/root/appscale/AppDB/memcachedb/memcachedb.py", line 928, in
_get_socket
  File "/usr/lib/python2.6/socket.py", line 182, in __init__
    _sock = _realsocket(family, type, proto)
error: [Errno 24] Too many open files

It looks like something in memcachedb.py is not closing its socket handles. I
tried lowering the _SOCKET_TIMEOUT setting from 3 seconds to 1, but that
doesn't seem to help.

Original issue reported on code.google.com by jmkupfer...@gmail.com on 31 Mar 2010 at 8:10
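
For context on the `[Errno 24] Too many open files` failure above: every socket that is opened and never closed pins a file descriptor until the process hits its fd limit. A minimal standalone sketch of the leak pattern and a deterministic-close fix (illustrative only, not AppScale code):

```python
import contextlib
import socket

def leaky(n):
    # Anti-pattern: sockets are created and kept alive, so each one pins a
    # file descriptor; enough of these eventually raises "[Errno 24]".
    return [socket.socket(socket.AF_INET, socket.SOCK_STREAM) for _ in range(n)]

def bounded(n):
    # Fix: close each socket deterministically when done with it, so the
    # process never accumulates descriptors regardless of n.
    for _ in range(n):
        with contextlib.closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
            pass  # use the socket here; it is closed on block exit
```

Lowering a socket timeout does not help with this failure mode: a timed-out connection attempt still leaves the descriptor open until something explicitly closes it.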

GoogleCodeExporter commented 9 years ago
Since we have not seen this issue before (even under heavy load), and the
memcachedb.py file has not changed, it is unlikely to be the culprit.

It seems this is being caused by the SQLAlchemy connection pooling that was
recently introduced. Switching ownership over to Yoshi, who is more familiar
with it.

Original comment by jmkupfer...@gmail.com on 31 Mar 2010 at 8:37

GoogleCodeExporter commented 9 years ago
I changed py_memcachedb.py to return each connection to the pool after use,
and made the same change to the other datastores.

Original comment by yoshi...@gmail.com on 6 Apr 2010 at 4:46
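
The fix described above follows the standard borrow/return pool pattern: every connection taken from the pool must be given back, even on error. A minimal stdlib sketch of that pattern, not the actual py_memcachedb code:

```python
import queue

class ConnectionPool:
    """Tiny thread-safe pool: borrow with get(), always give back with put()."""
    def __init__(self, factory, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())

    def get(self):
        return self._pool.get()   # blocks if all connections are checked out

    def put(self, conn):
        self._pool.put(conn)      # return the connection for reuse

def with_connection(pool, fn):
    conn = pool.get()
    try:
        return fn(conn)
    finally:
        pool.put(conn)            # returned even if fn raises
```

The try/finally is what prevents the leak in the original report: before the fix, a connection abandoned on an error path held its socket (and file descriptor) forever.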

GoogleCodeExporter commented 9 years ago
There is another problem with multithreading. If we use memcachedb from
multiple threads, the following error occurs.

Traceback (most recent call last):
  File "/usr/lib/python2.6/SocketServer.py", line 558, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.6/SocketServer.py", line 320, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.6/SocketServer.py", line 615, in __init__
    self.handle()
  File "/usr/lib/python2.6/BaseHTTPServer.py", line 329, in handle
    self.handle_one_request()
  File "/usr/lib/python2.6/BaseHTTPServer.py", line 323, in handle_one_request
    method()
  File "/root/appscale/AppDB/appscale_server.py", line 847, in do_POST
  File "/root/appscale/AppDB/appscale_server.py", line 251, in run_query
  File "/root/dev/jkupferman/AppDB/dhash_datastore.py", line 106, in get_table
  File "/root/dev/jkupferman/AppDB/dhash_datastore.py", line 225, in get_keys
  File "/root/appscale/AppDB/memcachedb/py_memcachedb.py", line 54, in get
  File "/root/appscale/AppDB/memcachedb/py_memcachedb.py", line 40, in __initConnection
  File "/var/lib/python-support/python2.6/sqlalchemy/pool.py", line 170, in connect
    agent = _ConnectionFairy(self)
  File "/var/lib/python-support/python2.6/sqlalchemy/pool.py", line 324, in __init__
    conn = self.connection = self._connection_record.get_connection()
  File "/var/lib/python-support/python2.6/sqlalchemy/pool.py", line 377, in __getattr__
    return getattr(self.connection, key)
AttributeError: 'Client' object has no attribute 'get_connection'

Original comment by yoshi...@gmail.com on 8 Apr 2010 at 11:23

GoogleCodeExporter commented 9 years ago
I changed the SQLAlchemy connection pooling method from do_return_conn to
close; do_return_conn does not appear to be thread safe.

I also added a table-lock mechanism to dhash_datastore to avoid conflicting
updates to the key list. It currently uses a thread lock; we should switch to
a distributed lock if one is needed.

Original comment by yoshi...@gmail.com on 9 Apr 2010 at 2:59
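
The thread-lock mechanism described above can be sketched as a lock-guarded key list. This is an illustration of the approach, not the dhash_datastore code; the KeyList class and its method names are hypothetical:

```python
import threading

class KeyList:
    """Lock-guarded key list: serializes concurrent updates within one process.

    Note: threading.Lock only protects threads in a single process; as the
    comment above says, coordinating across database nodes would require a
    distributed lock instead.
    """
    def __init__(self):
        self._lock = threading.Lock()
        self._keys = []

    def add(self, key):
        with self._lock:            # prevent two threads interleaving the
            if key not in self._keys:  # check-then-append sequence
                self._keys.append(key)

    def snapshot(self):
        with self._lock:
            return list(self._keys)  # copy so callers can't mutate unguarded
```

Without the lock, two threads can both pass the `not in` check for the same key and append it twice, which is exactly the kind of key-list conflict the comment describes.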

GoogleCodeExporter commented 9 years ago
Upgraded priority to critical since we need to resolve this for the current
release. Two main questions are pertinent here:

1) Why does an unusual path appear in the stack trace posted above?

  File "/root/dev/jkupferman/AppDB/dhash_datastore.py", line 106, in get_table
  File "/root/dev/jkupferman/AppDB/dhash_datastore.py", line 225, in get_keys

This is not a directory layout we normally use, so why is it showing up?

2) Why are you now locking tables? Please elaborate, as it doesn't seem
necessary given the description above. If it's not pertinent to this bug,
please open a new issue for it.

Original comment by shattere...@gmail.com on 9 Apr 2010 at 3:44

GoogleCodeExporter commented 9 years ago
About 1): I symlinked /root/appscale to /root/dev/jkupferman so I can quickly
compare behavior between my local branch and the upstream branch. Why the
stack trace switches between the symlinked and non-symlinked paths is a
mystery, but regardless, I'm confident it is not the cause of this bug.

About 2): I created a separate ticket for it (#189) since it is distinct from
the original bug.

Original comment by jmkupfer...@gmail.com on 9 Apr 2010 at 5:06

GoogleCodeExporter commented 9 years ago
Memcachedb is now working correctly in a 200-request test and a multithreaded
request test.

Original comment by yoshi...@gmail.com on 5 May 2010 at 7:05