djc / couchdb-python

Python library for working with CouchDB

Bug when used with multiprocessing #313

Open kevinjqiu opened 7 years ago

kevinjqiu commented 7 years ago

There appears to be a race condition. The bug exists in both Python 2.x and Python 3.x, although it manifests differently in each.

Minimum code to reproduce the bug:

import couchdb
import multiprocessing
import multiprocessing.pool

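server = couchdb.Server('http://COUCHDB_HOST:5984/')
# Start from a fresh 'test' database; recreate it if it already exists.
try:
    database = server.create('test')
except couchdb.PreconditionFailed:
    server.delete('test')
    database = server.create('test')

# Seed three documents from the parent process.
database.save({'_id': '1', 'type': 'dog', 'name': 'chase'})
database.save({'_id': '2', 'type': 'dog', 'name': 'rubble'})
database.save({'_id': '3', 'type': 'cat', 'name': 'kali'})

def query_id(doc_id):
    # Runs in a worker process, using the module-level database object.
    return dict(database[doc_id])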

def main():
    pool = multiprocessing.pool.Pool(3)

    docs = pool.map(query_id, ['1', '2', '3'])
    print(docs)

if __name__ == '__main__':
    main()

Observation 1: When run on Python 2.x, the following error is encountered:

$ python bug.py 
Traceback (most recent call last):
  File "bug.py", line 54, in <module>
    main()
  File "bug.py", line 46, in main
    docs = pool.map(query_id, ['1', '2', '3'])
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
TypeError: 'ResponseBody' object is not iterable

Observation 2: When run on Python 3.x, the execution hangs. Pressing Ctrl+C to terminate the program prints the following stack trace:

[ ... ]
    headers=headers, **params)
  File "/usr/lib/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
  File "/usr/lib/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/home/kevin/src/couchdb-python/couchdb/http.py", line 593, in _request
    credentials=self.credentials)
  File "/usr/lib/python3.6/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/home/kevin/src/couchdb-python/couchdb/http.py", line 402, in request
    data = resp.read()
  File "/usr/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.6/http/client.py", line 462, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python3.6/http/client.py", line 612, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/usr/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
KeyboardInterrupt
KeyboardInterrupt

Observation 3: If I change the pool size to 1 (essentially serializing the GET operations), the bug does not occur. The same is true when I debug the script with Visual Studio Code, whose debugger practically blocks the execution of the other processes: the code runs without issue.

Observation 4: If I run a proxy server (e.g., haproxy) in front of CouchDB, the code runs without issue.

djc commented 7 years ago

Have you thought about the possibility that there is a CouchDB bug, rather than a bug in CouchDB-Python? In particular, I think observation 4 (thanks for the detailed report!) suggests that the bug might not be in CouchDB-Python.

My other thought is that this might have to do with the connection pooling we're doing in couchdb.http.
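One way to test the connection-pooling hypothesis is to make sure no worker reuses a connection that was opened in the parent. The following is a minimal sketch of that idea, not code from CouchDB-Python: each worker builds its own database handle in a `Pool` initializer. The names `init_worker`, `fetch_all` and `fake_database` are hypothetical, and `fake_database` is a plain dict standing in for a real handle so the sketch runs without a server. It assumes that connections inherited from the parent process are what the workers end up sharing, which this thread has not confirmed.

```python
import multiprocessing

# Per-process state. Each worker fills this in for itself after it starts,
# so no connection-like object is inherited from the parent process.
_worker = {}


def init_worker(factory):
    """Run once in every worker: build that worker's own database handle."""
    _worker["db"] = factory()


def query_id(doc_id):
    """Look up one document through the handle owned by this worker."""
    return dict(_worker["db"][doc_id])


def fake_database():
    """Hypothetical stand-in so the sketch runs without a CouchDB server.

    With CouchDB-Python, a module-level factory would instead return
    something like couchdb.Server('http://COUCHDB_HOST:5984/')['test'].
    """
    return {
        "1": {"_id": "1", "type": "dog", "name": "chase"},
        "2": {"_id": "2", "type": "dog", "name": "rubble"},
        "3": {"_id": "3", "type": "cat", "name": "kali"},
    }


def fetch_all(ids, factory, processes=3):
    """Map query_id over ids, with one freshly built handle per worker."""
    pool = multiprocessing.Pool(
        processes, initializer=init_worker, initargs=(factory,)
    )
    try:
        return pool.map(query_id, ids)
    finally:
        pool.close()
        pool.join()
```

To try it, call `fetch_all(['1', '2', '3'], fake_database)` under an `if __name__ == '__main__':` guard, then swap in a factory that opens a real connection. If the failure disappears, that points at shared connections rather than at CouchDB itself.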

One question I have is, when you run this test case 100 times (or 10), does it fail every time? My expectation would be for it to be intermittent.

kevinjqiu commented 7 years ago

Hi @djc

> Have you thought about the possibility that there is a CouchDB bug

On CouchDB's end, the requests were carried out successfully. I can see in the CouchDB logs that there are three concurrent GET requests, all answered with 200 OK. Also, I can use the requests library to call the endpoints concurrently without issue. Those observations lead me to think it's some sort of race condition inside CouchDB-Python.

> One question I have is, when you run this test case for 100 times (or 10), does it fail every time

Yes, it fails every single time.

I might be able to reduce the sample code even further so that it only uses couchdb.http methods to reproduce the issue. Out of curiosity, why doesn't CouchDB-Python use the stock httplib? Sorry, I'm not too familiar with the genesis of this project. EDIT: I see you built ConnectionPool on top of httplib.
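For illustration only, here is a toy sketch of how a connection pool could avoid handing out connections that were opened in a different process. It is not the ConnectionPool in couchdb.http and it is not the content of any pull request. `ForkAwarePool`, `connect` and `getpid` are hypothetical names, and the approach assumes the problem is forked workers reusing sockets inherited from the parent.

```python
import os


class ForkAwarePool(object):
    """Toy pool that never reuses idle connections across processes."""

    def __init__(self, connect, getpid=os.getpid):
        self._connect = connect   # callable that opens a new connection
        self._getpid = getpid     # injectable so the behaviour can be tested
        self._owner = getpid()    # pid that owns the idle connections
        self._idle = []

    def _forget_if_forked(self):
        pid = self._getpid()
        if pid != self._owner:
            # Running in a different process than the one that filled the
            # pool: drop the inherited connections instead of reusing them.
            self._idle = []
            self._owner = pid

    def get(self):
        """Return an idle connection from this process, or open a new one."""
        self._forget_if_forked()
        if self._idle:
            return self._idle.pop()
        return self._connect()

    def release(self, conn):
        """Hand a connection back so this process can reuse it later."""
        self._forget_if_forked()
        self._idle.append(conn)
```

Within one process the pool behaves like an ordinary free list. After a fork, the first `get()` in the child sees a different pid and opens a fresh connection, so parent and child never read from the same socket.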

kevinjqiu commented 7 years ago

@djc A tentative fix for the race condition: https://github.com/djc/couchdb-python/pull/314