cloudant / python-cloudant

A Python library for Cloudant and CouchDB
Apache License 2.0

MemoryError #295

Closed farhankhwaja closed 7 years ago

farhankhwaja commented 7 years ago


I was trying to download all the documents from my DB, which has 1.3 million documents. I was able to download 610,000 documents before hitting the error below.

import csv

from cloudant.client import Cloudant

if __name__ == "__main__":
    client = Cloudant(USERNAME, PASSWORD, account=USERNAME)
    client.connect()
    myDB = client[DB]

    # Binary mode so the csv module doesn't emit blank rows on Windows (Python 2).
    csvFile = csv.writer(open("myDBData.csv", "wb+"))

    for i, document in enumerate(myDB):
        try:
            if document["X"] is not None:
                csvFile.writerow([document["X"]])
            else:
                csvFile.writerow([""])

            if (i + 1) % 10000 == 0:
                print i + 1
        except Exception:
            print document
            break

ERROR

Traceback (most recent call last):
  File "myDBDataFetch.py", line 15, in <module>
    for i, document in enumerate(myDB):
  File "D:\Python2\lib\site-packages\cloudant\database.py", line 631, in __iter__
    startkey=next_startkey
  File "D:\Python2\lib\site-packages\cloudant\database.py", line 389, in all_docs
    return resp.json()
  File "D:\Python2\lib\site-packages\requests\models.py", line 826, in json
    return complexjson.loads(self.text, **kwargs)
  File "D:\Python2\lib\site-packages\requests\models.py", line 791, in text
    content = str(self.content, encoding, errors='replace')
MemoryError

Can anyone help me with this?

alfinkel commented 7 years ago

Iterating over the database object myDB accomplishes two things:

  1. It retrieves all of the documents from the remote database. (desired)
  2. It also caches each retrieved document in the local myDB object. Remember that myDB is, at its core, a dict with some extra functionality added in. (probably not desired in your case)

While retrieving your documents is obviously the desired behavior here, I think that caching them locally is what causes your eventual MemoryError: every document fetched stays in the myDB dict for the life of the object, as the sketch below illustrates.
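A minimal sketch of that side effect, assuming a connected client and a freshly obtained database object (the counts shown are hypothetical):

print len(myDB)   # 0 -- nothing cached locally yet

for document in myDB:
    pass          # each fetched document is also stored in the local dict

print len(myDB)   # now roughly the remote document count, all held in memory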

My suggestion is to iterate over a Result object instead. Doing this gives you the behavior of bullet 1 with none of the side effects of bullet 2. There are two ways you can do this:

Via the database custom_result context manager, for example:

with myDB.custom_result(include_docs=True) as results:
    for result in results:
        ...

Via a Result object directly (note the extra import), for example:

from cloudant.result import Result

results = Result(myDB.all_docs, include_docs=True)
for result in results:
    ...

These two approaches do essentially the same thing. Have a look at the custom_result and Result documentation if you are interested in the specifics.
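For reference, here is one way your original script could be restructured around a Result; a minimal sketch, assuming the same USERNAME, PASSWORD, and DB values as your snippet and that each CSV row should hold a document's X field. With include_docs=True, each result row carries the full document under its "doc" key:

import csv

from cloudant.client import Cloudant
from cloudant.result import Result

if __name__ == "__main__":
    client = Cloudant(USERNAME, PASSWORD, account=USERNAME)
    client.connect()
    myDB = client[DB]

    csvFile = csv.writer(open("myDBData.csv", "wb+"))

    # Iterating over a Result streams the documents without storing
    # them in the myDB dict.
    for i, result in enumerate(Result(myDB.all_docs, include_docs=True)):
        document = result["doc"]  # full document, since include_docs=True
        if document.get("X") is not None:
            csvFile.writerow([document["X"]])
        else:
            csvFile.writerow([""])

        if (i + 1) % 10000 == 0:
            print i + 1

    client.disconnect()

Because nothing is cached between iterations, memory use should stay roughly constant regardless of how many documents the database holds.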

I hope that resolves your memory issue.