cloudant-labs / cloudant-python

Asynchronous Cloudant / CouchDB interface for Python
http://cloudant-labs.github.io/cloudant-python/
37 stars 17 forks source link

Let db.changes() yield dict objects #33

Closed kxepal closed 10 years ago

kxepal commented 10 years ago

This change allows to process changes feed events without any additional magic work as iter_lines, parsing json, handling heartbeats - all these routines are wrapped inside changes() method.

Also, for continuous changes feed it doesn't holds anymore the latest change events in memory till request timeout or new change events.

kxepal commented 10 years ago

Rationale.

How db.changes method is supposed to be used:


params = {
    'params': {
        'feed': 'continuous',
        'heartbeat': 30000,
        'since': 0,
        'timeout': 10000
    }
}

changes = db.changes(**params)
assert changes.status_code == 200

for line in changes.iter_lines():
    if not line:  # skip heartbeats
        continue
    line = line.decode('utf-8')
    print(line)

That's cool, but these are a lot of useless code. Yes, changes feed has line-by-line output and I wanted to iterate over all these lines. Yes, changes feed carries JSON data and I need to decode it to Python objects to work with. Yes, I'd like to skip heartbeats since they aren't interesting to me, but are need to keep connection alive. + decoding bits, since for Python 3.x response contains binary string, while stdlib json decodes only unicode strings.

This PR makes code above looks more simple, intuitive and Pythonic:


params = {
    'params': {
        'feed': 'continuous',
        'heartbeat': 30000,
        'since': 0,
        'timeout': 10000
    }
}

for obj in db.changes(**params):
    process(obj)  # do work with decoded JSON

Also, due to default chunk size 512 bytes current implementation of naive processing changes feed doesn't emits all the events since there couldn't be \n character to split the feed. PR fixes that issue too, so you don't have to worry about unprocessed events.

mgmarino commented 10 years ago

Looks good, my main comment is that there should at least be the option to pass the heartbeat on instead of continuing internally and effectively blocking. Some user code may use these heartbeats to check variables, cancel the feed, etc.

kxepal commented 10 years ago

there should at least be the option to pass the heartbeat on instead of continuing internally and effectively blocking

I see the case when proxying heartbeats could make a sense. I this case which value should represent Heartbeat object? I feel that None is quite good candidate for.

mgmarino commented 10 years ago

You mean that obj in the iteration would be None when it is a heartbeat? Yeah, I think that's good and is consistent with the idea that no JSON object has been returned in the heartbeat.

This behavior is then like a blocking socket with a timeout, where the user needs to check if data was really returned.

garbados commented 10 years ago

Looks good to me! Resolve this TODO and I'll merge :D

kxepal commented 10 years ago

@mgmarino done!

@garbados I removed TODO notice. Actually, I cannot reuse Index.iter code since it skips empty lines (which I have to use as heartbeat detector) and also I have to set custom chunk_size on iter_lines call for continuous changes feed.

garbados commented 10 years ago

Sounds good. Merging...