earl / beanstalkc

A simple beanstalkd client library for Python
Apache License 2.0
457 stars 115 forks source link

8-bit transparency under Python 3 #60

Open seveas opened 9 years ago

seveas commented 9 years ago

Pull request #57 adds python 3 compatibility, but always tries to decode job bodies. Given that arbitrary byte strings may be put in the body, this is suboptimal.

For other protocol messages, only ascii content is accepted, so encoding/decoding automatically is fine.

I propose the following:

For put:

And for reseve/peek etc. (_read_body):

If this design is acceptable, I'll rebase and amend the pull request to follow this design.

svisser commented 9 years ago

If encoding or decoding fails, should we raise Python's built-in exceptions or wrap them in a BeanstalkcException? Similarly, when encoding is None and a Unicode string is passed to put what exception would we raise?

I think wrapping them with a BeanstalkcException is a clean solution (similar to how socket problems are wrapped) but preserving exception information is only elegantly supported in Python 3.x:

raise SubclassBeanstalkcException from exc
seveas commented 9 years ago

@earl poke. Any comments?

johnchristopherjones commented 8 years ago

I think it might be incorrect to support string encoding/decoding for beanstalkc. Beanstalkd is a lot like socket; it has no real knowledge (as far as I know) about the contents of a job; it just stashes bytes. There's no real way (or need) for beanstalk to know the encoding of a string job, or even that the job is a string.

What if beanstalkc only speaks bytes? The user would have to explicitly encode/decode on put/reserve, but I'm not sure that's a significant burden. We're implicitly relying on us-ascii/utf-8 being the same thing as byte strings most of the time in the Python 2 interface. Acknowledging that explicitly seems like the way to go. That sidesteps the 8-bit byte transparency issue entirely. Adding automatic encoding/decoding for strings is a convenience that can be added later.