can't bencode normal utf8 string

trim21 commented 6 months ago

I'm using fastbencode 0.2 and it fails to encode a plain dict:

import fastbencode

data = {
    "favoriteFruit": "banana"
}

encoded = fastbencode.bencode(data)

  File "fastbencode\_bencode_pyx.pyx", line 399, in fastbencode._bencode_pyx.bencode
  File "fastbencode\_bencode_pyx.pyx", line 385, in fastbencode._bencode_pyx.Encoder.process
  File "fastbencode\_bencode_pyx.pyx", line 364, in fastbencode._bencode_pyx.Encoder._encode_dict
TypeError: key in dict should be string

Also, with a bytes key but a str value:

import fastbencode

data = {
    b"favoriteFruit": "banana"
}

encoded = fastbencode.bencode(data)

  File "fastbencode\_bencode_pyx.pyx", line 399, in fastbencode._bencode_pyx.bencode
  File "fastbencode\_bencode_pyx.pyx", line 385, in fastbencode._bencode_pyx.Encoder.process
  File "fastbencode\_bencode_pyx.pyx", line 366, in fastbencode._bencode_pyx.Encoder._encode_dict
  File "fastbencode\_bencode_pyx.pyx", line 391, in fastbencode._bencode_pyx.Encoder.process
TypeError: unsupported type 'banana'

jelmer commented 6 months ago

bencode uses bytestrings (https://en.wikipedia.org/wiki/Bencode), so if you want to use plain strings, you'll need to encode them first, e.g.:

import fastbencode

data = {
    b"favoriteFruit": b"banana"
}

encoded = fastbencode.bencode(data)
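
Rather than converting each literal by hand, a small recursive helper could convert str keys and values to UTF-8 bytes before calling fastbencode.bencode. This is a minimal sketch that assumes UTF-8 is the wanted encoding; `to_bencodable` is a hypothetical name and not part of fastbencode.

```python
from typing import Any


def to_bencodable(obj: Any) -> Any:
    """Hypothetical helper (not part of fastbencode): recursively turn str into UTF-8 bytes."""
    if isinstance(obj, str):
        return obj.encode("utf-8")
    if isinstance(obj, dict):
        # bencode dict keys must be byte strings, so convert keys as well as values
        return {to_bencodable(k): to_bencodable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_bencodable(item) for item in obj]
    # bytes and int are already bencode types; pass them through unchanged
    return obj


data = {"favoriteFruit": "banana"}
print(to_bencodable(data))  # {b'favoriteFruit': b'banana'}
```

The result can be passed to fastbencode.bencode(to_bencodable(data)); decoding it later still produces bytes keys and values.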

While we could support plain strings and automatically encode them to utf8 bytestrings when encoding, we would still decode them to bytestrings which would be a bit surprising.

trim21 commented 6 months ago

> While we could support plain strings and automatically encode them to utf8 bytestrings when encoding, we would still decode them to bytestrings which would be a bit surprising.

I would argue that it's surprising that str can't be encoded by default...

jelmer commented 6 months ago

The format supports only bytestrings, and doesn't describe how strings should be encoded (although utf8 would probably be a sensible default).

Maybe the right call here is to add bencode_utf8 / bdecode_utf8 calls that do encode/decode to strings using utf8.
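
To make the proposed bencode_utf8 / bdecode_utf8 semantics concrete, here is a minimal pure-Python sketch. It is hypothetical: these functions do not exist in fastbencode, and a real version would presumably wrap fastbencode's own encoder and decoder rather than reimplementing bencode. Every str is encoded as UTF-8, dict keys are emitted sorted by their raw bytes as bencode requires, and bdecode_utf8 decodes every byte string back to str strictly, so it raises UnicodeDecodeError on binary data.

```python
from typing import Any, Tuple


def bencode_utf8(obj: Any) -> bytes:
    """Hypothetical: bencode obj, encoding every str as UTF-8 bytes."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode("utf-8")
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode_utf8(item) for item in obj) + b"e"
    if isinstance(obj, dict):
        items = {
            (k.encode("utf-8") if isinstance(k, str) else k): v
            for k, v in obj.items()
        }
        if not all(isinstance(k, bytes) for k in items):
            raise TypeError("dict keys must be str or bytes")
        # Keys are emitted in sorted order of their raw bytes.
        body = b"".join(bencode_utf8(k) + bencode_utf8(items[k]) for k in sorted(items))
        return b"d" + body + b"e"
    raise TypeError(f"unsupported type {type(obj).__name__}")


def _decode(data: bytes, i: int) -> Tuple[Any, int]:
    """Decode one value starting at offset i; return (value, next offset)."""
    marker = data[i:i + 1]
    if marker == b"i":
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if marker in (b"l", b"d"):
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        if marker == b"l":
            return items, i + 1
        # A dict is a flat key/value sequence: pair the items up.
        return dict(zip(items[::2], items[1::2])), i + 1
    colon = data.index(b":", i)
    start = colon + 1
    end = start + int(data[i:colon])
    return data[start:end], end


def _to_str(obj: Any) -> Any:
    if isinstance(obj, bytes):
        return obj.decode("utf-8")  # strict: raises UnicodeDecodeError on binary data
    if isinstance(obj, list):
        return [_to_str(item) for item in obj]
    if isinstance(obj, dict):
        return {_to_str(k): _to_str(v) for k, v in obj.items()}
    return obj


def bdecode_utf8(data: bytes) -> Any:
    """Hypothetical: bdecode data, then decode every byte string as UTF-8."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing or truncated data")
    return _to_str(value)


print(bencode_utf8({"favoriteFruit": "banana"}))  # b'd13:favoriteFruit6:bananae'
```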

trim21 commented 6 months ago

> The format supports only bytestrings, and doesn't describe how strings should be encoded (although utf8 would probably be a sensible default).
>
> Maybe the right call here is to add bencode_utf8 / bdecode_utf8 calls that do encode/decode to strings using utf8.

I think it's reasonable to encode str to utf8 bytes, but not to decode bencoded bytes to str.
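
The asymmetry is easy to illustrate: encoding a str to UTF-8 succeeds for practically any text (strings containing lone surrogates are the exception), but bencoded byte strings often carry binary data, such as the concatenated SHA-1 hashes in a torrent's `pieces` field, which is generally not valid UTF-8. A small check, using a hypothetical helper name:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("banana".encode("utf-8")))  # True
# bytes(range(256)) stands in for arbitrary binary data; 0x80 cannot start a UTF-8 sequence.
print(is_valid_utf8(bytes(range(256))))  # False
```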

breezy-team / fastbencode, issue #27