Fix packet sizes and build them with slices

After the recent work on splitting long elements (#89), I want to explain the current problem with the way unMessage serializes its packets. It is easiest to demonstrate with the Python interpreter.

Setup:

>>> from math import ceil
>>> 
>>> from twisted.internet import reactor
>>> 
>>> from unmessage.contact import Contact
>>> from unmessage.elements import Element, PartialElement
>>> from unmessage.peer import a2b, Peer
>>> 
>>> alice = Peer('alice', reactor)
>>> bob = Peer('bob', reactor)
>>> out_request = bob._create_request(Contact(alice.identity, alice.identity_keys.pub))
>>> in_request = alice._process_request(str(out_request.packet))
>>> conv_a = in_request.conversation

The plaintext of unMessage consists of elements: objects that describe the actions involved in a conversation, such as sending a message (MessageElement) or performing authentication (AuthenticationElement). This example uses the base class (Element), but any other element class would work:

>>> e = Element('example')
>>> e.serialize()
'{"content": "example"}'

Partial elements are objects representing an element possibly split into multiple parts. These objects can create element packets (when sending an element) or be created from element packets (when receiving an element).

The sender uses from_element, passing the element that will be sent. When max_len is omitted, a partial with a single part is created:

>>> partial = PartialElement.from_element(e)
>>> packets = partial.to_packets()
>>> packets
[ElementPacket(type_='elmt', id_='ynE=', part_num=0, part_total=1, payload='{"content": "example"}')]
>>> print str(packets[0])
elmt
ynE=
0
1
{"content": "example"}

The ID tells the receiver which element a part belongs to. The part number lets it put the parts in the right order. The total tells it when the partial element is complete and can finally become an element. The type lets it deserialize to the correct element class.

When max_len is passed, the element is split into parts that fit that length:

>>> partial = PartialElement.from_element(e, max_len=10)
>>> packets = partial.to_packets()
>>> for packet in packets:
...     print str(packet)
...     print
... 
elmt
Ce4=
0
4
{"cont

elmt
Ce4=
1
4
ent": 

elmt
Ce4=
2
4
"examp

elmt
Ce4=
3
4
le"}

>>> packets[0]
ElementPacket(type_='elmt', id_='Ce4=', part_num=0, part_total=4, payload='{"cont')
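
A receiver could put these four fields to work roughly as follows. This is only a minimal sketch: Part and Reassembler are hypothetical names made up for illustration, not unMessage's PartialElement code, and the parts are the ones printed above, fed in out of order.

```python
from collections import namedtuple

# Hypothetical stand-in for ElementPacket, with the same fields as its repr.
Part = namedtuple('Part', 'type_ id_ part_num part_total payload')


class Reassembler(object):
    """Collect parts by element ID and join them once all have arrived."""

    def __init__(self):
        self.pending = {}  # id_ -> {part_num: payload}

    def add(self, part):
        """Store a part; return (type_, serialized) when complete, else None."""
        parts = self.pending.setdefault(part.id_, {})
        parts[part.part_num] = part.payload
        if len(parts) < part.part_total:
            return None
        del self.pending[part.id_]
        # Join in part_num order, regardless of the order of arrival.
        serialized = ''.join(parts[n] for n in range(part.part_total))
        return part.type_, serialized


reassembler = Reassembler()
arrivals = [
    Part('elmt', 'Ce4=', 2, 4, '"examp'),
    Part('elmt', 'Ce4=', 0, 4, '{"cont'),
    Part('elmt', 'Ce4=', 3, 4, 'le"}'),
    Part('elmt', 'Ce4=', 1, 4, 'ent": '),
]
results = [reassembler.add(part) for part in arrivals]
```

The first three calls return None; the last one returns the type together with the rejoined string '{"content": "example"}', which can then be deserialized to the element class named by the type.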

So far, no problems. Although simple, Element, PartialElement and ElementPacket seem to work well. With the element packet ready to be sent, it is encrypted with pyaxo's AxolotlConversation:

>>> plaintext = str(packets[0])
>>> len(plaintext)
20
>>> print plaintext
elmt
Ce4=
0
4
{"cont
>>> ciphertext = conv_a.axolotl.encrypt(plaintext)
>>> len(ciphertext)
140

Although pyaxo's overhead could be decreased, it is not the problem, because it is just 120 additional bytes for any plaintext. The problem arises when the ciphertext is wrapped in a RegularPacket. As we did not have the packet format completely defined, the easiest way to serialize regular packets was to encode each of their parts in base64 and separate them with line breaks:

>>> encrypted_packet = conv_a._encrypt(packets[0])
>>> encrypted_packet
RegularPacket(iv='zIyS6FQP0I0=', iv_hash='/NJNYwRPSosgGO7ZXvttQwsPooAdljYySOZ7mywGMJk=', payload_hash='pgiptBOOcwLfggq65dEjLtLHDXTlr8dLy0bipgqmy3g=', handshake_key='', payload='KPKkymiouJHHZN1a93PNjm6Iibn6RNgpdRd2JQ5SPOQfs37XlLWuCG2LKLcLbi2uwfC9Mf6/ZOzzb4utcNPASid9BfMH+mbDFp9J/Ld/BQK8LSObFn5tRi01gEUu4ZvuiBb3bpFBg5LsEO+DIJKyvvjzFbpWIMvyl0G2rptj8/nJtlzX8IavtN+h6wQ=')
>>> len(str(encrypted_packet))
292
>>> print str(encrypted_packet)
zIyS6FQP0I0=
/NJNYwRPSosgGO7ZXvttQwsPooAdljYySOZ7mywGMJk=
pgiptBOOcwLfggq65dEjLtLHDXTlr8dLy0bipgqmy3g=

KPKkymiouJHHZN1a93PNjm6Iibn6RNgpdRd2JQ5SPOQfs37XlLWuCG2LKLcLbi2uwfC9Mf6/ZOzzb4utcNPASid9BfMH+mbDFp9J/Ld/BQK8LSObFn5tRi01gEUu4ZvuiBb3bpFBg5LsEO+DIJKyvvjzFbpWIMvyl0G2rptj8/nJtlzX8IavtN+h6wQ=

>>> byte_lens = [len(a2b(encrypted_packet.iv)), len(a2b(encrypted_packet.iv_hash)), len(a2b(encrypted_packet.payload_hash)), len(a2b(encrypted_packet.handshake_key)), len(a2b(encrypted_packet.payload))]
>>> byte_lens
[8, 32, 32, 0, 140]
>>> sum(byte_lens)
212

The size of the regular packet in bytes is 212. Due to this "serialization", the final string that is sent becomes much longer:

>>> line_breaks = 4
>>> base64_lens = [len(encrypted_packet.iv), len(encrypted_packet.iv_hash), len(encrypted_packet.payload_hash), len(encrypted_packet.handshake_key), len(encrypted_packet.payload), line_breaks]
>>> base64_lens
[12, 44, 44, 0, 188, 4]
>>> sum(base64_lens)
292

Growing by "just" 80 bytes does not seem like a big deal. The problem is that the biggest portion of this overhead comes from the payload, whose size is variable:

>>> ceil(len(ciphertext) / 3.) * 4
188.0

Base64 makes any payload about one third bigger (34% in this example, because of padding), and that becomes a real problem when dealing with long packets (e.g., file transfer).
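
To see how this scales, the padded base64 length of n bytes is 4 * ceil(n / 3). The sketch below uses only the standard library and checks the formula against real encodings; base64_len is a helper name introduced here, not part of unMessage.

```python
import base64
import os


def base64_len(n):
    """Length of the padded base64 encoding of n bytes."""
    return 4 * ((n + 2) // 3)


for n in (140, 1024, 1024 * 1024):
    # Check the formula against an actual encoding of n random bytes.
    encoded = base64.b64encode(os.urandom(n))
    assert len(encoded) == base64_len(n)
    overhead = base64_len(n) - n
    print('%8d bytes -> %8d base64 chars (+%d, %.1f%%)'
          % (n, base64_len(n), overhead, 100.0 * overhead / n))
```

The absolute overhead grows linearly with the payload, so the longer the packet, the more bytes are spent on encoding alone.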

What we have to do is define exactly what we want in this packet format, then transmit only bytes and slice each part accordingly. I think the packet format we currently have is alright, but we need to read more about other formats and see if we need to add or remove something. One thing we definitely need is a version number (#53). Finally, we define the fixed size that packets should have and pad them (#58).
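
As a rough idea of what building and slicing could look like, here is a minimal sketch. The field sizes for iv, iv_hash and payload_hash are the ones measured in byte_lens above; everything else is an assumption for illustration: the names FIELDS, build_packet and parse_packet are hypothetical, and the handshake key slot is assumed to be always reserved and zero-filled when unused. The version number and padding are left out.

```python
import os

# Field sizes in bytes. ASSUMPTION: the handshake key slot (72 bytes when
# present) is always reserved, so every field sits at a fixed offset.
FIELDS = [
    ('iv', 8),
    ('iv_hash', 32),
    ('payload_hash', 32),
    ('handshake_key', 72),
]
HEADER_LEN = sum(size for _, size in FIELDS)


def build_packet(payload, **fields):
    """Concatenate the raw fields followed by the payload.

    A field that is omitted or empty is filled with zero bytes.
    """
    data = b''
    for name, size in FIELDS:
        value = fields.get(name) or b'\x00' * size
        if len(value) != size:
            raise ValueError('%s must be %d bytes' % (name, size))
        data += value
    return data + payload


def parse_packet(data):
    """Slice a packet back into its fields; the payload is the remainder."""
    if len(data) < HEADER_LEN:
        raise ValueError('packet shorter than its header')
    fields, offset = {}, 0
    for name, size in FIELDS:
        fields[name] = data[offset:offset + size]
        offset += size
    fields['payload'] = data[offset:]
    return fields


sample = dict(iv=os.urandom(8), iv_hash=os.urandom(32),
              payload_hash=os.urandom(32))
packet = build_packet(os.urandom(140), **sample)
parsed = parse_packet(packet)
```

Under these assumptions a 140-byte ciphertext gives a 284-byte packet, and the payload itself is transmitted without any encoding overhead.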

Notes:

If we move forward with #87, we can remove the handshake key from the packet. It is empty once the conversation has been established, but when Alice first replies to accept the request, it is 72 bytes that we have to expect before the payload.

After we decide about the size (or sizes) of packets, it would be interesting if we set the maximum length of strings based on the manager of the netstring receiver.
