M0r13n / pyais

AIS message decoding and encoding in Python (AIVDM/AIVDO)
MIT License
176 stars 61 forks source link

Use splitlines when reading socket streams to handle all variations of line breaks #113

Closed jwheare closed 1 year ago

jwheare commented 1 year ago

Actually this didn't fix things for me. I got rid of all the partial stuff, and just split then yielded all lines and now it works:

    def read(self) -> Generator[bytes, None, None]:
        while True:
            body = self.recv()

            # Server closed connection
            if not body:
                return None

            # FIX
            lines = body.splitlines()
            yield from (line for line in lines if line)
M0r13n commented 1 year ago

Hey @jwheare,

do you know what was the underlying cause of your issue? The partial stuff is there to only yield complete messages, because messages are not guaranteed to end with the boundary of a buffer.

recv() returns up to N number of bytes. When the message has a length larger than N+1 bytes it is split across two calls of recv(). Thus the read() functions maintains a local buffer between subsequent calls of recv() of incomplete lines.

jwheare commented 1 year ago

I have a local AIS receiver that exposes the raw data on a tcp socket, and when listening to that socket, I regularly receive more than one AIS message per recv call, split by a single \n character. The original code here is only splitting on \r\n so it fails to split the lines correctly, hence switching to splitlines which handles all forms of line break.

But the initial change in this PR results in some lines being combined together with no line breaks, due to the last line of the section: partial = lines[-1] probably due to some assumptions changing about how the lines are split.

Since I'm not seeing any messages come in from recv that are incomplete, I chose to just remove the partial message handling as it seems to complicate matters more than anything and was tricky to debug. I'll leave it to you how you want to resolve this, but probably some tests would be helpful.

M0r13n commented 1 year ago

Hey @jwheare,

hence switching to splitlines which handles all forms of line break.

which sounds reasonable. But split() and splitlines() behave slightly different:

>>> x = 'Hello\nWorld\n'
>>> x.split('\n')
['Hello', 'World', '']
>>> x.splitlines()
['Hello', 'World']

I played around a bit and came up with the following approach:

partial: bytes = b''
        while True:
            body = self.recv()

            # Server closed connection
            if not body:
                return None

            lines = body.splitlines(keepends=True)

            # last incomplete line
            line = partial + lines[0]
            if line:
                yield line

            if lines[-1].endswith(b'\n'):
                # all lines are complete
                yield from lines[1:]
            else:
                # the last line was only partially received
                yield from lines[1:-1]
                partial = lines[-1]

This is essentially the same logic. But now splitlines is used to separate all lines and the last partial line is only buffered in the loop if the last line did not end with a line feed.

M0r13n commented 1 year ago

Would you like to try the changes in: https://github.com/M0r13n/pyais/pull/114/files?

I also added a few unittests for different edge cases: https://github.com/M0r13n/pyais/pull/114/files#diff-5194034fea1edec8e8df9dbef79f97e2d2067a8b8129442d96dad8e3d5616a65

Feel free to report missing edge cases

jwheare commented 1 year ago

The change in #114 looks like it's working well, thanks.

M0r13n commented 1 year ago

@jwheare I merged #114. The fix is available in version v2.5.3. This version is also available on PyPi now.

I appreciate your bug report and your time! Have a great day!

Leon

jwheare commented 1 year ago

Brilliant, thanks for the quick response and new release. It's working well.