canonical / testflinger

https://testflinger.readthedocs.io/en/latest/
GNU General Public License v3.0
12 stars 20 forks source link

fix(serial): Write log file from TCP in binary format #341

Closed thp-canonical closed 2 months ago

thp-canonical commented 2 months ago

Description

By opening the serial log file in binary ("b") mode, we can write the binary data received from the TCP socket directly to the file without having to (try to) decode the content as UTF-8. This makes serial logging 8-bit clean.

This is especially important if a character in an otherwise valid UTF-8 string happens to cross the boundary of a 4096-byte read, resulting in lost data instead of a single, valid UTF-8 character in the file:

>>> ("x"*4096+"ü").encode("utf-8")[4096:].decode("utf-8")
'ü'
>>> ("x"*4095+"ü").encode("utf-8")[4096:].decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 0: invalid start byte

The same is true for binary data (basically bytes with values 127-255) that can be passed through as-is without being interpreted as UTF-8, e.g:

>>> b"\xff".decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

And - as a side-effect of the previous one - log output encoded in non-UTF8 encodings will have their non-ASCII characters stripped:

>>> b = "Jö".encode("latin-1")
>>> b
b'J\xf6'
>>> b.decode("utf-8", "ignore")
'J'

In all 3 cases, since we use "ignore", bytes that are not a valid UTF-8 sequence will just be silently thrown away.

Resolved issues

This resolves issue with:

Documentation

No changes.

Web service API changes

No changes.

Tests

I have not tested this.

thp-canonical commented 2 months ago

Won't this cause problems when we later read that file to put into the json payload to send to the server? IIRC json (at least in python) doesn't like having binary data in the fields. At the very least, we'll need to be careful how we read the log and convert it or do something with it when we read it in agent/testflinger_agent/job.py

Even if you want to read the file later on, reading the file as a whole and treating it as UTF-8 (and ignoring or escaping/replacing special characters then) might have less opportunities for something going wrong there, this has now been implemented in https://github.com/canonical/testflinger/pull/341/commits/efd8a18e3a8c43977bbae633ae44066b0ab06820 (edited; Black) as part of this PR.

plars commented 2 months ago

@thp-canonical because of branch protections, you'll need to sign the commits before this can be merged.

thp-canonical commented 2 months ago

@thp-canonical because of branch protections, you'll need to sign the commits before this can be merged.

Whoops, sorry. Done now (and squashed the changes into a single commit as part of it).