Open JansthcirlU opened 2 years ago
Author: | JansthcirlU |
---|---|
Assignees: | - |
Labels: | `api-suggestion`, `area-System.IO`, `untriaged` |
Milestone: | - |
Background and motivation
In the current implementation of the `SerialPort` class, the default value of `DataBits` (i.e. bits per byte) is set to 8, but the default `Encoding` is set to ASCII, which is a 7-bit character encoding. Whenever the `ReadChar()` or `ReadExisting()` methods are called (respectively returning the next byte as a character or the next bytes as a whole string), the port assumes that the read bytes are of the ubiquitous 8-bit variation, but tries to decode them as 7-bit characters. As a result, any byte values greater than 127 yield invalid ASCII characters and are returned as `?`s instead.

According to this quote from the Wikipedia article on the Byte, serial communication allows for byte values that aren't necessarily always 8 bits in size, so it makes sense that there is a constructor parameter called `DataBits` in the `SerialPort` class. However, there is no constructor parameter to set the `Encoding` that corresponds with that data bit size. This means that consumers of the `SerialPort` class may receive corrupted data because they're not aware that they have to set the `Encoding` property in order to read out the default byte size correctly.

I would like to suggest setting the default value of `SerialPort._encoding` to `Encoding.Latin1` (or `Encoding.GetEncoding(28591)` for backward compatibility), which is an 8-bit character encoding that matches the default value of `SerialPort.DefaultDataBits`.

API Proposal
API Usage
Scenario (1) is the more traditional way to read out serial data, where the serial port's data stream is partially copied over to the provided byte array. Scenario (2) calls a method that does the same as (1) under the hood with the added step of decoding the received data to a string, which can then be encoded again to get the original byte array back.
[8, 13, 21, 34, 55, 89, 144, 233]
[8, 13, 21, 34, 55, 89, 144, 233]
[8, 13, 21, 34, 55, 89, 63, 63]
[8, 13, 21, 34, 55, 89, 144, 233]
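
The `63, 63` tail in the outputs above can be reproduced without any serial hardware. The following is a minimal sketch, not the code from the proposal: it assumes that the string-level reads decode bytes the same way `Encoding.GetString` does, and the variable names and the `Format` helper are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

// Sample bytes from the outputs above; 144 and 233 do not fit in 7 bits.
byte[] received = { 8, 13, 21, 34, 55, 89, 144, 233 };

// ISO-8859-1 (Latin-1), looked up by code page so it also works before .NET 5.
Encoding latin1 = Encoding.GetEncoding(28591);

// Decode the bytes to a string, then encode the string back to bytes.
byte[] viaAscii = Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(received));
byte[] viaLatin1 = latin1.GetBytes(latin1.GetString(received));

// Hypothetical helper that renders a byte array like the outputs above.
string Format(byte[] bytes) => "[" + string.Join(", ", bytes.Select(b => b.ToString())) + "]";

Console.WriteLine(Format(viaAscii));  // prints [8, 13, 21, 34, 55, 89, 63, 63]
Console.WriteLine(Format(viaLatin1)); // prints [8, 13, 21, 34, 55, 89, 144, 233]
```

ASCII replaces each byte above 127 with `?` (code 63) while decoding, so the original values cannot be recovered afterwards, whereas Latin-1 maps every byte to exactly one character.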
Regardless of which is more readable or useful or efficient, if the `SerialPort` class exposes these character-level and string-level interactions, then they should yield the same results as the more traditional byte array methods when reading the same data.

Alternative Designs
I would also suggest adding an `Encoding encoding` parameter to the `SerialPort` constructors where `int dataBits` is customisable, just to make sure that the character and string-level interactions yield the same results as the traditional byte-level interactions.

Risks
The `Encoding.Latin1` property was only added in .NET 5, so to be backward compatible the encoding would have to be obtained via the `Encoding.GetEncoding(int codepage)` method, which all .NET versions (including Framework) have in common. I've used a static field called `Latin1` for simplicity, but it might be safer to call the `GetEncoding` method wherever I've written `Latin1`.
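
As a rough check of the backward-compatible route, the sketch below looks up Latin-1 through `Encoding.GetEncoding(28591)` and confirms that all 256 byte values survive a decode/encode round trip. The variable names are illustrative only, and nothing here touches `SerialPort` itself.

```csharp
using System;
using System.Linq;
using System.Text;

// Code page 28591 is ISO-8859-1 (Latin-1).
Encoding latin1ByCodePage = Encoding.GetEncoding(28591);

// Every possible 8-bit value, 0 through 255.
byte[] allBytes = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();

// Decode to a string and encode back to bytes.
string text = latin1ByCodePage.GetString(allBytes);
byte[] roundTripped = latin1ByCodePage.GetBytes(text);

bool lossless = roundTripped.SequenceEqual(allBytes);
Console.WriteLine(lossless); // prints True
```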