dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.37k stars 4.75k forks source link

[API Proposal]: Set the default encoding of the serial port class to ISO/IEC 8859-1 to accommodate 8-bit bytes out of the box #66506

Open JansthcirlU opened 2 years ago

JansthcirlU commented 2 years ago

Background and motivation

In the current implementation of the SerialPort class, the default value of DataBits (i.e. bits per byte) is set to 8, but the default Encoding is set to ASCII, which is a 7-bit character encoding. Whenever the ReadChar() or ReadExisting() methods are called—respectively returning the next byte as a character or the next bytes as a whole string—the port assumes that the read bytes are of the ubiquitous 8-bit variation, but tries to encode them into 7-bit characters. As a result, any byte values greater than 127 yield invalid ASCII characters and are returned as ?s instead.

In data transmission systems, the byte is used as a contiguous sequence of bits in a serial data stream, representing the smallest distinguished unit of data. A transmission unit might additionally include start bits, stop bits, and parity bits, and thus its size may vary from seven to twelve bits to contain a single seven-bit ASCII code.

According to this quote from the Wikipedia article on the Byte, serial communication allows for byte values that aren't necessarily always 8 bits in size, so it makes sense that there is a constructor parameter called DataBits in the SerialPort class. However, there is no constructor parameter to set the Encoding that corresponds with that data bit size. This means that consumers of the SerialPort class may receive corrupted data because they're not aware that they have to set the Encoding property in order to read out the default byte size correctly.

I would like to suggest setting the default value of SerialPort._encoding to Encoding.Latin1 (or Encoding.GetEncoding(28591) for backward compatibility), which is an 8-bit character encoding that matches the default value of SerialPort.DefaultDataBits.

API Proposal

using Microsoft.Win32;
using System.ComponentModel;
using System.Diagnostics;
using System.Text;

namespace System.IO.Ports
{
    public partial class SerialPort : Component
    {
        ...
        // --------- members supporting exposed properties ------------*
        // Latin1 (ISO/IEC 8859-1) accommodates 8-bit bytes unlike ASCII
        private static Encoding Latin1 = Encoding.GetEncoding(28591);
        private Encoding _encoding = Latin1;
        private Decoder _decoder = Latin1.GetDecoder();
        private int _maxByteCountForSingleChar = Latin1.GetMaxByteCount(1);
        ...
    }
}

API Usage

SerialPort port = new SerialPort("COM1"); // Suggested defaults: 8-bit bytes and Latin1 encoding

// 1) Read bytes directly from the received serial data
byte[] receivedBytes = new byte[8];
int bytesRead = port.Read(receivedBytes, 0, receivedBytes.Length);

// 2) Read serial data string and convert to a byte array
string receivedData = port.ReadExisting();
byte[] receivedBytesFromData = port.Encoding.GetBytes(receivedData);

Scenario (1) is the more traditional way to read out serial data, where the serial port's data stream is partially copied over to the provided byte array. Scenario (2) calls a method that does the same as (1) under the hood with the added step of encoding the received data to a string, which can then be decoded to get the original byte array again.

Bytes read
Example data [8, 13, 21, 34, 55, 89, 144, 233]
Scenario (1) [8, 13, 21, 34, 55, 89, 144, 233]
Scenario (2) currently [8, 13, 21, 34, 55, 89, 63, 63]
Scenario (2) proposed [8, 13, 21, 34, 55, 89, 144, 233]

Regardless of which is more readable or useful or efficient, if the SerialPort class exposes these character-level and string-level interactions, then they should yield the same results as the more traditional byte array methods when reading the same data.

Alternative Designs

I would also suggest adding an Encoding encoding parameter to the SerialPort constructors where int dataBits is customisable, just to make sure that the character and string-level interactions yield the same results as the traditional byte-level interactions.

using Microsoft.Win32;
using System.ComponentModel;
using System.Diagnostics;
using System.Text;

namespace System.IO.Ports
{
    public partial class SerialPort : Component
    {
        // Latin1 (ISO/IEC 8859-1) accommodates 8-bit bytes unlike ASCII
        private static Encoding Latin1 = Encoding.GetEncoding(28591);
        ...
        // -------- SECTION: constructors -----------------*
        public SerialPort()
        {
            _dataReceivedHandler = new SerialDataReceivedEventHandler(CatchReceivedEvents);
        }

        public SerialPort(IContainer container) : this()
        {
            // Required for Windows.Forms Class Composition Designer support
            container.Add(this);
        }

        // Non-design SerialPort constructors here chain, using default values for members left unspecified by parameters
        // Note: Calling SerialPort() does not open a port connection but merely instantiates an object.
        //     : A connection must be made using SerialPort's Open() method.

        // Added Latin1 as the default parameter value of encoding in existing constructors
        public SerialPort(string portName)
            : this(portName, DefaultBaudRate, DefaultParity, DefaultDataBits, DefaultStopBits, Latin1)
        {
        }

        public SerialPort(string portName, int baudRate)
            : this(portName, baudRate, DefaultParity, DefaultDataBits, DefaultStopBits, Latin1)
        {
        }

        public SerialPort(string portName, int baudRate, Parity parity)
            : this(portName, baudRate, parity, DefaultDataBits, DefaultStopBits, Latin1)
        {
        }

        public SerialPort(string portName, int baudRate, Parity parity, int dataBits)
            : this(portName, baudRate, parity, dataBits, DefaultStopBits, Latin1)
        {
        }

        // all the magic happens in the call to the instance's .Open() method.
        // Internally, the SerialStream constructor opens the file handle, sets the device
        // control block and associated Win32 structures, and begins the event-watching cycle.
        public SerialPort(string portName, int baudRate, Parity parity, int dataBits, StopBits stopBits)
            : this(portName, baudRate, parity, dataBits, stopBits, Latin1)
        {
        }

        // Newly added constructor that includes a parameter for the encoding
        public SerialPort(string portName, int baudRate, Parity parity, int dataBits, StopBits stopBits, Encoding encoding)
            : this()
        {
            PortName = portName;
            BaudRate = baudRate;
            Parity = parity;
            DataBits = dataBits;
            StopBits = stopBits;
            Encoding = encoding;
        }
        ...
    }
}

Risks

The Encoding.Latin1 property was only added in .NET 5, so to be backward compatible the encoding would have to be called using the Encoding.GetEncoding(int codepage) method, which all .NET versions (including Framework) have in common. I've used a static field called Latin1 for simplicity, but it might be safer to call the GetEncoding method wherever I've written Latin1.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-io See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation In the current implementation of the [`SerialPort` class](https://github.com/dotnet/runtime/blob/main/src/libraries/System.IO.Ports/src/System/IO/Ports/SerialPort.cs), the default value of `DataBits` (i.e. bits per byte) is set to 8, but the default `Encoding` is set to ASCII, which is a 7-bit character encoding. Whenever the `ReadChar()` or `ReadExisting()` methods are called—respectively returning the next byte as a character or the next bytes as a whole string—the port assumes that the read bytes are of the ubiquitous 8-bit variation, but tries to encode them into 7-bit characters. As a result, any byte values greater than 127 yield invalid ASCII characters and are returned as `?`s instead. > In data transmission systems, the byte is used as a contiguous sequence of bits in a serial data stream, representing the smallest distinguished unit of data. A transmission unit might additionally include start bits, stop bits, and [parity bits](https://en.wikipedia.org/wiki/Parity_bit), and thus its size may vary from seven to twelve bits to contain a single seven-bit [ASCII](https://en.wikipedia.org/wiki/ASCII)[](https://en.wikipedia.org/wiki/Byte#cite_note-NWU-71) code. According to this quote from the [Wikipedia article on the *Byte*](https://en.wikipedia.org/wiki/Byte#Common_uses), serial communication allows for byte values that aren't necessarily always 8 bits in size, so it makes sense that there is a constructor parameter called `DataBits` in the `SerialPort` class. However, there is no constructor parameter to set the `Encoding` that corresponds with that data bit size. This means that consumers of the `SerialPort` class may receive corrupted data because they're not aware that they have to set the `Encoding` property in order to read out the default byte size correctly. I would like to suggest setting the default value of `SerialPort._encoding` to `Encoding.Latin1` (or `Encoding.GetEncoding(28591)` for backward compatibility), which is an 8-bit character encoding that matches the default value of `SerialPort.DefaultDataBits`. ### API Proposal ```C# using Microsoft.Win32; using System.ComponentModel; using System.Diagnostics; using System.Text; namespace System.IO.Ports { public partial class SerialPort : Component { ... // --------- members supporting exposed properties ------------* private static Encoding Latin1 = Encoding.GetEncoding(28591); // Latin1 (ISO/IEC 8859-1) accommodates 8-bit bytes unlike ASCII private Encoding _encoding = Latin1; private Decoder _decoder = Latin1.GetDecoder(); private int _maxByteCountForSingleChar = Latin1.GetMaxByteCount(1); ... } } ``` ### API Usage ```C# SerialPort port = new SerialPort("COM1"); // Suggested defaults: 8-bit bytes and Latin1 encoding // 1) Read bytes directly from the received serial data byte[] receivedBytes = new byte[8]; int bytesRead = port.Read(receivedBytes, 0, receivedBytes.Length); // 2) Read serial data string and convert to a byte array string receivedData = port.ReadExisting(); byte[] receivedBytesFromData = port.Encoding.GetBytes(receivedData); ``` Scenario (1) is the more traditional way to read out serial data, where the serial port's data stream is partially copied over to the provided byte array. Scenario (2) calls a method that does the same as (1) under the hood with the added step of encoding the received data to a string, which can then be decoded to get the original byte array again. | Example data | Scenario (1) | Scenario (2) | | --- | --- | --- | | `[8, 13, 21, 34, 55, 89, 144, 233]` | `[8, 13, 21, 34, 55, 89, 63, 63]` | `[8, 13, 21, 34, 55, 89, 144, 233]` | Regardless of which is more readable or useful or efficient, if the `SerialPort` class exposes these character-level and string-level interactions, then they should yield the same results as the more traditional byte array methods when reading the same data. ### Alternative Designs I would also suggest adding an `Encoding encoding` parameter to the `SerialPort` constructors where `int dataBits` is customisable, just to make sure that the character and string-level interactions yield the same results as the traditional byte-level interactions. ```csharp using Microsoft.Win32; using System.ComponentModel; using System.Diagnostics; using System.Text; namespace System.IO.Ports { public partial class SerialPort : Component { private static Encoding Latin1 = Encoding.GetEncoding(28591); // Latin1 (ISO/IEC 8859-1) accommodates 8-bit bytes unlike ASCII ... // -------- SECTION: constructors -----------------* public SerialPort() { _dataReceivedHandler = new SerialDataReceivedEventHandler(CatchReceivedEvents); } public SerialPort(IContainer container) : this() { // Required for Windows.Forms Class Composition Designer support container.Add(this); } // Non-design SerialPort constructors here chain, using default values for members left unspecified by parameters // Note: Calling SerialPort() does not open a port connection but merely instantiates an object. // : A connection must be made using SerialPort's Open() method. // Added Latin1 as the default parameter value of encoding in existing constructors public SerialPort(string portName) : this(portName, DefaultBaudRate, DefaultParity, DefaultDataBits, DefaultStopBits, Latin1) { } public SerialPort(string portName, int baudRate) : this(portName, baudRate, DefaultParity, DefaultDataBits, DefaultStopBits, Latin1) { } public SerialPort(string portName, int baudRate, Parity parity) : this(portName, baudRate, parity, DefaultDataBits, DefaultStopBits, Latin1) { } public SerialPort(string portName, int baudRate, Parity parity, int dataBits) : this(portName, baudRate, parity, dataBits, DefaultStopBits, Latin1) { } // all the magic happens in the call to the instance's .Open() method. // Internally, the SerialStream constructor opens the file handle, sets the device // control block and associated Win32 structures, and begins the event-watching cycle. public SerialPort(string portName, int baudRate, Parity parity, int dataBits, StopBits stopBits) : this(portName, baudRate, parity, dataBits, stopBits, Latin1) { } // Newly added constructor that includes a parameter for the encoding public SerialPort(string portName, int baudRate, Parity parity, int dataBits, StopBits stopBits, Encoding encoding) : this() { PortName = portName; BaudRate = baudRate; Parity = parity; DataBits = dataBits; StopBits = stopBits; Encoding = encoding; } ... } } ``` ### Risks The `Encoding.Latin1` property was only added in .NET 5, so to be backward compatible the encoding would have to be called using the `Encoding.GetEncoding(int codepage)` method, which all .NET versions (including Framework) have in common. I've used a static field called `Latin1` for simplicity, but it might be safer to call the `GetEncoding` method wherever I've written `Latin1`.
Author: JansthcirlU
Assignees: -
Labels: `api-suggestion`, `area-System.IO`, `untriaged`
Milestone: -