dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

Add a universal encode method #56523

Open vicajilau opened 2 months ago

vicajilau commented 2 months ago

Description:

This proposal suggests adding a new encode method, or some other way to return a Uint8List, to the Encoding class in Dart. The method would encode a String directly into a Uint8List, addressing the current inconsistency where different encodings produce results of different types, which complicates working with binary data.

Motivation:

In Dart, the Encoding class provides an encode(String) method that returns a List<int>. However, utf8 overrides this method to return a Uint8List instead. The difference in return types creates inconsistency, as Uint8List is more suitable for handling binary data due to its optimized byte storage.

Here's the current flow when handling different encodings:

Uint8List encoded;
if (encoding == utf8) {
  encoded = utf8.encode(body); // utf8.encode returns Uint8List
} else {
  encoded = Uint8List.fromList(encoding.encode(body)); // Other encodings return List<int>
}

The need to convert List<int> to Uint8List for encodings other than utf8 introduces unnecessary boilerplate code. Providing a unified way to encode strings directly into Uint8List across all encoding types would streamline this process and ensure consistency.

Example Implementation:

As a proof of concept, consider the following code, which introduces a new method named encodeToUint8List:

import 'dart:convert';
import 'dart:typed_data';

extension UniversalEncode on Encoding {
  Uint8List encodeToUint8List(String string) {
    return switch (this) {
      utf8 => utf8.encode(string), // Directly returns Uint8List
      _ => Uint8List.fromList(this.encode(string)), // Converts List<int> to Uint8List
    };
  }
}

This extension provides a consistent way to handle encoding results, though it is suggested as an example to illustrate the benefits of incorporating such functionality natively.

Proposed Approaches:

1. Adding a New Method to the Encoding Class:

Introduce a new method named encodeToUint8List directly in the Encoding class. This method would handle the conversion internally for all encodings.

Example:

abstract class Encoding extends Codec<String, List<int>> {
  // Sketch only: the existing members of Encoding are omitted.
  Uint8List encodeToUint8List(String string) {
    return switch (this) {
      utf8 => utf8.encode(string), // Already a Uint8List
      _ => Uint8List.fromList(this.encode(string)), // Copies List<int> into a Uint8List
    };
  }
}

This approach ensures that all encodings return Uint8List consistently.

2. Overloading the Existing encode Method:

Explore the possibility of method overloading or parameterized options within encode to allow specifying the return type. This approach could add complexity and might not be ideal.

3. Creating a New Interface or Class:

Introduce a new BinaryEncoding interface extending Encoding, designed specifically for encodings that work directly with binary data, ensuring consistent Uint8List output.
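The BinaryEncoding idea could be sketched roughly as follows. This is only an illustration under assumptions: BinaryEncoding is the name used in the proposal, while BinaryEncodingAdapter is a hypothetical helper invented here to wrap an existing Encoding. Neither exists in dart:convert.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical interface from the proposal: an Encoding whose encode
/// is statically typed to return Uint8List.
abstract class BinaryEncoding extends Encoding {
  const BinaryEncoding();

  @override
  Uint8List encode(String input);
}

/// Hypothetical adapter (not part of the proposal) that gives any
/// existing Encoding the narrower return type.
class BinaryEncodingAdapter extends BinaryEncoding {
  final Encoding _inner;

  const BinaryEncodingAdapter(this._inner);

  @override
  String get name => _inner.name;

  @override
  Converter<String, List<int>> get encoder => _inner.encoder;

  @override
  Converter<List<int>, String> get decoder => _inner.decoder;

  @override
  Uint8List encode(String input) {
    final bytes = _inner.encode(input);
    // Reuse the list when it is already a Uint8List, otherwise copy.
    return bytes is Uint8List ? bytes : Uint8List.fromList(bytes);
  }
}
```

With this sketch, const BinaryEncodingAdapter(latin1).encode(text) has the static type Uint8List, and decode still goes through the wrapped encoding's decoder.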

Expected Usage:

With this feature integrated, developers would use the API as follows:

import 'dart:convert';
import 'dart:typed_data';

// Assumes encodeToUint8List is available on Encoding, either natively
// (as proposed) or through the UniversalEncode extension shown earlier.
void main() {
  String text = "Hello, Dart!";

  // Using the proposed method with UTF-8
  Uint8List utf8Encoded = utf8.encodeToUint8List(text);
  print(utf8Encoded); // [72, 101, 108, 108, 111, 44, 32, 68, 97, 114, 116, 33]

  // Using the proposed method with Latin1
  Uint8List latin1Encoded = latin1.encodeToUint8List(text);
  print(latin1Encoded); // [72, 101, 108, 108, 111, 44, 32, 68, 97, 114, 116, 33]
}

Benefits:

Reduces boilerplate code by removing the need to manually convert List<int> to Uint8List. Provides a consistent API across different encodings, improving usability and developer experience. Optimizes encoding operations for binary data handling.

Potential Drawbacks:

Introducing a new method could cause confusion with the existing encode(String) method. The current class hierarchy, where encode returns List<int>, might limit implementation options.

Conclusion:

Adding an encodeToUint8List method to the Encoding class would improve Dart's core libraries by providing a more consistent and developer-friendly approach to encoding strings, particularly when dealing with binary data. This change would address current inconsistencies and reduce unnecessary conversion code.

lrhn commented 2 months ago

Changing the original declaration to return Uint8List would be the best solution, but it's breaking. That will at least take a significant effort in migrating all subclasses before being able to change the interface. Too bad.

Adding an extension method is the least intrusive approach. Just

  Uint8List encodeAsBytes(String value) =>
      switch (encode(value)) {
        Uint8List l => l, // Already bytes: return without copying
        var l => Uint8List.fromList(l) // Otherwise copy into a Uint8List
      };

should be enough.

Anyone can add that, it doesn't have to be on the platform libraries.
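For reference, a self-contained version of that method wrapped in an extension might look like the sketch below. The extension name EncodeAsBytes is made up for illustration, and the variable names differ slightly from the snippet above.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical extension name; only encodeAsBytes comes from the comment above.
extension EncodeAsBytes on Encoding {
  Uint8List encodeAsBytes(String value) => switch (encode(value)) {
        // The encoder already produced a Uint8List: return it as-is.
        Uint8List bytes => bytes,
        // Any other List<int>: copy it into a Uint8List.
        var other => Uint8List.fromList(other),
      };
}
```

Because the extension is declared on Encoding, it also works when the static type is just Encoding, for example Encoding e = latin1; e.encodeAsBytes('é').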

The one problem is that it assumes that the returned list contains bytes. If we had a utf16 codec, it could return a Uint16List instead, and putting that into a Uint8List one value at a time will lose half the bits. Luckily we say that an Encoding encodes as lists of bytes, even if we don't enforce it, so it should be defensible to assume byte values.
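The lost bits can be seen in a small sketch. The function names narrowUnchecked and narrowChecked are hypothetical; the second one shows one possible way to fail loudly instead of truncating, if the byte assumption is not trusted.

```dart
import 'dart:typed_data';

/// Copies values into a Uint8List; anything above 0xFF silently keeps
/// only its low 8 bits.
Uint8List narrowUnchecked(List<int> units) => Uint8List.fromList(units);

/// Same copy, but rejects values that are not bytes.
Uint8List narrowChecked(List<int> units) {
  for (final unit in units) {
    if (unit < 0 || unit > 0xFF) {
      throw ArgumentError.value(unit, 'units', 'not a byte value');
    }
  }
  return Uint8List.fromList(units);
}
```

For example, narrowUnchecked(Uint16List.fromList([0x20AC])) yields a single element 0xAC, while narrowChecked throws an ArgumentError for the same input.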

mraleph commented 2 months ago

"Changing the original declaration to return Uint8List would be the best solution, but it's breaking."

I think it is worth doing this breaking change - I have reviewed subclasses of Encoding which can be found on GitHub: there are not that many and most are extremely inefficient because they work with growable List<int> instances to represent the result of encoding.

Switching Encoding to be Codec<String, Uint8List> would actually force people to rewrite their code in a more efficient way.
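To illustrate the kind of rewrite meant here, the sketch below contrasts the two styles for a made-up single-byte encoding (code units 0x00 to 0xFF map to themselves). Both function names are hypothetical and are not taken from any of the reviewed packages.

```dart
import 'dart:typed_data';

/// Growable style: appends to a List<int>, which may reallocate as it grows.
List<int> encodeGrowable(String input) {
  final out = <int>[];
  for (final unit in input.codeUnits) {
    if (unit > 0xFF) {
      throw ArgumentError.value(input, 'input', 'contains a non-byte code unit');
    }
    out.add(unit);
  }
  return out;
}

/// Typed style: the output length is known up front, so a Uint8List is
/// allocated once and filled in place.
Uint8List encodeFixed(String input) {
  final out = Uint8List(input.length);
  for (var i = 0; i < input.length; i++) {
    final unit = input.codeUnitAt(i);
    if (unit > 0xFF) {
      throw ArgumentError.value(input, 'input', 'contains a non-byte code unit');
    }
    out[i] = unit;
  }
  return out;
}
```

Both functions produce the same values; the second stores them compactly and has a return type that would fit a Codec<String, Uint8List>.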

lrhn commented 2 months ago

A migration path would probably be:

It'll be some work to write the lint and get all the corner-cases right. Then it'll take some time to fix existing encodings. Dart 4.0 is probably the correct time for such a breaking change.