dart-lang / yaml

A Dart YAML parser.
https://pub.dev/packages/yaml
MIT License
167 stars 58 forks source link

Add type-safe "read" methods #40

Closed matanlurey closed 6 years ago

matanlurey commented 6 years ago

Very similar to https://github.com/dart-lang/args/issues/95.

Helps address

... and other, unfiled issues that we've hit trying to use package:yaml with Dart2 semantics.

We don't need full coverage of every possible type to start delivering lots of value. Here's my idea:

abstract class YamlMap {
  T read<T>(dynamic key);
  List<T> readList<T>(dynamic key);
  Map<K, V> readMap<K, V>(dynamic key);
}

Example simplified use:

void _parseMap(YamlMap config) {
  // Use <T>
  final name = config.read<String>('name');

  // OR, use inference:
  final String name = config.read('name');

  // Also works well with methods that already expect types
  _validateName(config.read('name'));
}

void _validateName(String name) { ... }

We could guard against trying to read collections, as well:

T read<T>(dynamic key) {
  if (T is Iterable || T is Map) {
    throw new UnsupportedError('Use readList or readMap instead');
  }
  ...
}

Here is an example implementation of readList:

List<T> readList<T>(dynamic key) {
  List<dynamic> list = this[key];
  return list.cast<T>();
}

One trickier case is readMap, for example, values could be a List<String>.

Two ideas:

  1. Special case:
Map<K, V> readMap<K, V>(dynamic key) {
  if (v is Iterable) {
    // Special case, and use readList here with .cast.
  }
}
  1. Add yet-another method:
Map<K, List<V>> readMapList<K, V>(dynamic key);

Thoughts? @srawlins @eseidelGoogle @nex3 @natebosch @jakemac53

jakemac53 commented 6 years ago

Pretty much all users of package:yaml will end up needing something like this to run under dart 2. I would prefer to have a shared implemention here then a smattering of implementations everywhere.

nex3 commented 6 years ago

I don't think this is inherently Dart 2-specific—it's always been important to have good type-checking for YAML files if you want to produce decent error messages. It's just that Dart 2 now reminds users more emphatically of that fact.

I'd really like to provide a general API for type-safe reading of untyped structured data. It's not just YAML where this comes up, but args (as Matan points out), JSON, and cross-isolate messaging. json_rpc_2's Parameters class is an implementation of this idea, but I think it could be generalized. Something like:

/// An untyped value that can be accessed in a typed way.
///
/// All typed accessors throw [FormatException]s if the underlying value isn't
/// of the appropriate type.
abstract class Value {
  bool get asBool;
  int get asInt;
  // ...
  ValueMap get asMap;
  ValueList get asList;

  /// Creates a [Value] that provides access to the given [object].
  factory Value(object);

  /// Asserts that this value is a [String] and passes that string to
  /// [callback].
  ///
  /// If [callback] throws a [FormatException], modifies that [FormatException]
  /// to indicate that it's associated with this value.
  T parse<T>(T callback(String value));
}

/// A [Value] that's known to be a map.
abstract class ValueMap implements Value, Map<Object, Object> {
  // Similar to [YamlMap.nodes].
  Map<Value, Value> get wrapped;

  /// Asserts that this has the given [key] and returns its value.
  ///
  /// Throws a [FormatException] if this doesn't have the given key.
  Value require(Object key);

  /// If this has the given [key], calls [callback] with its value and returns
  /// the result.
  ///
  /// Otherwise, calls [orElse] and returns the result.
  T valueOr<T>(Object key, T callback(Value value), T orElse());
}

This could live in its own package and provide mixins that would make it easy for packages like yaml and args to implement it and provide custom error handling (such as producing SpanFormatExceptions).

matanlurey commented 6 years ago

That seems pretty heavyweight compared to 3 methods (lots of wrappers, lots of code to write and test, the overhead of wrapping simple objects - like booleans - in a class), and also just more code to write. I want to write this (assuming config is YamlMap or similar, and log takes a String):

log(config.read('name'))

not

log(config.read('name').asString);

I don't want to try and block the perfect API that anyone wants to implement for there own packages, but as far as something that's both easy to read, write, and consume, and doesn't require creating yet-another shared package, I think the 3x read methods is superior.

nex3 commented 6 years ago

Those three methods are simple, but they're also difficult to extend, difficult to provide user-friendly errors for, and prone to incorrect typing due to their sensitivity to inference from the surrounding code. I suspect that in practice raw read() calls would be confusing and error-prone enough that a convention would evolve to always include the type parameter, in which case read<String>('name') doesn't provide value over ['name'].asString. It's not even clear to me that it's better than ['name'] as String if it doesn't improve the error message.

I've written a lot of code that parses untyped structured data, and I've always eventually found myself needing more advanced features like intelligent handling of sub-parsing and default values. If we leave those off now we'll just be addressing a third of the problem, and if we don't have a central package for this API to live in we won't have a way of building it out once we want to add more features.

matanlurey commented 6 years ago

Given our deadlines I don't think we need more features.

I suspect that in practice raw read() calls would be confusing and error-prone enough that a convention would evolve to always include the type parameter, in which case read<String>('name') doesn't provide value over ['name'].asString.

There's quite a bit of code where <T> is automatically filled in (assignments, passing to methods).

I've written a lot of code that parses untyped structured data, and I've always eventually found myself needing more advanced features like intelligent handling of sub-parsing and default values.

Sure. I'm not talking about that though, I'm talking about 3 methods to make Dart2 palatable.

natebosch commented 6 years ago

When I used Parameters from json_rpc_2 I found it to be one of the more frustrated bits of the API.

nex3 commented 6 years ago

Given our deadlines I don't think we need more features.

I'm fine with getting something simple out the door quickly, but it should be in a direction that will allow for more thorough work later. We don't have to add all the methods I outlined, but we should put what methods we do add in a package and interface that can be extended later.

There's quite a bit of code where <T> is automatically filled in (assignments, passing to methods).

There's also quite a bit of code where inferring this will lead to hidden errors that aren't detectable until runtime—if the method you're passing it to changes its argument type, read() will continue statically checking even if it's not compatible at all. That's exactly what the Dart 2 type system was meant to prevent.

@natebosch Can you go into more detail?

natebosch commented 6 years ago

Using Parameters mainly doesn't feel like any other Dart code I'm used to writing, including when I interact with other untyped stuff like jsonDecode.

I also don't see how the Parameter concept gets us any closer to solving the as List<String> issue... It changes parsed.readList<String>('key') into parsed['key'].asList<String>() I guess? Which also means that even in the cases where we don't need the noise we have the noise of parsed['key'].value

nex3 commented 6 years ago

You're right that Dart has built-in support for casting, but that support isn't customizable—it only ever throws CastErrors with no information about the value that produced the error. That's fine when you're sure out-of-band that your typing is correct—and in that case I agree that casting is a fine solution today.

But that's not the case for user-provided configuration like you get from YAML or for a protocol with well-defined error-handling behavior like JSON-RPC 2.0. In those situations, it's important for the errors to have the correct type and for them to be very comprehensible by the end user. To do that, you need domain-specific code to have a chance to learn that the type doesn't match and produce a nice error that says "this part of the YAML file was supposed to be a string" (or "this parameter of the RPC").

I don't want to use valueOr(something)because Dart already has a way to do that - ?? something. IIRC this one I can't avoid because I'm not allowed to read out the null, I'd have to catch an exception.

You could do parameters.asMap[key] ?? something, or assign the result of asMap to a variable like you'd assign jsonDecode(string) as Map. It's worth noting that YamlMap takes a different approach here, where the default operator [] returns dynamic and you have to opt in to the version wrapped with extra behavior. I'm not sure which is the better paradigm.

matanlurey commented 6 years ago

I think in general the consensus here unfortunately is not for Parameter-like behavior.

Given that we have very little time, and there isn't consensus, I'd like to push forward to adding the three read* methods, and we can always revisit this when time isn't a factor and the language team can also be more involved thinking of creative solutions (AFAIK, @leafpetersen and others are very motivated to figure out better patterns for JSON decoding post Dart-2).

nex3 commented 6 years ago

Three people chiming in on an issue isn't a consensus-gathering process. If you think it's worth spending the time trying for a real consensus, we can write up a doc that both of us agree lays out our positions accurately and send it out to the team for review and discussion. If you believe this needs to be addressed quickly, I've already proposed a compromise: create a small interface that implements as little as possible to make this work well going forward. But I'm not willing to allow the excuse of time constraints to push us into making decisions that aren't future-proof and accumulating more and more technical debt.

As Nate points out, Dart already has a built in mechanism for casting that works pretty well here if you're confident the input data is valid and so don't care about generating user-friendly errors. If you do care about generating nice errors, read<T>() is not a sufficient API.

natebosch commented 6 years ago

As Nate points out, Dart already has a built in mechanism for casting that works pretty well here

The reason this issue was opened is that these mechanisms stop working in Dart 2 runtime semantics for the Map and List case. as List<String> will fail, but we can make readList<String>(key) work.

The reason I brought up the Dart cast was that it does work for non-generics so the extra asString doesn't add value for me like readList<T>() would, but it imposes a syntax tax on every parameter.

matanlurey commented 6 years ago

Natalie, you can't hold this issue hostage because you disagree with the resolution.

I've individually talked to folks the use the YAML package most often

Sure, we can ask the entire of the rest of the team, but it's going to be similar to you asking the rest of the team about a SASS API, they aren't likely to have a lot of context (or desire to get that context).

I look forward to alternative proposals post-Dart2, but I'm planning on moving forward here.

natebosch commented 6 years ago

For more background on the motivation - here's an example of trying to get yaml parsing into typed collections working in Dart 2. This PR still doesn't solve everything but solves at least the first few issues I hit.

Note especially the awkward Map.map with a List.cast. https://github.com/dart-lang/build/pull/1295/files#diff-145acce6f4d7418297151f983356688dR257

nex3 commented 6 years ago

@matanlurey Let's see if we can find a solution that works for all the stakeholders—including pub, test, and json_rpc_2 which all have strong error-reporting needs—before we land something that we'll have to either maintain forever or roll back. This issue has only been open for a few hours, and I've only heard from one user. I'm confident we can find something that works well for all use-cases.

@natebosch Here's what I'd like to see you be able to write there:

return options.require(_buildExtensions).asMap.wrapped
    .map((key, value) => new MapEntry(key.asString, value.asList.ofStrings);

Also FYI, I don't think _actualDartType is necessary in that code today—loadYaml() returns normal Dart objects. Only loadYamlNode() will wrap scalar values in YamlScalars, and when you have a YamlMap you need to explicitly access YamlMap.nodes to see YamlScalars—other methods like .map() return normal Dart values. It's designed to act just like JSON decoding, but with the ability to opt in to extra metadata.

matanlurey commented 6 years ago

I don't see any of those packages as related, sorry. I've talked to the maintainers of several packages who are currently hitting the problem, got some feedback (entirely positive other than your objection, which is noted) and I am pushing for moving forward here.

Here is my first pass at completing this feature: https://github.com/dart-lang/yaml/pull/41.

joeconwaystk commented 6 years ago

Running into some of this as well. Some thoughts:

I prefer using a generic 'read' method. In an ideal world, type parameters to functions would be available at runtime and the type check is performed in the read. This is a common pattern in languages that support generics. If looking towards the future, I'd suspect that Dart will eventually have this ability.

T read<T>(dynamic key, {T onCastError(CastError e)}) {
  try {
    return this[key] as T;
  } on CastError catch (e) {  
    if (onCastError == null) rethrow;
    return onCastError(e);
  }
}

It's good practice, imo, to transition dynamic data structures like JSON and YAML into a concrete type where each key is a property of that type. Again in an ideal world, the type parameter is inferred to avoid specifying the type at the call-site:

class T implements YamlDecodable {
  String key;

  void decode(YamlMap map) {
    key = map.read('key');
  }
}

This then supports nested objects of a decodable type:

class U implements YamlDecodable {
  T nested;

  void decode(YamlMap map) {
    nested = map.readObject('nested');
  }
}

Using annotations + code generation seems like a good next step. My hunch is that a code generator would be much easier to write when using generic methods.

Here is a real world example of this. This could be further improved if generic methods could be overloaded, but it's not terrible. (FYI, Dart2 will force decode to be split into decode and decodeObject, if you read that far into it.)

(Additionally, asMap() is preferable to asMap because of the ability to override and provide optional arguments. asMap also looks like a member method reference to me and easy to mistake.)

leafpetersen commented 6 years ago

I've been thinking about ways to make the json decoding API work nicely with types, and I wonder if the same approach might work here. The idea I'd like to push on is providing a way to build up schema descriptions and use that either to convert untyped parsed json to a typed format (lazily or eagerly), or better to drive the parsing itself so that we could parse directly into a typed format. I think the resulting client API would look something like this:

import 'cast.dart' as cast;

const c = cast.Keyed<String, dynamic>({
  "id": cast.int,
  "fields": cast.Keyed({"field1": cast.int, "field2": cast.int})
});

void test(dynamic json) {
  // Correct reified type
  Map<String, dynamic> result = c.castJson(json);
  int id = result["id"];
  // Correct reified type
  Map<String, int> fields = result["field"];
}

void main() {
  var json = <String, dynamic>{
    "id": 0,
    "fields": <String, dynamic>{"field1": 1, "field2": 2}
  };

  test(json);
}

This seems like it might work well for yaml as well. Is it worth exploring this further in this context?

jakemac53 commented 6 years ago

I wonder how error reporting might work with an api like this @leafpetersen ? Especially with the nested casts - it is important to be able to report the full context of which field failed to parse and why.

matanlurey commented 6 years ago

There are definitely ~3 (or 4) different ways to handle YAML/JSON structured documents in Dart.

Let's take the following example from above:

{
  "id": 101,
  "fields": {
    "field1": 1,
    "field2": 2,
  }
}

Cowboy Style

... Or, have an intimate knowledge of Dart's type system and collections.

This is basically what is required today out of the box. You'd have to write:

void decodeJson(dynamic jsonResponse) {
  final map = jsonResponse as Map<String, dynamic>;
  final id = map['id'] as int;
  final fields = map['fields'] as Map<String, dynamic>;
  final field1 = fields['field1'] as int;
  ...  
}

... note, this is different for YAML (where there is no type guarantees, even for Map). If you get one of these explicit or implicit casts wrong, you'll get a CastError at runtime in strong-mode runtimes.

PROs:

CONs:

Assisted Style

... Or, my idea above, using read* methods (or similar) to assist with casts:

void decodeJson(dynamic jsonResponse) {
  final map = jsonResponse as Map;
  final id = readTyped<int>(map, 'id');
  final fields = readTypedMap<String, dynamic>(map, 'fields');
  final field1 = readTyped<int>(fields, 'field1');
  ...  
}

PROs:

CONs:

Structured casts

Or, @leafpetersen's suggestion above, so I won't copy and paste sample code.

PROs:

CONs:

Build system-based serialization

Or, @kevmoo's json_serializable or @davidmorgan's build_value, as examples.

PROs:

CONs:

Built-in language support

... i.e. something like value types or external types (@JS() might be an example today).

Unless this is planned and will ship in 6-12 months, it's probably not worth considering at all.

matanlurey commented 6 years ago

@leafpetersen:

This seems like it might work well for yaml as well. Is it worth exploring this further in this context?

See above. My worry with that suggestion is that while it does somewhat solve the issue of how to deserialize tightly structured unstructured data, and remove some of the need to become a Dart 2.0 PhD, you will still require at least a masters degree to handle more complex pieces of structured data.

It also doesn't provide any tooling or assists other than at runtime. So If I want to access document -> fields -> field1 -> name, I will need to do entirely through dynamic['fields']['field1']['name'] or insert casts and hope they are correct. You are basically describing the entire document schema, but all you get as a benefit is that Dart doesn't throw a CastError if used properly.

natebosch commented 6 years ago

Another thing that I'd like our solution to allow is handling flexible schemas.

For instance if I have a field which must end up being a List<String> I might want to allow my user to enter only a String to avoid the extra syntax noise. Compare:

targets:
  $defaults:
    sources: "lib/**"
targets:
  $default:
    sources:
      - "lib/**"

It's a minor difference, but one that I think makes the configuration feel more friendly.

I don't think any of the approaches discussed have explicit support for this, but some of them are likely to get in the way.

I'll add a con for the "cowboy style" which is solved by the others - it requires different solutions for different fields. It's safe to as int, but not safe to as List<int> so in those cases we need to use something like (field as List).cast<int> and it feels very inconsistent.

matanlurey commented 6 years ago

True that's probably the biggest concern. Two patterns for casting (at least if not more depending on the structure and type).

leafpetersen commented 6 years ago

For instance if I have a field which must end up being a List I might want to allow my user to enter only a String to avoid the extra syntax noise. Compare:

I don't think this is hard to support:

import 'cast.dart' as cast;

const c = cast.Keyed<String, dynamic>({
  "id": cast.int,
  "fields": cast.Keyed({"field1": cast.int, "field2": cast.int}),
  "extra1": cast.OneOf(cast.String, cast.List(cast.String)),
  "extra2": cast.OneOf(cast.String, cast.List(cast.String)),
});

void test(dynamic json) {
  // Correct reified type
  Map<String, dynamic> result = c.castJson(json);
  int id = result["id"];
  Map<String, int> fields = result["field"];
  String extra1 = result["extra1"];
  List<String> extra2 = result["extra2"];
}

void main() {
  var json = <String, dynamic>{
    "id": 0,
    "fields": <String, dynamic>{"field1": 1, "field2": 2},
    "extra1": "hello",
    "extra2": <dynamic>["hello"],
  };

  test(json);
}
leafpetersen commented 6 years ago

It also doesn't provide any tooling or assists other than at runtime. So If I want to access document -> fields -> field1 -> name, I will need to do entirely through dynamic['fields']['field1']['name'] or insert casts and hope they are correct.

To the extent that the result has a type expressible in Dart, this is not true: the result type of the cast will be as precise as possible. But it's true that absent union types, as precise as possible is not all that precise. :(

I think this approach could be made to support inserting user defined wrapper classes as you parse so that you could parse directly into a fully nominally typed API, but I haven't tried to write that API.

leafpetersen commented 6 years ago

I wonder how error reporting might work with an api like this @leafpetersen ? Especially with the nested casts - it is important to be able to report the full context of which field failed to parse and why.

It doesn't seem harder to me than error reporting in general, but I'm not sure if that's helpful or not. Certainly when decoding from already parsed json, you can give essentially a stack trace showing the path through the json object that led to the error (probably at some cost to performance). Not sure if that's good enough or not. If you're parsing directly from a string, I think it's basically the same, but haven't worked it through.

matanlurey commented 6 years ago

I talked to @leafpetersen offline, and his cast library seems to be a decent approach - specifically because implementations could offer optimized ways of creating these structures (helpful, especially for JSON - less so for YAML since there isn't a "native" parser).

@leafpetersen do you think this will still happen soon (TM), or should we have a stop-gap? I think it would be very reasonable to put something in package:convert for now, and if we want to add an optimized version in dart:convert in the future I wouldn't be opposed.

matanlurey commented 6 years ago

So this didn't end up happening and probably nobody was better for it. Oh well :-/