dart-lang / sdk

The Dart SDK, including the VM, dart2js, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
9.94k stars 1.53k forks source link

[resource_identifiers] `UsageQueryable` proposal #55950

Open HosseinYousefi opened 3 weeks ago

HosseinYousefi commented 3 weeks ago

Problem As a package author, I don't really care about the way that the information is stored about the usage of different Dart entities. I only care about querying them. That is why @mkustermann initially proposed to change the output format from yaml to json. To use a relational database analogy, it does not matter what the schema is, because we don't write into the tables, we only query.

Current state With that said, there is a mismatch between what we write in the code, and what we write in the link hook to query. For example, the package author writes @RecordUse(arguments:true) or @RecordMethodCall or something like that. However, when it comes to actually querying this information, they will have to know additional APIs like fieldsForConstructionOf. This just makes it harder for package authors to know exactly what they can query, what annotations are necessary to add for which queries, etc.

It also makes the cost of change quite high. If a combination of these recording annotations can resolve a query, then any change can potentially be breaking.

Proposal Let's use the same object for specifying what we want to query in the code side, and the same object to query on the link hook side:

// Code side:

@UsageQueryable([
  MethodCallsQuery(includeConstArguments: true),
])
void foo(String str) {
  print(str);
}

// Link hook side:

void link(UsageInfo info) {
  // For simplicity, let's say we only specify methods by name.
  info.query(MethodCallsQuery(includeConstArguments: true), 'foo');
}

How the classes could be defined:

final class MethodCallsQuery implements UsageQuery<Iterable, String> { final bool includeConstArguments;

const MethodCallsQuery({required this.includeConstArguments});

@override Iterable extract(UsageInfo usageInfo, String name) { return usageInfo.callInfo.where((info) => info.name == name); } }


* A mock implementation of `UsageInfo`:
```dart
final class UsageInfo {
  final List<MethodCallInfo> callInfo;

  UsageInfo(this.callInfo);

  R query<R, P>(UsageQuery<R, P> usageQuery, P param) {
    return usageQuery.extract(this, param);
  }
}

final class MethodCallInfo {
  final String name;

  MethodCallInfo(this.name);
}

This makes it trivial to identify what the package author wants to query. We can change the underlying structure as long as we guarantee that we support the specified queries.

We can decide whether we statically check that we're passing the supported queries for each entity (for instance, a class does not support the MethodCallsQuery) or create multiple annotations for each entity.

Thanks to @dcharkes for the discussion today about this.

cc/ @mosuem cc/ @lrhn

mosuem commented 3 weeks ago

Maybe I don't understand correctly, but this seems to couple the annotation tightly with the format, making changes harder IMO. The current API proposal in https://dart-review.googlesource.com/c/sdk/+/369620 uses an annotation @RecordUse, a data class to store the information collected by the SDK, and a small querying API exposing only the information we want to expose.

We can change the underlying structure as long as we guarantee that we support the specified queries.

I believe the current design also achieves this - even more so, as we don't make any promises on what is collected. In the above example MethodCallsQuery is used both on the annotation and the query result. Currently, we expose only a subset of the method call information.

[...] they will have to know additional APIs like fieldsForConstructionOf [...]

This just makes it harder for package authors to know exactly what they can query, what annotations are necessary to add for which queries, etc.

I see that there is a difference between the annotations placed and the information retrieval API. I personally believe this makes it easier for use to evolve the format, and makes it easier for package authors to use. It is of course not very general purpose, but tailored to specific use cases, but then again we don't record the complete AST but just a use-case determined subset, so I don't think this is really an issue.

What use case do you have in mind for this design?

HosseinYousefi commented 3 weeks ago

Maybe I don't understand correctly, but this seems to couple the annotation tightly with the format

It has nothing to do with the format. We're not caring about how it has been stored, we care about what queries we can make. The way the results are extracted or the parameters needed can still be changed.

Let me explain why I think this design makes the annotations less coupled:

The only thing we guarantee is that the queries will have a result for the annotated entities. This way the package author can specify 5 different queries that they want to use on an entity, and we might only have to store 2 things. Later on, maybe due to some additional requirements we need to store 3 things to support the same 5 queries, as the implementation is hidden, this does not break user's code. Or maybe we can be more efficient later on and store only one thing, same story there.

We can't do the same currently. Let's say some query fooQuery works with two annotations of @RecordBar and @RecordBaz. If we require additional data for fooQuery, we either have to break people's code or try to make one of the existing annotations store other kind of data that may not be always necessary. Likewise for the opposite case of making things more efficient. Maybe it's now more efficient to annotate with @RecordFoo but our users are still using @RecordBar and @RecordBaz. fooQuery still support the previous way of retrieval, and it's hard to communicate the migration strategy to our users (we can't quite use deprecation, as @RecordBar and @RecordBaz themselves can be used in other circumstances, even in combination with one another, the only thing changed is for fooQuery specifically).

mosuem commented 3 weeks ago

Thanks, I understand better now.

Let's say some query fooQuery works with two annotations of @RecordBar and @RecordBaz

The current design only has a single @RecordUse annotation.

If we require additional data for fooQuery, we either have to break people's code or try to make one of the existing annotations store other kind of data that may not be always necessary.

I think if we want to support new queries which require new data, storing more data which is not always necessary is fine. If we want more fine-grained control, we can always add parameters to @RecordUse, such as arguments: true, fields: true, ....

I do like being able to change APIs more efficiently. But even with your proposed annotation design, a new query might need to be placed on a different object to make sure that one is recorded, so the migration strategy is not very straightforward either way. At least so far, the use cases are all very similar and clear-cut, and I also like the simplicity of having a single annotation which can be placed on all objects which should be recorded.

HosseinYousefi commented 3 weeks ago

The current design only has a single @RecordUse annotation.

The same concept applies with any fine grade control knobs – say named enum or boolean arguments, or any other way. This problem is not syntax-specific.

a new query might need to be placed on a different object to make sure that one is recorded

Migration is pretty straightforward. If a query did not exist before and now it does, then users will add it to A. their code and B. to their link hook. As an added benefit, the compiler has a holistic view on what exactly the user wants to query and can store just the right informations and nothing more (even if this is not so important right now).