Ffigen transformer refactor

liamappelbe commented 5 months ago

As ffigen has developed we've gradually added more and more complexity to the addDependencies and toBindingString methods of the bindings classes. We should add a transformation step to the pipeline, before addDependencies, and move any logic that modifies the AST into this step.

It will probably be cleanest to have multiple separate transformations, rather than one big one. So we'll use a similar transformer pattern to the Dart CFE. This will also allow custom transformation steps in future. We'll probably also need to formalize/cleanup the AST representation a bit.

1: Clean up AST 2: Write transformer boilerplate 3 to N: Add a seperate transformer implementation for each of the AST modifying actions that currently happens in addDependencies and toBindingString. Could probably also port addDependencies to a transformer/visitor.

liamappelbe commented 5 months ago

Some notes on the transformer pattern.

liamappelbe commented 1 month ago

I was planning to let the transformer support AST node deletion, but I don't think it really makes sense in practice.

Most of the children of our AST nodes are not optional (eg fields in a struct or arguments in a function). So I found myself inserting loads of ! operators. To make that a bit less hacky I tried plumbing through a mustReturnNonNull flag to the transformation. But that doesn't really make sense either, because the children that are nullable are null or non-null based on decisions made by the parent (or the parent's transformer), not by the child's transformer. For example, the super type of an ObjCInterface is nullable, but it doesn't make sense for a transformation to delete the super type. An AST node's nullable child that is null before the transformation should be null after, and non-null before should be non-null after.

The only use case we have for node deletion is applying the user's filters to the AST. But we also bypass those filters for transitive dependencies. So the filtering isn't actually recursive. We apply the filters to the top level list of bindings, then everything transitively depended on by the bindings that pass the filters are included regardless of the filters. So the filters don't need any sort of recursive traversal of the tree, and would be better implemented outside of the transformer infrastructure.

liamappelbe commented 1 month ago

Ran into a fun DS&A problem. The transform is a DFS, so we need to track whether the nodes are visited.

T transform<T extends AstNode>(T node) {
  if (_seen.containsKey(node)) return _seen[node] as T;
  final result = node.transform(_transformation) as T;
  _seen[node] = result;
  return result;
}

Since there are cycles in the graph, we need to mark the node as visited before recursing into it (to avoid infinite loops). But the transformation is allowed to return a different node, so what do we put in the _seen map?

T transform<T extends AstNode>(T node) {
  if (_seen.containsKey(node)) return _seen[node] as T;
  _seen[node] = ???;
  final result = node.transform(_transformation) as T;
  _seen[node] = result;
  return result;
}

We could insert a sentinel value, but then anyone hitting the if (_seen...) case would receive the sentinel value instead of the true result. I thought about refactoring the transformer to reorder the node processing, but if the goal is to process all the children before processing the parent, that's just not possible in the presence of cycles. No matter what order we process the nodes, this transform method must sometimes return for the parent before returning for the child. So transformations are always going to need to be robust to that, and enforce any strict ordering they need by splitting into multiple transform stages. So there's not much point trying to reorder the node processing.

But we're still left with the issue of what do we actually return in the if (_seen...) case. If the parent returns before the child, we need some way of updating the parent's reference to the child, in the case that the child transform changed the node.

One fix is to force all cycles to go through a TypeRef<T> type. We'd make TypeRef's transform to always return the same TypeRef instance. Then the transform function would remember all the TypeRefs that point to a node, and update them all after the node's transform is complete. We'd essentially be separating the concepts of type reference and type declaration. This is a big change to the existing AST, but it's a sensible distinction, and on that libclang, jnigen, and swift2objc already make.
I don't really have a concrete use case for transformations that return a different node yet. All the transformations I have in mind just mutate existing nodes. So maybe we just drop this requirement, and switch to a visitor pattern? If we only have one or two use cases, it's actually possible to implement a transformation in a visitor, it's just a bit inelegant.

HosseinYousefi commented 1 month ago

We'd essentially be separating the concepts of type reference and type declaration. This is a big change to the existing AST, but it's a sensible distinction, and on that libclang, jnigen, and swift2objc already make.

This is a good refactor to do regardless of this issue.

2. I don't really have a concrete use case for transformations that return a different node yet. All the transformations I have in mind just mutate existing nodes. So maybe we just drop this requirement, and switch to a visitor pattern? If we only have one or two use cases, it's actually possible to implement a transformation in a visitor, it's just a bit inelegant.

I don't have a preference for how ffigen wants to internally do things, but if you want to expose these APIs to the user, I'm in favor of this option.

I prototyped this back when I was working on jnigen transformations as well – I'm not sure if you remember our discussions from like a year ago! We want to expose a very limited set of things that the user can do, and a mutable data structure + visitor allows exactly that.

HosseinYousefi commented 1 month ago

Since there are cycles in the graph, we need to mark the node as visited before recursing into it (to avoid infinite loops). But the transformation is allowed to return a different node, so what do we put in the _seen map?

For handling cycles you could "color" the node grey when you enter and black when you finish visiting. But maybe I don't understand the exact problem you're facing.

So in your case you'd have both a visiting and seen/visited sets.

liamappelbe commented 1 month ago

For handling cycles you could "color" the node grey when you enter and black when you finish visiting. But maybe I don't understand the exact problem you're facing.

Yeah, that'll let me detect the cycle and avoid infinite loops, and for visitors that's enough (actually for visitors I think I can just color the node black before I visit it). But if I'm going with a real transformer, where the transformation can produce a different node, it's not clear what the transform function above should return in the case where it arrives at a grey node.

HosseinYousefi commented 1 month ago

OK I read your comment again and got it now! It's 4 am here so my brain is not quite braining anymore. Makes sense, the AST we transform should be afterall a T!

liamappelbe commented 1 month ago

Remaining minor cleanups:

Remove Compound.compoundType
Remove/refactor bindingsIndex
Migrate library sort phase to a visitor, and remove the virtual sort method
Migrate warn private and renamer to a visitor (what about method renaming?)
Migrate package overrides to a visitor
Migrate isApiAvailable to a visitor

The important stuff is done, so I'll remove this issue from the v16 milestone.

dart-lang / native

Ffigen transformer refactor #1259