dart-lang / native

Dart packages related to FFI and native assets bundling.
BSD 3-Clause "New" or "Revised" License
155 stars 43 forks source link

[jnigen][ffigen] Standardize filtering policy #1678

Open liamappelbe opened 2 weeks ago

liamappelbe commented 2 weeks ago

In ffigen, when an API element is included by the config, all its transitive deps are also included. Eg, if the user asks for a class to be generated (and doesn't filter the methods), we generate bindings for all the methods and all the classes consumed or returned by those methods.

In jnigen, transitive deps are not included by default. Instead, if a method refers to a class that hasn't been explicitly requested by the user, it will be replaced by a JObject (or maybe a stub in future).

We should settle on one approach. I'm about to refactor ffigen's filtering, so now is a good time to think about it.

@stuartmorgan Any opinions about which has been easier?

stuartmorgan commented 2 weeks ago

I've been giving this some thought over the past couple of weeks; I don't think either is the ideal model, but my ideal would be much closer to jnigen's

ffigen's transitive behavior creates truly enormous files, which caused me no end of problems:

jnigen's approach made files that felt reasonable to read/navigate as a human, and didn't cause any tooling problems. There are a couple of issues (some of which I've filed issues for):

What I think I'd like to see for both is something that has ffigen-style upward transitive inclusion, but jnigen-style pruning for everything else, but (and maybe this is what you mean by "a stub in future") instead of replacing non-included classes with JObject, replace them with a minimal but fully-typed class. I think it would be nice if we could indicate that in the name—e.g., a non-included Foo becomes Foo_Stub or something—to reduce potential confusion if someone then tries to use it, although of course there are potential collision problems if we alter only some class names. That class would be essentially empty, but I would want it to have all of the correct inheritance (creating more stubs up the hierarchy as necessary) so that it could always be passed around in the same ways the real object could be.

stuartmorgan commented 2 weeks ago

Jnigen's approach means more iterative rounds of adding things to the config and regenerating the bindings.

Something we could consider if this ends up being a usability problem is an optional mode to do current-ffigen-style exhaustive generation (although we still have the unnecessary-broken-generation problem to contend with there). I could imagine a workflow where you start with exhaustive generation once, write your Dart code, and then look at everything you used to generate a config for what you ended up needing and that being what you check in (and iterate on if/when you change the code). Incremental small additions would likely be easier than the initial process of finding all the things you need to start.

If we wanted to get really fancy, in theory it seems like we could build a tool that would analyze the code a client has written to see what it uses, and then automatically generate a tight-bound config file from it. That would be very cool, but not necessarily worth the eng effort depending on how complex it would be in practice.

(I will say that in practice, I had to do a lot of incremental generation with ffigen too, because I wasn't generating the entire SDK, so I still had to keep adding new root items as I found them. It was just a more painful iteration because of the file size. Maybe that's my mistake for using exclude-all-by-default: true, but I had enough issues with transitive includes without it that I didn't feel like I had much choice when dealing with a whole system SDK. I'd be interested to see what that experience is like once we've ensured that we can actually generate the entire iOS system API surface without errors.)

liamappelbe commented 2 weeks ago

What I think I'd like to see for both is something that has ffigen-style upward transitive inclusion, but jnigen-style pruning for everything else

SGTM!

@HosseinYousefi is this what you're planning to do in jnigen? We should try and make the behavior as similar as possible.

I think it would be nice if we could indicate that in the name—e.g., a non-included Foo becomes Foo_Stub or something—to reduce potential confusion if someone then tries to use it

Foo_Stub might make migration annoying, if the user later decides they want Foo. What if we just document it clearly in the bindings instead?

// <Foo's API documentation, if any>
class Foo extends objc.NSObject {
  // NOTE: This class is a stub. To generate its methods, add Foo to your config's includes.
}

Something we could consider if this ends up being a usability problem is an optional mode to do current-ffigen-style exhaustive generation (although we still have the unnecessary-broken-generation problem to contend with there).

Yeah, I'll definitely make this behavior a config option (for backwards compatibility, and because the current behavior works well for C bindings).

But one thing I want to do is make some helper functions for constructing configs for ffigen's Dart API. We're at a stage now with the config yaml where the defaults are tuned for C bindings, and for ObjC we have a whole bunch of recommended options. So it'd be good to have helper functions that generate configs for specific use cases. In that case I'd enable the new behavior by default in the ObjC helper.

HosseinYousefi commented 1 week ago
  • Most notably, I think including class A should fully bring in every class that A inherits from and interface that A implements (transitively), because it's extremely confusing to have methods missing from A unless I know the class structure.

That makes sense. I will address this.

  • I think JObject is a dangerous replacement for missing classes.

I'm with you, that's why I had opened #642.

If we wanted to get really fancy, in theory it seems like we could build a tool that would analyze the code a client has written to see what it uses, and then automatically generate a tight-bound config file from it. That would be very cool, but not necessarily worth the eng effort depending on how complex it would be in practice.

That's a fantastic idea and actually we need to do something similar to this to prune the native code, in case of jnigen, in form of generating proguard-rules: #681.

@HosseinYousefi is this what you're planning to do in jnigen? We should try and make the behavior as similar as possible.

Yep, sgtm!

stuartmorgan commented 1 week ago

Foo_Stub might make migration annoying, if the user later decides they want Foo.

If they have ever actually explicitly made something of type Foo_Stub it could be annoying, but I would argue (and we could mention in docs) that if you have enough explicit references to Foo_Stub that it would be annoying to change then you should be generating Foo. I would expect that the usual number of references to be zero; in the cases I hit, I did not realize when writing the code that I had a JObject, because I never had a reference to any of these objects. It was all inline calls. If I'd made a variable, I would have realized I needed Foo and added it to my config.

I'm not sure the use case of "I want to store a Foo just to pass it to other things later but I really don't want to generate Foo" is one we need to design for; I'd be curious what the use cases would be where generating Foo is actually undesirable.

What if we just document it clearly in the bindings instead?

// <Foo's API documentation, if any>
class Foo extends objc.NSObject {
  // NOTE: This class is a stub. To generate its methods, add Foo to your config's includes.
}

My question/concern there is whether people find that comment easily enough. In particular I'm thinking about someone trying to figure out why some methods are in the autocomplete list on an instance of Foo (because they are from a generated superclass), but others aren't.

One of my biggest usability concerns of the gen tools as compared to Pigeon is that with Pigeon, I explicitly write a simple interface definition, and the output is entirely predictable, so I'm not surprised when I go to interact with the generated code, whereas with gen there's a ton of complex output that can't be trivially predicted from the config I write, so I keep hitting surprises, and then trying to figure out if it's because:

Because of that, my default position is to err on the side of making stubs really obvious. Maybe I'm overly concerned about this and it would be fine with that comment though; we'd probably need to do some user studies to actually find out.

HosseinYousefi commented 1 week ago

Because of that, my default position is to err on the side of making stubs really obvious.

+1. Foo_Stub might be too disruptive. I would at least put it as a WARNING at the beginning of its doc comments:

/// WARNING: This class is a stub, to actually generate it run 
/// `dart run jnigen:add_class 'baz.bar.Foo' --config <the path to your jnigen.yaml>`.
///
/// ... the rest of the doc comment ...
// Maybe a `@stub` annotation that shows a lint as well?
class Foo extends JObject { /* ... */ }

And also add a add_class to both jnigen and ffigen which adds the class and reruns the generation to improve the user experience.

liamappelbe commented 1 week ago

The ObjC side of this is done.

@HosseinYousefi can we close this issue? Are you using this to track the feature, or do you have separate bugs for your side?

HosseinYousefi commented 1 week ago

The ObjC side of this is done.

@HosseinYousefi can we close this issue? Are you using this to track the feature, or do you have separate bugs for your side?

I'm using this to track it. (Since you had tagged both, I didn't create another issue)