bytedeco / javacpp

The missing bridge between Java and native C++
Other
4.46k stars 581 forks source link

Add inherited option to @Virtual #660

Closed HGuillemet closed 1 year ago

HGuillemet commented 1 year ago

B is a base class with a virtual method m. D is a derived class. If you want to override m in Java, you add a new Info("B").virtualize(), which main effect is to annotate the Java method m with @Virtual. The generator then creates a proxy C++ subclass for B, but also for D, so that you can create Java subclass of D that overrides m. Thus the ability to override in Java is "inherited" from B by D.

This PR adds the option to prevent this inheritance. The use of the proxy class that polls the JVM each time the C++ m is called has a cost that may not be negligible. So preventing this inheritance is useful for libraries that provide a parent virtual class, that can be subclassed, and concrete subclasses that are not meant to be subclassed.

This is the case of Pytorch with the Module class, usually subclassed by users, and numerous implementations for each layer type.

If Module is virtualized and the inheritance prevented with this PR, Pytorch JNI is 30986347 bytes.

If Module is virtualized with master, Pytorch JNI is 44111566 bytes (and doesn't compile, for an obscure template error I didn't dig further).

The PR adds an Info for preventing the inheritance and nothing changes in parsed file if this new Info is not set.

saudet commented 1 year ago

That seems fine, but the word "inherit" means the other way around, such that, for example, a user might think they need to provide Info.inherit(false) for all subclasses. I think I'll put "restrain" instead, which we could apply for other aspects that we don't want to get inherited. Sounds OK?

HGuillemet commented 1 year ago

What about "inheritable" ?

saudet commented 1 year ago

But it's not "inheritance" as understood for classes, so this is confusing. What about something like Info.virtualize().skipSubclasses() ? It's consistent with others like skipDefaults.

HGuillemet commented 1 year ago

I have no preference between skipSubclasses or inheritable. But this would be for the @Virtual optional argument. For Info, we need to target virtualization, not the whole cppName, so we need either something long like: virtualizeSubclasses or virtualizeInheritability or add an (optional) argument to virtualize(), in this case no need for a fancy name: virtualize(true /*enable*/, false /*skipSubclasses*/).

HGuillemet commented 1 year ago

Or something like virtualizeSingle as an alternative to virtualize

saudet commented 1 year ago

I don't think there's anything else than virtualize that modifies the behavior in subclasses, but if there were, we could reuse that flag, so that's fine for other potential future use cases.

HGuillemet commented 1 year ago

It doesn't sound rigorous to me. What if I want to skipSubclasses for virtualization but not for this future behavior ? Create 2 Info ? But in most places in the source we check the first Info attached to a cppName. Ok, this has little chance to happen, and the whole Info system may well be remastered in JavaCPP 2, but well...

I think adding an optional parameter to virtualize() is the most logical, since it will be translated into a @Virtual annotation with an optional parameter.

saudet commented 1 year ago

I'm starting to feel like you're trying to add unnecessary complexity. If the only use case you have for this is torch::nn::Module, it only has a dozen virtual functions. We can already do what you want to do with something like this:

String[] virtuals = {"clone", "train", "is_training", "to", "zero_grad", "save", "load", "pretty_print", "is_serializable", 
                     "_forward_has_default_args", "_forward_num_required_args", "_forward_populate_default_args"}; 
for (int i = 0; i < virtuals.length; i++) {
    virtuals[i] = "torch::nn::Module::" + virtuals[i];
}
infoMap.put(new Info(virtuals).annotations("@Virtual"));

Why isn't that satisfactory?

HGuillemet commented 1 year ago

Because virtualize does more than just annotating with @Virtual: it adds @Const and other more specific annotations, and prevents overloads due to optional parameters. So the JNI doesn't compile with your suggestion.

Yes, the current need is for Pytorch, but I try to generalize the solution so that it can benefit to other presets. The typical use case is explained in original post above. And the needed modifications in JavaCPP are quite small.

saudet commented 1 year ago

If those functions need their signatures adjusted, we can make Info.virtualize() work for functions too. Anything missing from that?

HGuillemet commented 1 year ago

A virtualized function will be "inherited". This inheritance mechanism is in Generator and works at the level of the function. Same remark holds for you suggestion above about adding @Virtual explicitly: it won't prevent the creation of proxy classes in subclasses (unless we add this inheritable/skipSubclasses option to @Virtual). But allowing Info.virtualize to target specific functions could be useful indeed.

saudet commented 1 year ago

I see, what we need to modify really is just the @Virtual annotation, so here's what we're going to do, assuming we name the new flag @Virtual(subclasses=...).

String[] virtuals = {"clone", "train", "is_training", "to", "zero_grad", "save", "load", "pretty_print", "is_serializable", 
                     "_forward_has_default_args", "_forward_num_required_args", "_forward_populate_default_args"}; 
for (int i = 0; i < virtuals.length; i++) {
    virtuals[i] = "torch::nn::Module::" + virtuals[i];
}
infoMap.put(new Info("torch::nn::Module").virtualize()).put(new Info(virtuals).annotations("@Virtual(subclasses=false)"));

And then in Parser.java, check whether @Virtual already appears in the annotations before trying to add it.

HGuillemet commented 1 year ago

That should work, but I wonder why you don't want to add a parameter to Info.virtualize to back up the subclasses option of @Virtual. Do you think virtualizing a single class and not the hierarchy is a such a corner case ?

saudet commented 1 year ago

Right, this is the first I hear about needing something like that.

HGuillemet commented 1 year ago

I added a line to allow virtualize to target functions. This simplifies a bit the code in Pytorch presets:

      String[] virtuals = {"clone", "train", "is_training", "to", "zero_grad", "save", "load", "pretty_print", "is_serializable"};
      for (int i = 0; i < virtuals.length; i++) {
        virtuals[i] = "torch::nn::Module::" + virtuals[i];
      }
      infoMap.put(new Info(virtuals).virtualize().annotations("@Virtual(subclasses=false)"));

and could be useful in other cases. This virtualizes a function if its class is not virtualized, but cannot disable virtualization of a function if the class is.

HGuillemet commented 1 year ago

Good to merge ?