bytedeco / javacpp

The missing bridge between Java and native C++
Other
4.48k stars 581 forks source link

Improve parsing of operator and function templates #732

Closed HGuillemet closed 9 months ago

HGuillemet commented 9 months ago

See Issue #729

Example derived from @benshiffman's:

template <typename T>
class TemplatedClass {
public:
    template <typename U> TemplatedClass(U val);
    template <typename U> void funTemp(U val);
    template <typename U> bool operator<(TemplatedClass<U>& rhs);
};

WIth info:

infoMap
    .put(new Info("TemplatedClass<double>").pointerTypes("TemplatedClassDouble"))

    .put(new Info("TemplatedClass<double>::TemplatedClass<int>").javaNames("XXX"))

    .put(new Info("TemplatedClass<double>::funTemp<int>").javaNames("funTempInt"))

    .put(new Info("TemplatedClass<double>::operator <<double>").javaNames("lt"))
;

we now get:

    public TemplatedClassDouble(int val) { super((Pointer)null); allocate(val); }
    private native void allocate(int val);
    public native @Name("funTemp<int>") void funTempInt(int val);
    public native @Cast("bool") @Name("operator <<double>") boolean lt(@ByRef TemplatedClassDouble rhs);

Note that .javaNames("XXX") is necessary because in DeclarationList.add we require a Java name to be present in the info to trigger the template instantiation. This doesn't make sense for constructors. Maybe should we also accept info with define, and javaText. Or maybe remove any constraint on what is in the info (not tested).

If, because of polymorphism, we need to target functions with specific arguments, things get a bit more complicated:

infoMap
     .put(new Info("TemplatedClass<double>").pointerTypes("TemplatedClassDouble"))

     .put(new Info("TemplatedClass<double>::TemplatedClass<int>(U)").javaNames("XXX"))

     .put(new Info("TemplatedClass<double>::funTemp<int>(U)").javaNames("XXX"))
     .put(new Info("TemplatedClass<double>::funTemp<int>(int)").javaNames("funTempInt"))

     .put(new Info("TemplatedClass<double>::operator <<double>(TemplatedClass<U>&)").javaNames("XXX"))
     .put(new Info("TemplatedClass<double>::operator <<double>(TemplatedClass<double>&)").javaNames("lt"))
;

The use of placeholder U is a bit ugly, but I don't think we can do much about it. It's necessary to answer the infoMap queries in DeclarationList.add, like TemplatedClass<double>::funTemp<U>(U). TemplatedClass<double>::funTemp<int>(U) matches, but not TemplatedClass<double>::funTemp<int>(int).

Info("TemplatedClass<double>::funTemp<int>(int)").javaNames("funTempInt") is still necessary if we need to set the javaName because when we generate the function instance, the infoMap is queried with the function fullname, with template arguments replaced by their concrete values, and TemplatedClass<double>::funTemp<int>(U) won't match.

This PR should only introduce one breaking change: info keys using calling name of templated constructor won't match anymore and must be changed to use declaration name.

Regression tests:

benshiffman commented 9 months ago

Thanks for doing this! I'm doing my best to understand what's going on in the code and I think I get the gist of your changes. Will this allow templated operators with a different type than the parent class to map correctly with an info like so?

infoMap.put(new Info("TemplatedClass<double>::operator +<int>(TemplatedClass<U>&)").javaNames("XXX"));
infoMap.put(new Info("TemplatedClass<double>::operator +<int>(TemplatedClass<int>&)").javaNames("addInt"));
HGuillemet commented 9 months ago

Didn't you mean operator +<U> and operator +<int> ? Yes, that's one of the things fixed. And you won't need to specify the parameters. This should be enough:

infoMap.put(new Info("TemplatedClass<double>::operator +<int>").javaNames("addInt"));

Can you give this PR a try with your code ?

benshiffman commented 9 months ago

Didn't you mean operator +<U> and operator +<int> ?

It should have been operator +<int>(TemplatedClass<U>&) and operator +<int>(TemplatedClass<int>&) I think per your example above. I've updated my comment.

infoMap.put(new Info("TemplatedClass<double>::operator +<int>").javaNames("addInt"));

This mapping alone works will consequentially rename any nontemplated operator + to addInt. I had to use this mapping for operator =<U>(const TemplatedClass<U>&) to ensure proper naming since I have a nontemplated operator =:

infoMap.put(new Info("TemplatedClass<double>::operator =<std::complex<double> >(const TemplatedClass<U>&)").javaNames("XXX"));
infoMap.put(new Info("TemplatedClass<double>::operator =<std::complex<double> >(const TemplatedClass<std::complex<double> >&)").javaNames("putComplexDouble"));

infoMap.put(new Info("TemplatedClass<std::complex<double> >::operator =<double>(const TemplatedClass<U>&)").javaNames("XXX"));
infoMap.put(new Info("TemplatedClass<std::complex<double> >::operator =<double>(const TemplatedClass<double>&)").javaNames("putDouble"));

Now the templated one is named putDouble or putComplexDouble and the nontemplated one is just put.

Normal constructors and functions work as per your example.

HGuillemet commented 9 months ago

Didn't you mean operator +<U> and operator +<int> ?

It should have been operator +<int>(TemplatedClass<U>&) and operator +<int>(TemplatedClass<int>&) I think per your example above. I've updated my comment.

Right.

infoMap.put(new Info("TemplatedClass<double>::operator +<int>").javaNames("addInt"));

This mapping alone works will consequentially rename any nontemplated operator + to addInt. I had to use this mapping for operator =<U>(const TemplatedClass<U>&) to ensure proper naming since I have a nontemplated operator =:

infoMap.put(new Info("TemplatedClass<double>::operator =<std::complex<double> >(const TemplatedClass<U>&)").javaNames("XXX"));
infoMap.put(new Info("TemplatedClass<double>::operator =<std::complex<double> >(const TemplatedClass<std::complex<double> >&)").javaNames("putComplexDouble"));

infoMap.put(new Info("TemplatedClass<std::complex<double> >::operator =<double>(const TemplatedClass<U>&)").javaNames("XXX"));
infoMap.put(new Info("TemplatedClass<std::complex<double> >::operator =<double>(const TemplatedClass<double>&)").javaNames("putDouble"));

Ok. What about this:

infoMap.put(new Info("TemplatedClass<double>::operator =<std::complex<double> >").javaNames("putComplexDouble"));
infoMap.put(new Info("TemplatedClass<double>::operator =<double>").javaNames("putDouble"));
infoMap.put(new Info("TemplatedClass<double>::operator =").javaNames("put"));
benshiffman commented 9 months ago

This ended up working. In the header file, my nontemplated assignment operator overload comes first.

infoMap.put(new Info("TemplatedClass<double>::operator =").javaNames("put")).
infoMap.put(new Info("TemplatedClass<double>::operator =<std::complex<double> >").javaNames("putComplexDouble"));
infoMap.put(new Info("TemplatedClass<std::complex<double> >::operator =").javaNames("put"));
infoMap.put(new Info("TemplatedClass<std::complex<double> >::operator =<double>").javaNames("putDouble"));

There is a catch though. If the templated operator comes first in the C/C++ header (or if there is no nontemplated operator), the above infoMapping will create identical functions to the nontemplated-first scenario but the Parser technically ignores the nontemplated version completely. Reversing the order of the infoMapping will cause the javaName for the nontemplated operator not to map, but both functions will still be inserted where the templated version should be.

So, I think best practice moving forward is nontemplated version first, in both header and mapping.

saudet commented 9 months ago
  • bullet: this PR generates one less constructor for btDefaultMotionState for a complex reason. This class has a virtualize info and a constructor with parameters having default values.

    • in master, because of the bug mentioned above and because the class lies in root namespace, the info map for the constructor is queried with the declaration name only and doesn't match the class info with virtualize.
    • with PR, constructor cppName is correct, and infomap is queries first with declaration name, then with calling (class) name. So virtualize is (wrongly) set to true for the constructor and since function overloads are not generated for virtualized functions, the no-arg constructor is not generated. virtualize info doesn't make sense for constructors. Does it ? If it does not, I suggest to ignore this info for constructors, which should avoid this problem related to a confusion between constructor info and class info.

Are you talking about this line?

if (context.virtualize) {
    // Prevent creation of overloads, that are not supported by the generated C++ proxy class.
    break;
}

Right, I guess it would make more sense to check for the @Virtual annotation like

if (context.virtualize && type.annotations.contains("@Virtual")) {
    // Prevent creation of overloads, that are not supported by the generated C++ proxy class.
    break;
}

Does this fix it?

HGuillemet commented 9 months ago

Are you talking about this line?

if (context.virtualize) {
    // Prevent creation of overloads, that are not supported by the generated C++ proxy class.
    break;
}

Yes. context.virtualize is set to true in the case of PR because info.virtualize is true. And info is set by a first query on btDefaultMotionState::btDefaultMotionState, then, since it returns nothing, on btDefaultMotionState which returns the class info, which has virtualize on. So to fix this we either can remove the ambiguous query on btDefaultMotionState (my preference, but could break too much. I didn't check on existing presets), or add a check in this if to exclude constructors.

Right, I guess it would make more sense to check for the @Virtual annotation like

if (context.virtualize && type.annotations.contains("@Virtual")) {
    // Prevent creation of overloads, that are not supported by the generated C++ proxy class.
    break;
}

Does this fix it?

I don't think so since, in case of virtualized non-constructor functions, @Virtual is added to modifiers variable and is not in type.annotations. I see that modifiers then doesn't apply to constructors, so we cannot have a virtualized constructor. What about simply testing :

if (context.virtualize && !type.constructor) {
    // Prevent creation of overloads, that are not supported by the generated C++ proxy class.
    break;
}

Or, probably cleaner, ignore info.virtualize for constructors:

        context.virtualize = (context.virtualize && type.virtual) || (info != null && info.virtualize && !type.constructor);
saudet commented 9 months ago

I don't think so since, in case of virtualized non-constructor functions, @Virtual is added to modifiers variable and is not in type.annotations. I see that modifiers then doesn't apply to constructors, so we cannot have a virtualized constructor.

Ok, so let's add a check for modifiers then. My understanding is that we can't have overloads only for the @Virtual methods, regardless of whether they are constructors or not. Are you saying you tried that, and it didn't work?

HGuillemet commented 9 months ago

Checking modifier should work, right.

But I'm realizing that @Virtual just doesn't apply to constructor:

@Target({ElementType.METHOD})
public @interface Virtual
Test.java:21: error: annotation type not applicable to this kind of declaration
    public @Virtual Test(int x) {}

So we might as well enforce this in Parser and ignore virtualize for constructors.

saudet commented 9 months ago

Right, I guess, but your context.virtualize = ... line above doesn't work when type.virtual is set, no?

BTW, shouldn't we have modifiers = "@Virtual + modifiers"; instead of overwriting everything with just modifiers = "@Virtual";?

HGuillemet commented 9 months ago

Right, I guess, but your context.virtualize = ... line above doesn't work when type.virtual is set, no?

type.virtual is true when we encounter the C++ keyword virtual. Which never happens for constructors. What do you mean ?

BTW, shouldn't we have modifiers = "@Virtual + modifiers"; instead of overwriting everything with just modifiers = "@Virtual";?

The public native default value for modifiers that we overwrite is replaced 6 lines below:

                modifiers += (context.inaccessible ? "protected native " : "public native ");

so it works in the general case. modifiers is set to another value above in the case of friend and of static/global functions, but in those case virtualize makes no sense. We could throw an exception if such case is detected ?

saudet commented 9 months ago

type.virtual is true when we encounter the C++ keyword virtual. Which never happens for constructors. What do you mean ?

Yes, you're right, so that works. Let's try that

so it works in the general case. modifiers is set to another value above in the case of friend and of static/global functions, but in those case virtualize makes no sense. We could throw an exception if such case is detected ?

I see, so that's OK too, no need to modify that for now.

HGuillemet commented 9 months ago

Could you comment about this too:

Note that .javaNames("XXX") is necessary because in DeclarationList.add we require a Java name to be present in the info to trigger the template instantiation. This doesn't make sense for constructors. Maybe should we also accept info with define, and javaText. Or maybe remove any constraint on what is in the info (not tested).

I think we should at least trigger the template instantiation if javaText is present, and not only javaName.

HGuillemet commented 9 months ago

type.virtual is true when we encounter the C++ keyword virtual. Which never happens for constructors. What do you mean ?

Yes, you're right, so that works. Let's try that

Pushed and tested: now bullet is unchanged and no new changes are introduced.

saudet commented 9 months ago

Note that .javaNames("XXX") is necessary because in DeclarationList.add we require a Java name to be present in the info to trigger the template instantiation. This doesn't make sense for constructors. Maybe should we also accept info with define, and javaText. Or maybe remove any constraint on what is in the info (not tested).

I think we should at least trigger the template instantiation if javaText is present, and not only javaName.

javaText replaces everything, I don't understand what effect that should have? In any case, we could add a check for define , sure, that should be fine. That's kind of what it's used for macros. I just didn't do that for templates because usually it's hard to come up with a nice name automatically.

Pushed and tested: now bullet is unchanged and no new changes are introduced.

:+1: Does this PR supersede pull #730?

saudet commented 9 months ago

Yes, this does appear to supersede pull #730.

@benshiffman Let me know if this is good to merge for you! Thanks

HGuillemet commented 9 months ago

Note that .javaNames("XXX") is necessary because in DeclarationList.add we require a Java name to be present in the info to trigger the template instantiation. This doesn't make sense for constructors. Maybe should we also accept info with define, and javaText. Or maybe remove any constraint on what is in the info (not tested).

I think we should at least trigger the template instantiation if javaText is present, and not only javaName.

javaText replaces everything, I don't understand what effect that should have?

We might want to instantiate a function template, and set the Java text of this instance explicitly, not only the Java name. That's what @benshiffman tried to do here.

In any case, we could add a check for define , sure, that should be fine.

Ok, I'll add this to the PR. The main question is whether we need to test anything about the info or just test for its presence. I guess I'll have to try out and see.

That's kind of what it's used for macros. I just didn't do that for templates because usually it's hard to come up with a nice name automatically.

Right, but there are 2 cases where java name is not meaningful : constructors and function templates with full info (see XXX in top post).

Does this PR supersede pull #730?

Yes, the line fixed in PR #730 doesn't exist any more, it has been replaced by the use of Templates.

saudet commented 9 months ago

We might want to instantiate a function template, and set the Java text of this instance explicitly, not only the Java name. That's what @benshiffman tried to do here.

You mean the issue https://github.com/bytedeco/javacpp/issues/729#issuecomment-1845926960 bit about ArrayInt? Exactly how is it supposed to know to name the instance "ArrayInt" without any javaNames?

Right, but there are 2 cases where java name is not meaningful : constructors and function templates with full info (see XXX in top post).

I get it for constructors, but not for the other use cases. Are you saying that TemplatedClass<double>::TemplatedClass<int> matches with TemplatedClass<int>? That doesn't sound right

benshiffman commented 9 months ago

We might want to instantiate a function template, and set the Java text of this instance explicitly, not only the Java name. That's what @benshiffman tried to do here.

In any case, we could add a check for define , sure, that should be fine.

Ok, I'll add this to the PR. The main question is whether we need to test anything about the info or just test for its presence. I guess I'll have to try out and see.

The need for this functionality might come up in the project I’m working on but I think the current interaction with templates in this PR is great as it is.

HGuillemet commented 9 months ago

We might want to instantiate a function template, and set the Java text of this instance explicitly, not only the Java name. That's what @benshiffman tried to do here.

You mean the issue #729 (comment) bit about ArrayInt? Exactly how is it supposed to know to name the instance "ArrayInt" without any javaNames?

javaName is not needed if we have a javaText to generate the template instance. But I'm realizing that it is needed if the type is referenced elsewhere, for example if another function returning a value which type is this template instance. So ok, we can require a javaName in this case.

Right, but there are 2 cases where java name is not meaningful : constructors and function templates with full info (see XXX in top post).

I get it for constructors, but not for the other use cases. Are you saying that TemplatedClass<double>::TemplatedClass<int> matches with TemplatedClass<int>? That doesn't sound right

No. Given this function template:

template <typename U> void f(U);
template <typename U> void f(U, int);

We want to instantiate the first overload, using the fullname with parameters. DeclarationList.add will look for instances with this info query: f<U>(U). To match this query, we need an info like f<int>(U). An info like f<int>(int) won't match. Then, when the parser generates the template instance in Parser.function, it queries: f<int>(int) to get the javaName and other informations. The existing info f<int>(U) won't match. That's why we need 2 info, and the javaName is meaningful for the second one only:

infoMap.put(new Info("f<int>(U)").javaNames("XXX"))
       .put(new Info("f<int>(int)").javaNames("fInt"));

If we want to avoid the double info, we need Infomap.normalize to "unparameter", like it "untemplates" and "unconst", so that f<int>(int) matches with f<U>(U). But I think it'll break too much and it's probably not worth for this corner case.

saudet commented 9 months ago

I see, but in that case infoMap.put(new Info("f<int>(U)", "f<int>(int)").javaNames("fInt")); should work so, that's alright, no?

HGuillemet commented 9 months ago

Indeed. So I suggest to just check for the presence of javaName or define. define could be used for constructors but also for methods where the default name is ok because the template arg is a function parameter and we can overload using the same java name (typically for operators).

saudet commented 9 months ago

Sounds good! 👍

benshiffman commented 9 months ago

Hi all, having great success with these changes despite one issue that I'm currently trying to isolate. Templated operator overloads with a std::vector<U> as the right-hand parameter are not mapping. This does not affect non-templated overloads. Will update with more info when I have it.

HGuillemet commented 9 months ago

Any news about the remaining issue ?

Could you try PR #733 with your code ? Some commits may impact you.

benshiffman commented 9 months ago

@HGuillemet As far as I can tell there are no changes in functionality with the new code. Using std::vector<U> is not super high on my priority list so it may be a few days before I can keep cracking at it.