bytedeco / javacpp

The missing bridge between Java and native C++
Other
4.47k stars 582 forks source link

Virtualizing a class causes a method returning a String return a dangling pointer #656

Open HGuillemet opened 1 year ago

HGuillemet commented 1 year ago

The JNI code for method name of Module in Pytorch contains:

signed char* rptr;
StringAdapter< char > radapter(ptr->name());
rptr = radapter;

If we add an Info virtualize() on Module, the code becomes:

const signed char* rptr;
StringAdapter< char > radapter((dynamic_cast<JavaCPP_torch_0003a_0003ann_0003a_0003aModule*>(ptr) != NULL ? ((JavaCPP_torch_0003a_0003ann_0003a_0003aModule*)ptr)->name() : ptr->name()));
rptr = radapter;

The const modifier of rptr, when present or not, changes which overload of the implicit cast on the last line is applied. When absent, a copy of the string is performed. When present, no copy is performed and the returned pointer is dangling because the string it points to is freed at the end of the block.

Probably related to #374

saudet commented 1 year ago

Does this happen only with PyTorch and only with Module? It's not possible to reproduce this with a smaller example?

HGuillemet commented 1 year ago

Here is one:

#include <string>

class M {
  std::string name = "Bob";
  public:
    const std::string& n() const { return name; }
};

With virtualize(), parsed as:

    public native @Const @StdString @ByRef BytePointer n();

Generated JNI:

JNIEXPORT jobject JNICALL Java_virtopt_M_n(JNIEnv* env, jobject obj) {
    M* ptr = (M*)jlong_to_ptr(env->GetLongField(obj, JavaCPP_addressFID));
    if (ptr == NULL) {
        env->ThrowNew(JavaCPP_getClass(env, 9), "This pointer address is NULL.");
        return 0;
    }
    jlong position = env->GetLongField(obj, JavaCPP_positionFID);
    ptr += position;
    jobject rarg = NULL;
    const signed char* rptr;
    jthrowable exc = NULL;
    try {
        StringAdapter< char > radapter(ptr->n());
        rptr = radapter;
        jlong rcapacity = (jlong)radapter.size;
        void* rowner = radapter.owner;
        void (*deallocator)(void*) = rowner != NULL ? &StringAdapter< char >::deallocate : 0;
        if (rptr != NULL) {
            rarg = JavaCPP_createPointer(env, 10);
            if (rarg != NULL) {
                JavaCPP_initPointer(env, rarg, rptr, rcapacity, rowner, deallocator);
            }
        }
    } catch (...) {
        exc = JavaCPP_handleException(env, 8);
    }

    if (exc != NULL) {
        env->Throw(exc);
    }
    return rarg;
}

Running a test with System.err.println(new M().n().getString()); yields ��B.

Without virtualize(), parsed as:

    public native @StdString BytePointer n();

Generated JNI:

JNIEXPORT jobject JNICALL Java_virtopt_M_n(JNIEnv* env, jobject obj) {
    M* ptr = (M*)jlong_to_ptr(env->GetLongField(obj, JavaCPP_addressFID));
    if (ptr == NULL) {
        env->ThrowNew(JavaCPP_getClass(env, 9), "This pointer address is NULL.");
        return 0;
    }
    jlong position = env->GetLongField(obj, JavaCPP_positionFID);
    ptr += position;
    jobject rarg = NULL;
    signed char* rptr;
    jthrowable exc = NULL;
    try {
        StringAdapter< char > radapter(ptr->n());
        rptr = radapter;
        jlong rcapacity = (jlong)radapter.size;
        void* rowner = radapter.owner;
        void (*deallocator)(void*) = rowner != NULL ? &StringAdapter< char >::deallocate : 0;
        if (rptr != NULL) {
            rarg = JavaCPP_createPointer(env, 10);
            if (rarg != NULL) {
                JavaCPP_initPointer(env, rarg, rptr, rcapacity, rowner, deallocator);
            }
        }
    } catch (...) {
        exc = JavaCPP_handleException(env, 8);
    }

    if (exc != NULL) {
        env->Throw(exc);
    }
    return rarg;
}

Test yields Bob.

HGuillemet commented 1 year ago

This is solved by removing the test of context.virtualize here, but I have no idea of the implications and why the test is here.

saudet commented 1 year ago

I see, what it should map to is actually something like this:

   public native @Cast({"", "const std::string&"}) @StdString @ByRef BytePointer n();

The logic for that is currently missing from the Parser, but we can easily work around that with Info.javaText(), unless of course there's too many of those functions, in which case it would make sense to spend time on this.

When the function doesn't get virtualized though, such as in this case, we don't need to worry about that logic anyway. We can probably just change the context.virtualize check to context.virtualize && type.virtual, but that "type" needs to be the one of the function, not the parameters, so we'd probably need to update the context for that. Actually, I think we could just narrow down setting context.virtualize at the function level instead of at the group level...

HGuillemet commented 1 year ago

What exactly is the missing logic ? What are the functions concerned ? Why the annotations are different when the method is virtualized ?

Following your suggestion I have turned off context.virtualize in function, in a new context, when the function is not virtual and it seems enough indeed for virtualizing Module in Pytorch for now.

saudet commented 1 year ago

What exactly is the missing logic ? What are the functions concerned ?

Anything required to output that. I haven't looked into it, but probably declarator().

Why the annotations are different when the method is virtualized ?

Because we need to override the function in C++ and that requires the exact same declaration to work, down to every single const.