bytedeco / javacpp

The missing bridge between Java and native C++
Other
4.43k stars 576 forks source link

Crash when passing a unique_ptr by value #736

Closed HGuillemet closed 6 months ago

HGuillemet commented 6 months ago
#include <memory>

struct G {
  int i = 18;
};

class C {
  std::unique_ptr<G> my_g;
  public:
    C(std::unique_ptr<G> g): my_g(std::move(g)) {}
    int get() { return my_g->i; }
};
import j.*;
import java.io.IOException;

class T {
  public static void main(String[] a) {
    G g = new G();
    System.out.println(g.i());
    C c = new C(g);
    System.out.println("g address="+g.address());
    System.out.println(c.get());
  }
}

The output is:

18
g address=0
50412920

Here is the JNI code of C.allocate:

JNIEXPORT void JNICALL Java_j_C_allocate(JNIEnv* env, jobject obj, jobject arg0) {
    G* ptr0 = arg0 == NULL ? NULL : (G*)jlong_to_ptr(env->GetLongField(arg0, JavaCPP_addressFID));
    jlong size0 = arg0 == NULL ? 0 : env->GetLongField(arg0, JavaCPP_limitFID);
    void* owner0 = JavaCPP_getPointerOwner(env, arg0);
    jlong position0 = arg0 == NULL ? 0 : env->GetLongField(arg0, JavaCPP_positionFID);
    ptr0 += position0;
    size0 -= position0;
    UniquePtrAdapter< G > adapter0(ptr0, size0, owner0);
    jthrowable exc = NULL;
    try {
        C* rptr = new C(adapter0);
        jlong rcapacity = 1;
        JavaCPP_initPointer(env, obj, rptr, rcapacity, rptr, &JavaCPP_j_C_deallocate);
    } catch (...) {
        exc = JavaCPP_handleException(env, 8);
    }

    G* rptr0 = adapter0;
    jlong rsize0 = (jlong)adapter0.size;
    void* rowner0 = adapter0.owner;
    if (rptr0 != ptr0) {
        JavaCPP_initPointer(env, arg0, rptr0, rsize0, rowner0, &UniquePtrAdapter< G >::deallocate); /* XXX */
    } else {
        env->SetLongField(arg0, JavaCPP_limitFID, rsize0 + position0);
    }
    if (exc != NULL) {
        env->Throw(exc);
    }
}

Object c is freed at line marked with /* XXX */. If I understand well this line resets the address of the object since its ownership passed away, like the program output shows. I'm not sure it's a good thing but the fact that it also registers a deallocator seems not.

saudet commented 6 months ago

You're going to have the same problem in C++ if you try to create a std::unique_ptr from an object allocated on the stack. That's how you should think about this. Although we can't actually allocate objects on the stack, that's how it behaves, to make the Java syntax as close as possible to C++.

HGuillemet commented 6 months ago

A C++ equivalent is:

#include "inc.h"
#include <memory>
#include <stdio.h>

int main(int ac, char **av) {
    std::unique_ptr<G> gp = std::make_unique<G>();
    printf("%d\n", gp->i);
    C c(std::move(gp));
    printf("%p\n", gp.get());
    printf("%d\n", c.get());
}

I don't think there is something wrong. Is it ? It prints the correct:

18
(nil)
18

Management of object G is passed to object C and it shouldn't be destroyed as long as object C lives.

For a real world example, see in PyTorch OptimizerParamGroup and the options argument of its constructor. Currently my attempts to use it from the presets trigger crashes.

saudet commented 6 months ago

That's not equivalent. If you call std::make_unique with JavaCPP it's going to do the same thing and it will work.

HGuillemet commented 6 months ago

Can I ? Using similar trick that we did with @Shared annotation on constructors ?

I'm wondering if, in the UniquePtrAdapter, we shouldn't empty the owner field when we perform the move in U&&() casting operator.

saudet commented 6 months ago

We could make it so that std::make_unique gets called as part of the constructor of the Java object, sure...

HGuillemet commented 6 months ago

Ok, Adding @UniquePtr @Name("std::make_unique<G>") to constructors of classes that can be instantiated in Java and passed to native function as unique pointer seems mandatory indeed, like for shared pointers.

However, I still think there is a problem of owner field not cleared after a move:

std::unique_ptr<G> x() { return std::make_unique<G>(); }
void y(std::unique_ptr<G> g) {  }
G g = x();
y(g);
g.deallocate();

Triggers a segmentation fault (double free). I don't think it's normal. The owner field of g should have been cleared at the same time of its address. Don't you think so ?

saudet commented 6 months ago

The owner field is fine, that's the std::unique_ptr itself. If it still crashes it's something else

HGuillemet commented 6 months ago

The owner field of UniquePtrAdapter is the managed pointer owner if I understand well. As shown in the /* XXX */ line above of the JNI code, this field is used to reinitialize the owner field of the Java object after the method call. So after a move we end up with a Java object with a null address but a not-null owner, which is deallocated when the Java object is GCed. Replacing in the adapter:

operator U&&() { return UNIQUE_PTR_NAMESPACE::move(uniquePtr); }

by

operator U&&() { owner = NULL; return UNIQUE_PTR_NAMESPACE::move(uniquePtr); }

seems to fix the problem, but I don't really know if that makes sense.

saudet commented 6 months ago

Hum, I think that's going to create a memory leak somewhere, but the existing unit tests for UniquePtrAdapter pass, so... Open a pull request if you're sure you want this, it's fine by me

saudet commented 6 months ago

Oh, no, actually this can't create a memory leak because we can't modify the deallocator anyway, so that's going to get deallocated fine.

HGuillemet commented 6 months ago

if you're sure you want this, it's fine by me

I'm sure there is a double free problem somewhere, but not sure at all it's the right fix. Or whether a similar fix would be needed somewhere: probably in the same cast operator overload in MoveAdapter ?

I'll also push a PR for Pytorch to add the proper annotations on constructors of OptimizerOptions and its subclasses, and probably others. Several other presets are potentially affected: arrow, onnxruntime,onnx, tensorflow-lite, tritonserver.

saudet commented 6 months ago

I'm sure there is a double free problem somewhere, but not sure at all it's the right fix. Or whether a similar fix would be needed somewhere: probably in the same cast operator overload in MoveAdapter ?

Well it looks fine to me. The pointer is dead anyway after that function call so it can't hurt. Same for MoveAdapter, sure, thanks

I'll also push a PR for Pytorch to add the proper annotations on constructors of OptimizerOptions and its subclasses, and probably others. Several other presets are potentially affected: arrow, onnxruntime,onnx, tensorflow-lite, tritonserver.

:+1: Don't bother with Arrow though, I'm not maintaining that one