Closed MarianoFacundoArch closed 6 months ago
That's an issue I've never seen before but let's look into it. I'll give you some options on how to debug the module starting with the easiest fix.
Basically, I fucked up. Maybe.
First thing you can do is try version '2.5.1', since 2.5.2 changed how objects are deleted (they are actually being deleted now; everything before that was just a huge memory leak).
If that doesn't help or you want further insights (which would be great for me to actually solve this issue), you can enable logging, which requires you to build the module yourself. You might want to check the logging test for further information on how to use logging.
Now this is the good stuff. But also the most complicated option.
I have been using address sanitizers (I'll call these asan from now on) for this module for quite some time, so I've made this pretty straightforward. Basically, all you need to do is build a docker image starting from ghcr.io/markusjx/node-java-bridge/java-bridge-asan-testcontainer and run your application inside that docker image. The base image already contains an asan build of node 20 and openjdk 17 plus all required dependencies for this module.
Your dockerfile might look like this:
FROM ghcr.io/markusjx/node-java-bridge/java-bridge-asan-testcontainer:latest
WORKDIR /app
# Not sure if git is installed
RUN git clone https://github.com/MarkusJx/node-java-bridge
WORKDIR /app/node-java-bridge
RUN npm ci && \
    npm run build:napi:debug -- --target=x86_64-unknown-linux-gnu --cargo-flags="-Zbuild-std" && \
    npm run build:ts && \
    npm link
ADD . /app/your-app
WORKDIR /app/your-app
# Link the local version of this module
RUN npm link java-bridge
# Install and build steps for your app...
Didn't test this setup, simply adapted it from the asan workflow, but I'm confident you'll work out any issues. But if you need further instructions, feel free to contact me.
I should probably compile this into a debugging guide at some point.
Hi Markus,
The error is quite interesting, as it happens with only one function. Internally, the Java code for that function calls a native library through JNI, which then returns a byte array.
I tried version 2.5.1 as instructed, but the problem is the same:
free(): invalid pointer
I cannot really use the docker image you provided, because my Java code relies on some .so files that are compiled for Ubuntu (in this case on ARM). So that's a test I could not do. (I also could not compile it, as I am on the ARM architecture and want to keep the software running on ARM.)
If you have any way of more direct contact it'd be great as I think I might be able to further assist you with debugging.
The call to java that is giving the problem is this one:
public boolean init(long ptr) {
    this.part = ptr;
    this.initializePart();
    return this.getPart() != 0L;
}

void initializePart() {
    this.part = this.nativeValidate(this.part);
}

public byte[] someFunction() throws someException {
    return this.nativeSomeFunction(this.getPart());
}

private native byte[] nativeSomeFunction(long ptr);
Whenever I call someFunction(), it just kills Node.js. It doesn't happen with other functions, only with that one, which is part of a given class. It's curious that it kills the Node.js process entirely with that error.
Is there any way I can further assist in addressing this?
Did you try running your native method in pure java (without this module)? Just to make sure the invalid free doesn't originate from your native module, as invalid frees are not that easy to create in rust (that's why I chose rust in the first place).
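A pure-Java check along those lines could be sketched as follows. This is only an illustration: `PureJavaCheck`, `StandIn` and `callRepeatedly` are hypothetical names, and `StandIn` merely imitates a class whose `someFunction()` returns a byte array; the real class that wraps the native library would take its place. The sketch assumes that calling the method many times, with occasional garbage collection hints, gives a native memory bug more chances to show up than a single call would.

```java
import java.lang.reflect.Method;

public class PureJavaCheck {
    /** Hypothetical stand-in; replace it with the real class that wraps the native library. */
    public static class StandIn {
        public byte[] someFunction() {
            return new byte[] {1, 2, 3};
        }
    }

    /**
     * Invokes a public no-argument method that returns byte[] on the target
     * `iterations` times and returns the total number of bytes received.
     */
    public static long callRepeatedly(Object target, String methodName, int iterations) {
        try {
            Method method = target.getClass().getMethod(methodName);
            long totalBytes = 0;
            for (int i = 0; i < iterations; i++) {
                byte[] result = (byte[]) method.invoke(target);
                totalBytes += (result == null) ? 0 : result.length;
                if (i % 100 == 0) {
                    // Only a hint to the JVM; a collection is not guaranteed.
                    System.gc();
                }
            }
            return totalBytes;
        } catch (ReflectiveOperationException e) {
            // Covers a missing method, an inaccessible method, and exceptions thrown by the call.
            throw new IllegalStateException("Could not call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the real object would be created and initialized here instead.
        Object target = new StandIn();
        long total = callRepeatedly(target, "someFunction", 1000);
        System.out.println("Total bytes returned: " + total);
    }
}
```

If the JVM also aborts with an invalid free in a run like this, the problem most likely sits in the native library rather than in the bridge.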
I could try building the asan containers for arm but this would take a long time, since I'd have to do this through emulation, which will be slow.
If you want a more direct contact method, try contacting me through this email address or I guess linkedin would be an option.
Hi Markus, yes, it works perfectly in pure Java. I am trying to build the ASAN image myself, but for some reason it fails.
FROM ubuntu:22.04
RUN apt-get update
RUN apt-get install -y git-core curl build-essential openssl libssl-dev python3
RUN apt-get install -y clang
RUN git clone https://github.com/nodejs/node.git
WORKDIR /node
RUN git checkout v20.x
RUN ./configure --debug --enable-asan
RUN make -j4
RUN make install
1825.0 rm 889aa6e08bf291915b2edfb5755eacba271e7d14.intermediate 5e7e6dacce553103c642464481a37d2a5cf36482.intermediate f3f5b560e1d922d1c82a58e0b85ee72dc68149b7.intermediate
------
failed to solve: process "/bin/sh -c make -j4" did not complete successfully: exit code
I think you need to build Node.js using clang. I've started a workflow run in my debug-node repo; let's see how that goes.
Update: Didn't go too well. But this dockerfile should work:
FROM debian:bullseye as build
RUN export DEBIAN_FRONTEND=noninteractive; \
    export DEBCONF_NONINTERACTIVE_SEEN=true; \
    apt-get update && apt-get -y upgrade && \
    apt-get install -y clang git build-essential curl
WORKDIR /app
RUN git clone https://github.com/nodejs/node -b v20.x
WORKDIR /app/node
ENV ASAN_OPTIONS=detect_leaks=0
RUN CC=$(which clang) CXX=$(which clang++) ./configure --debug --enable-asan
RUN CC=$(which clang) CXX=$(which clang++) make -j3
RUN mkdir -p /nodejs/node
RUN mv /app/node/out/Debug/node /nodejs/node/node
RUN curl -qL https://www.npmjs.com/install.sh | \
    PATH="$PATH:/nodejs/node" sh && \
    mv /nodejs/bin /nodejs/npm
ENV PATH="$PATH:/nodejs/node:/nodejs/npm"
RUN node --version
RUN npm --version
FROM debian:bullseye-slim
RUN apt-get update && apt-get install -y libatomic1
COPY --from=build /nodejs /nodejs
ENV PATH="$PATH:/nodejs/node:/nodejs/npm"
CMD [ "/bin/bash" ]
Did you try building the container on your machine?
If not, is it possible for you to provide a minimal reproducible example of this issue (including a full java class, the C and JS code)? I've not been able to reproduce the error with the minimal information you provided. Is it possible that the issue is caused by your native dependency, since it occurs when your native code is called?
Hi Markus, I ended up doing everything in Java for that part. I don't think there was a solution, as I tried using another Java Bridge, and the error was the same. I guess the native implementation in Java was coming to a weird spot, so I deleted that project completely and did it with Java entirely. Thank you very much for your assistance, though, and keep up the amazing job! I can't wait to see the exception objects access!!
Sounds great! Already thought it might be something like that.
I'll close this issue for now, feel free to re-open or open another issue if you need further assistance.
There's a specific method in a jar that I call from Node.js, and it kills the entire node process with either of those two errors. Any clue on where to start or how to fix it?