Open apeloquin-agilysys opened 1 month ago
When I do a diff of the node_modules/@confluentinc/kafka-javascript/build/Release/ directories before/after running node-pre-gyp on the started container, the noticeable difference is many instances of:
/home/semaphore/.cache replaced with /root/.cache
/home/semaphore/confluent-kafka-javascript replaced with /v
If I download directly from confluent-kafka-javascript-v0.1.15-devel-node-v115-linux-musl-x64.tar.gz I see the references are all /root/.cache and /v; so it's unclear to me how the docker image is ending up with an apparently different version despite resolving to the same download URL.
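For anyone who wants to check which build paths are baked into a given binary without diffing whole directories, here is a minimal Node sketch. The helper names (`countOccurrences`, `embeddedPathCounts`) are made up for illustration and are not part of the package; it only counts raw byte matches, so very short strings such as /v will produce false positives.

```javascript
const fs = require('node:fs');

// Count non-overlapping occurrences of `needle` (a string) inside a Buffer.
function countOccurrences(haystack, needle) {
  const pattern = Buffer.from(needle, 'utf8');
  if (pattern.length === 0) return 0;
  let count = 0;
  let pos = haystack.indexOf(pattern);
  while (pos !== -1) {
    count += 1;
    pos = haystack.indexOf(pattern, pos + pattern.length);
  }
  return count;
}

// Hypothetical helper: report how often each candidate build path
// is embedded in the binary at `binaryPath`.
function embeddedPathCounts(binaryPath, prefixes) {
  const data = fs.readFileSync(binaryPath);
  return Object.fromEntries(
    prefixes.map((prefix) => [prefix, countOccurrences(data, prefix)])
  );
}
```

Calling `embeddedPathCounts('node_modules/@confluentinc/kafka-javascript/build/Release/confluent-kafka-javascript.node', ['/home/semaphore/', '/root/.cache'])` inside the container should show which of the two builds is actually installed.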
Hey - I repro'd this issue, but I'm not sure of the cause yet. The confluent-kafka-javascript.node file differs before and after running npx node-pre-gyp install --update-binary (I checked with md5sum).
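The same checksum comparison can be done from Node if md5sum isn't available in the image. This is only a sketch; `fileMd5` and `sameBinary` are illustrative names, and the file is read fully into memory, which is fine for a single addon binary.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');

// Return the hex MD5 digest of a file's contents.
function fileMd5(filePath) {
  return crypto.createHash('md5').update(fs.readFileSync(filePath)).digest('hex');
}

// True when both files have identical contents (by MD5).
function sameBinary(pathA, pathB) {
  return fileMd5(pathA) === fileMd5(pathB);
}
```

Copy the .node file aside before running the reinstall, then compare the copy against the new file with `sameBinary(before, after)`.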
Here's my process:
ldd).
Suggested workaround for now:
COPY ./dist /app/
WORKDIR /app
+ RUN rm -rf node_modules
(you can also just delete node_modules/@confluentinc if you want to be more specific).
As far as I can understand, the npm install within the Dockerfile isn't re-pulling the right platform/libc combo of confluent-kafka-javascript.node, and just goes on with whatever is already in node_modules unless it's empty.
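To see which platform/libc combo the container itself expects, something like the sketch below can be run inside it. Two assumptions here: the libc check is a heuristic (Node's diagnostic report only includes `glibcVersionRuntime` when running against glibc), and the tag format is inferred from the single tarball name quoted in this thread, not from the package's actual naming logic. Both function names are hypothetical.

```javascript
// Heuristic libc detection: returns 'glibc', 'musl', or null when not on Linux
// or when the diagnostic report API is unavailable.
function detectLibc() {
  if (process.platform !== 'linux') return null;
  if (!process.report || typeof process.report.getReport !== 'function') return null;
  const header = process.report.getReport().header;
  return header.glibcVersionRuntime ? 'glibc' : 'musl';
}

// Assumed tag shape, modelled on "node-v115-linux-musl-x64" from the tarball
// name above. 'unknown' is a placeholder used when libc cannot be determined.
function expectedBinaryTag() {
  const libc = detectLibc() || 'unknown';
  return `node-v${process.versions.modules}-${process.platform}-${libc}-${process.arch}`;
}
```

If `expectedBinaryTag()` printed in the Alpine container does not match the tarball that produced the binary sitting in node_modules, that would be consistent with a binary built for a different platform having been copied in.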
Also, since there is a pre-compiled binary now, the Dockerfile can be trimmed considerably, to something like:
FROM node:20-alpine
COPY ./dist /app/
WORKDIR /app
RUN rm -rf node_modules/\@confluentinc
RUN npm install --omit=dev
EXPOSE 4000
CMD [ "node", "app.js" ]
I will update the example.
I have a fix in mind: changing the npm install script to node-pre-gyp install --fallback-to-build --update-binary rather than node-pre-gyp install --fallback-to-build. However, that would download the remote binary more often than required, so I'm not making that change immediately.
I'll discuss that, and other possible solutions with my team, and provide a fix.
![image](https://github.com/confluentinc/confluent-kafka-javascript/assets/23127550/1e394fdb-dae4-4bcf-95b8-a00b3afd731c)
Our build uses an ubuntu-latest GitHub runner to build a Docker image.
Our Dockerfile follows the example provided in this repo.
The deployed pod is hosted in AKS, and both the runners and host nodes are amd64 arch.
Without the @confluentinc/kafka-javascript dependency in the package.json, the application will start without issue on the container.
With the @confluentinc/kafka-javascript dependency in the package.json (and no reference from the application), the application will immediately fail with:
While troubleshooting, we discovered that if we reinstalled the package on the running container, the application would then start up normally.
Our initial thought was that the wrong flavor of librdkafka was being downloaded.
By adding the following to the Dockerfile, I was able to capture the node-pre-gyp output:
Again, launching this container results in the segmentation fault on startup.
Starting the container, and running the following:
...seemingly performs the same operation we saw during the Docker image construction:
...yet after this operation is performed, the application starts without issue.
Please help us to understand what is going on here, and how we can solve this problem.