Closed — lambrospetrou closed this issue 1 month ago.
To add more context, the issue is not related to `--repeat`. In Skybear.NET I run scripts without that option, hence only once, and every few days I get the above memory corruption errors. Today, for example, it happened 3 times.
Hi @lambrospetrou, thanks for reporting the issue. I haven't reproduced it yet on my local machine with your steps. We are going to try to reproduce it in our pipeline, and we'll get back to you.
Yeah, sometimes it takes a few attempts to get it to crash. I can always trigger it if I run the `hurl ...` command in a bash loop, or run it fast enough a few times.
Hi all,
I tried to reproduce the bug on my machine (MBP M1), but couldn't, either locally (with brew's Hurl) or in a Docker container (with the instructions provided by @lambrospetrou). I've tried hard 😅! In the Docker case, I did not use the .deb file (it's only for the x86 arch); I installed the arm64 precompiled binary instead.
That said, I ran the sample through valgrind and got this result:
```
$ cargo valgrind run -p hurl -- --test /tmp/vuejs.hurl
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.04s
     Running `/root/.cargo/bin/cargo-valgrind target/debug/hurl --test --jobs 1 /tmp/vuejs.hurl`
Error leaked 384 B in 1 block
 Info stack trace (user code at the bottom)
      at calloc
      at calloc (rtld-malloc.h:44)
      at allocate_dtv (dl-tls.c:370)
      at _dl_allocate_tls (dl-tls.c:629)
      at allocate_stack (allocatestack.c:429)
      at pthread_create@@GLIBC_2.34 (pthread_create.c:655)
      at std::sys::pal::unix::thread::Thread::new (thread.rs:87)
      at std::thread::Builder::spawn_unchecked_ (mod.rs:577)
      at std::thread::Builder::spawn_unchecked (mod.rs:456)
      at spawn<hurl::parallel::worker::{impl#3}::new::{closure_env#0}, core::result::Result<(), std::sync::mpsc::SendError<hurl::parallel::message::WorkerMessage>>> (mod.rs:388)
      at std::thread::spawn (mod.rs:694)
      at hurl::parallel::worker::Worker::new (worker.rs:75)
      at hurl::parallel::runner::ParallelRunner::new::{{closure}} (runner.rs:127)
      at core::iter::adapters::map::map_fold::{{closure}} (map.rs:89)
      at core::iter::traits::iterator::Iterator::fold (iterator.rs:2588)
Summary Leaked 384 B total (0 other errors)
```
So valgrind detects one leaked thread. Not sure if this is related, but we should fix it.
@lambrospetrou can you try to reproduce the issue without using `--test`? That way the code will use the sequential run path, and we will see whether the issue lies in the parallel implementation or not. If these errors started happening with 5.0.1, this could explain why...
I cannot reproduce without the `--test` option. I tried it back when I opened the issue, and it only happens when using the `--test` option; I confirmed that again just now.
I am attaching a video showing that, on my side, it happens every few runs with the `--test` option 😄 Check the 6th run, the 11th run, and the last two.
https://github.com/user-attachments/assets/f6c52a9b-1bdb-4604-ad0f-1685388f66ca
Also, when using only one thread with `hurl --test --repeat 100 --jobs 1 vuejs.hurl`, it works fine; I cannot make it fail. So it does indeed seem to be something in the parallel mode.
Your Docker host is a Windows machine, am I correct? I will try to reproduce the bug from a Windows machine (with Docker etc.). You're reproducing it quite easily while I can't at all; maybe that makes a difference? (It doesn't explain why it happens on skybear.net, though.)
Locally, I am using WSL2 (Ubuntu 24.04.1 LTS) on Windows 11, with Docker Desktop on the Windows side.
For Skybear.NET, I use Fly.io, and my container is based on `ubuntu:noble`.
I have added an integration test that combines `--test` and `--repeat`: https://github.com/Orange-OpenSource/hurl/pull/3305
The bug is still not reproduced :-(
I haven't managed to reproduce it, but there's something I would make a little cleaner in the graceful shutdown of the worker threads. I don't know whether it will fix this issue, but it will be cleaner and is worth trying (plus it may fix the valgrind issue that has been nagging me). As soon as I have time to do it!
For the record, I have retries on Skybear.NET now, so when it happens again it just retries the script execution; it's not urgent at the moment. It would be nice, though, to see what in the parallelism could cause race conditions on memory. I might take a stab at reading some Rust later in the week 😏
Hi @lambrospetrou, I've made a branch with better code for shutting down threads (following the Rust book sample to the end 😇). At least valgrind doesn't raise errors any more. ~~Would you be able to pull the branch and build Hurl in order to reproduce (or hopefully not reproduce) the error? (The branch is https://github.com/Orange-OpenSource/hurl/tree/3297-memory-corruption-when-running-the-hurl-cli-with---test-mode) If that's too complicated for you, I'll see with @lepapareil how we can build intermediate binary artefacts to test without you building it yourself.~~
Update: I will add the artefacts for Linux here so you don't need to rebuild Hurl.
@jcamiel Good news so far. I tested it now with the `release-deb-x64-artifacts.zip` from above and could not reproduce a crash. I ran the following a few times with no errors:
```shell
for i in {1..100}; do hurl --test --repeat 100 vuejs.hurl || { echo "Error at iteration $i" >&2; break; }; done
```
Dockerfile used:
```dockerfile
# Tested both base images.
# FROM ubuntu:noble
FROM debian:trixie-slim

RUN apt-get update && apt-get install --no-install-recommends -y ca-certificates curl libcurl4 libxml2

# Install Hurl.
COPY ./hurl_6.0.0-SNAPSHOT_amd64.deb /app/hurl_6.0.0-SNAPSHOT_amd64.deb
RUN apt update && apt install -y /app/hurl_6.0.0-SNAPSHOT_amd64.deb
```
So I think we can say the fix in https://github.com/Orange-OpenSource/hurl/pull/3321/files works (although I haven't yet dug deep enough to understand the root cause that the fix addresses :) ).
Yes, we were quitting the program as soon as all files had been executed, without ensuring that every thread had properly finished. I thought that was OK, since the program was quitting anyway, but apparently it's recommended to ensure threads are well stopped before exiting the program (in C/C++ or Rust, at least). (The sample multithreading program from the official Rust book is clear on this too.) We'll also add a valgrind check that could have detected it. I'm rather confident that the bug is fixed now!
Thanks for detecting it, it was a nasty bug!
**What is the current bug behavior?**

There is a corrupt memory heap operation somewhere when finalizing the Hurl run. I am getting the following errors every few hundred runs of a simple Hurl script.

Errors occurring:

One time I got this error instead, which is memory-related too:

```
malloc_consolidate(): unaligned fastbin chunk detected
```

**Steps to reproduce**

1. Create the file `Dockerfile-hurl-memory-issue` with contents:
2. Build the image and get inside the container:
3. Create the Hurl file:
4. Run the Hurl file (you might need to repeat the following a few times until you trigger the memory error):

**What is the expected correct behavior?**

It should complete successfully without any errors.

**Execution context**

Hurl version (`hurl --version`):

The issue occurs on `ubuntu:noble`, latest tag as of today: https://hub.docker.com/layers/library/ubuntu/noble-20240904.1/images/sha256-74f92a6b3589aa5cac6028719aaac83de4037bad4371ae79ba362834389035aa?context=explore

And it also happens with `debian:trixie-slim`: https://hub.docker.com/layers/library/debian/trixie-slim/images/sha256-4cc21783b221c03e841bc1274e5babef8c52ccabfc20abadc37e83e37bed6990?context=explore

**Possible fixes**

Not sure...