akash-akya / vix

Elixir extension for libvips
MIT License
166 stars 20 forks

Investigate crash on high load #111

Closed akash-akya closed 6 months ago

akash-akya commented 1 year ago

I'll dig up a core dump the next time it crashes. My top suspect is unsafe memory usage by vips. I am processing image uploads in parallel with vips, with a concurrent load of 30 workers. If I go higher, I see more crashes; it seems to be relatively stable with fewer than 30 concurrent workers.

Does the prebuilt binary have WebP support, as well as pango2? I do some watermarking and overlaying work.

Originally posted by @ringofhealth in https://github.com/akash-akya/vix/issues/109#issuecomment-1521046432

akash-akya commented 8 months ago

Hey @ringofhealth, just want to check if you are still seeing segfaults under high load with the latest versions? I have fixed several issues recently, and libvips has also had several fixes.

ringofhealth commented 6 months ago

Haven't seen any crashes as of yet on the latest branches. Will let you know if anything new comes up! Thanks again for all the amazing support and work on this!! ❤️

akash-akya commented 6 months ago

Thanks for the update! I am closing the issue in that case. Feel free to reopen if you see the issue again.

ringofhealth commented 6 months ago

Just want to comment quickly: we have been running concurrent uploads again for the last few weeks, and the segfaults have started to happen again. This is on

When it segfaults, there is not much in the way of errors, warnings, or a stack trace. All I get is

something along the lines of "segfault, core dumped"

I have been tinkering with the settings a bit. I tracked on

- VIPS_CONCURRENCY=1 seems to help the situation
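
For anyone trying the same workaround, here is a minimal sketch of how that setting might be applied. It assumes the application is started from a shell and that libvips reads `VIPS_CONCURRENCY` from the environment when it initializes; the start command in the last comment is hypothetical and should be replaced with however the app is actually launched.

```shell
# Ask libvips to use a single worker thread per operation.
# Assumption: the variable must be set before the BEAM VM (and libvips) starts.
export VIPS_CONCURRENCY=1

# Confirm the value is exported, i.e. visible to child processes.
sh -c 'echo "VIPS_CONCURRENCY=${VIPS_CONCURRENCY}"'

# Hypothetical start command, not taken from this thread:
# mix phx.server
```

This only limits the threads libvips uses internally. The number of concurrent upload workers on the Elixir side is a separate setting.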

It's relatively sporadic: sometimes it's fine for 8-9 hrs, other times it crashes within 1-2.

Let me know if there is anything I can do to help diagnose this issue further.

akash-akya commented 5 months ago

@ringofhealth thanks for reporting. So when this happens, the VM crashes and restarts, correct? Is there any pattern, such as high memory usage, a high number of open file descriptors, high CPU, etc.?

It would be great if you can share the core dump file. It would help us to understand exact operation and state at the time of crash. Its location changes depending on the OS, you can check where it is located for your system.
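
As a rough starting point, the sketch below shows one way to check whether core dumps are enabled and where they go. It assumes a Linux host and a POSIX-like shell; other systems keep this information elsewhere, and the commands are generic OS tools, not anything specific to vix.

```shell
# Show the current core-file size limit; 0 means core dumps are disabled.
ulimit -c

# Try to allow core files of any size for processes started from this shell.
# This can be refused when the hard limit is lower, so the failure is tolerated.
ulimit -c unlimited || echo "could not raise the core limit"

# On Linux, core_pattern describes where the kernel writes core files.
# A leading "|" means cores are piped to a handler (for example systemd-coredump).
if [ -r /proc/sys/kernel/core_pattern ]; then
  core_pattern=$(cat /proc/sys/kernel/core_pattern)
  echo "core_pattern: ${core_pattern}"
else
  echo "no /proc/sys/kernel/core_pattern here; check your OS documentation"
fi
```

The limit has to be raised in the same shell (or service configuration) that starts the VM, since it is inherited by child processes.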