Relaunch on error's behavior?

drogonframework / drogon

Drogon: A C++14/17/20 based HTTP web application framework running on Linux/macOS/Unix/Windows

MIT License

Relaunch on error's behavior? #1966

Open Mis1eader-dev opened 4 months ago

Mis1eader-dev commented 4 months ago

Continuing off from this trantor issue, it seems that when drogon has the relaunchOnError option enabled, it interferes with other libraries and causes them to crash. It also completely disables trantor::ConcurrentTaskQueue::runTaskInQueue, which in turn stops all database operations.

What is the behavior of that option? It interfered with nanomq's concurrency model too once it got turned on, and started causing crashes.

JaylinYu commented 3 months ago

Just curious: how would drogon interfere with NanoMQ? Our parallel computing framework is implemented in pure C and is not based on drogon. ^_^

nevermind, I guess what you mean is the archived nanomq project

Mis1eader-dev commented 3 months ago

@JaylinYu I have nanomq (my fork) statically linked into the same executable as drogon, and whenever I had nanomq publish a message, it crashed and reported something about non-reentrant locks or mutexes.

After disabling Drogon's relaunchOnError, it went back to normal

hwc0919 commented 3 months ago

relaunchOnError starts a child process using fork. The parent process monitors the child process and starts a new one when it dies. The child process runs the framework logic.

The fork syscall may interfere with open files and sockets. That might be the problem.
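
To make the parent/child arrangement described above concrete, here is a minimal POSIX sketch of a fork-based supervisor loop. It is not Drogon's actual implementation: the function name superviseWithFork, the maxLaunches limit, and the attempt-index argument are all hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, not part of Drogon: runs `work` in a forked child and
// relaunches it until it exits with status 0 or `maxLaunches` children have
// been started. `work` receives the 0-based attempt index.
// Returns the number of child processes that were started.
int superviseWithFork(const std::function<int(int)>& work, int maxLaunches) {
    assert(maxLaunches > 0);
    int launches = 0;
    while (launches < maxLaunches) {
        pid_t pid = fork();
        if (pid < 0) {
            break;  // fork failed; give up
        }
        if (pid == 0) {
            // Child: run the "framework" and report the result as exit status.
            // _exit avoids running the parent's atexit handlers in the child.
            _exit(work(launches));
        }
        // Parent: count the launch, then block until this child terminates.
        ++launches;
        int status = 0;
        if (waitpid(pid, &status, 0) < 0) {
            break;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            break;  // clean exit: nothing to relaunch
        }
        // Non-zero exit or death by signal: loop and start a fresh child.
    }
    return launches;
}
```

Note that each new child starts from the parent's state at the time of the fork, so nothing the previous child did in memory carries over.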

Mis1eader-dev commented 3 months ago

Is it normal that it interfered with trantor::ConcurrentTaskQueue::runTaskInQueue?

hwc0919 commented 3 months ago

relaunchOnError is not a mechanism that can rescue your program from failing. It acts just like systemd, which restarts the program when it dies.

All previous in-process state becomes invalid after relaunching.

When this option is actually triggered, it means your program has fatal bugs in it, and you should find and fix it.

Mis1eader-dev commented 3 months ago

Yeah, I'll have to run some tests to find out whether runTaskInQueue triggers in an empty project. In the project where this issue occurred, runTaskInQueue simply didn't do anything, and there were no bugs in the lambdas being run by runTaskInQueue or in the code before and after the call. I'll do the test on an empty project and report back whether it gets called.

hwc0919 commented 3 months ago

@Mis1eader-dev I misunderstood your situation. I thought your runTaskInQueue stopped working after relaunching, but actually it never works.

I never use the ConcurrentTaskQueue. It uses a mutex and a condition variable, and I guess fork may break it.

I suggest you disable relaunchOnError and use systemd instead. The side effects of this option are not fully tested against all components.
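
One way fork can break a thread-backed queue is that, under POSIX, only the thread that calls fork exists in the child. A worker thread started before the fork is therefore absent in the child, so tasks submitted there are never picked up. The sketch below illustrates that general effect with a hypothetical TinyQueue. It is not trantor's ConcurrentTaskQueue, and whether Drogon creates the queue's threads before or after forking is an assumption that would need checking. Atomics are used instead of a mutex and condition variable so that the child cannot deadlock on a lock copied in a held state.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a task queue served by one worker thread.
struct TinyQueue {
    std::atomic<int> requested{0};
    std::atomic<int> completed{0};
    std::atomic<bool> stop{false};
    std::thread worker;  // declared last so the atomics exist before it starts

    TinyQueue() : worker([this] {
        while (!stop.load()) {
            if (completed.load() < requested.load()) {
                completed.fetch_add(1);  // "run" one pending task
            } else {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        }
    }) {}

    ~TinyQueue() {
        stop = true;
        worker.join();
    }

    // Submit one task and wait up to `timeout` for the worker to complete it.
    bool submitAndWait(std::chrono::milliseconds timeout) {
        const int target = requested.fetch_add(1) + 1;
        const auto deadline = std::chrono::steady_clock::now() + timeout;
        while (std::chrono::steady_clock::now() < deadline) {
            if (completed.load() >= target) return true;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        return false;
    }
};

// Returns true only if a task submitted inside a forked child was executed.
bool taskRunsInForkedChild() {
    TinyQueue q;  // the worker thread starts here, before fork()
    pid_t pid = fork();
    if (pid < 0) return false;  // fork failed
    if (pid == 0) {
        // Child: the worker thread was not copied, so nobody serves the task.
        const bool ran = q.submitAndWait(std::chrono::milliseconds(200));
        _exit(ran ? 0 : 1);  // _exit skips ~TinyQueue, which must not join here
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

In the parent the same queue keeps working, which matches the symptom of a call that silently does nothing rather than crashing.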

Mis1eader-dev commented 3 months ago

@hwc0919 do you recommend something different than trantor::ConcurrentTaskQueue for database operations?

hwc0919 commented 3 months ago

I recommend using the async or coroutine APIs, so you don't need a task queue.

If you are using something that only has sync api, you might need a task queue.

The EventLoopThreadPool should be enough for most cases. Its disadvantage is that the workloads of different threads may become uneven if you put lots of jobs with different loads into it.
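
The uneven-workload point can be illustrated with a small simulation that uses no Drogon or trantor code. It assumes jobs are handed to threads in fixed round-robin order (an assumption about how a pool of event loops is typically used) and compares that with a shared queue, where each job goes to whichever thread is least loaded so far. The function names and the job costs are made up for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model: job i is assigned to thread i % threads, regardless of
// how busy that thread already is. Returns the total cost per thread.
std::vector<int> roundRobinLoads(const std::vector<int>& costs, std::size_t threads) {
    std::vector<int> loads(threads, 0);
    for (std::size_t i = 0; i < costs.size(); ++i) {
        loads[i % threads] += costs[i];
    }
    return loads;
}

// Hypothetical model of a shared task queue: each job is taken by the thread
// with the smallest accumulated cost (ties go to the lowest index).
std::vector<int> sharedQueueLoads(const std::vector<int>& costs, std::size_t threads) {
    std::vector<int> loads(threads, 0);
    for (int cost : costs) {
        *std::min_element(loads.begin(), loads.end()) += cost;
    }
    return loads;
}
```

With alternating heavy and light jobs such as {10, 1, 10, 1, 10, 1} on two threads, round-robin puts every heavy job on the same thread (loads 30 and 3), while the shared-queue model ends at 21 and 12.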

Mis1eader-dev commented 3 months ago

I do use a synchronous database API, namely rocksdb, so trantor::ConcurrentTaskQueue is the right tool for the job I presume. Will keep this issue open in case other people notice trantor::ConcurrentTaskQueue::runTaskInQueue not doing anything.