FIWARE / context.Orion-LD

Context Broker and CEF building block for context data management which supports both the NGSI-LD and the NGSI-v2 APIs
https://www.etsi.org/deliver/etsi_gs/CIM/001_099/009/01.06.01_60/gs_CIM009v010601p.pdf
GNU Affero General Public License v3.0

-reqPoolSize option makes orion ld crash #1367

Open efntallaris opened 1 year ago

efntallaris commented 1 year ago

Hello,

I have been experimenting with the Orion-LD context broker recently, trying to tune its performance (https://fiware-orion.readthedocs.io/en/1.7.0/admin/perf_tuning/index.html) for my use case.

It seems that whenever I use the -reqPoolSize option, the orion-ld process crashes.

This is my docker-compose.yml: image

My orion-ld version: { "orionld version": "post-v1.2.0", "orion version": "1.15.0-next", "uptime": "0 d, 0 h, 0 m, 2 s", "git_hash": "nogitversion", "compile_time": "Wed May 3 17:02:36 UTC 2023", "compiled_by": "root", "compiled_in": "", "release_date": "Wed May 3 17:02:36 UTC 2023", "doc": "https://fiware-orion.readthedocs.org/en/master/" }

I use two clients that I wrote in Python; each client sends multiple PATCH requests, updating values on specific IDs.

For each updated value, orion-ld also sends a notification to another process I wrote.

Running the use case without the -reqPoolSize option, everything runs smoothly.

I get this log file after the crash: 8af896557756967e773c813767d8ebe9

kzangeli commented 1 year ago

ok, noted. I will have a look asap. Thank you for reporting!!!

kzangeli commented 1 year ago

I think I know why you're having problems ... This -reqPoolSize is an old CLI parameter from Orion, from long before I started with Orion-LD. For Orion-LD I replaced a whole lot of C++ code with good old C code, and one of many advantages of that was that I could use thread variables and avoid hundreds of calls to malloc/new. For that to work, I need every incoming request to run in its own thread, which means I need the "MHD_USE_THREAD_PER_CONNECTION" option for libmicrohttpd.

When you set -reqPoolSize, that option isn't used; MHD_USE_SELECT_INTERNALLY | MHD_USE_EPOLL is used instead.

I had completely forgotten about that, should have removed that part.
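To illustrate the general hazard described above, here is a minimal Python sketch. It is an analogy only, not Orion-LD's code and not a diagnosis of this crash: `threading.local` stands in for the C thread variables, and `handle_request`, `thread_per_request` and `pooled` are hypothetical names. It shows one way per-thread state that is assumed to be fresh for every request can misbehave once threads are reused by a pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Per-thread state, standing in for the broker's C thread variables (analogy only).
state = threading.local()


def handle_request(request_id):
    """Append to a per-thread buffer that the handler assumes starts out empty."""
    if not hasattr(state, "buffer"):
        state.buffer = []           # set up once per thread, never reset afterwards
    state.buffer.append(request_id)
    return list(state.buffer)       # what this request "sees" in its thread state


def thread_per_request(request_ids):
    """Run each request in a brand-new thread, so each one starts with fresh state."""
    results = {}

    def worker(rid):
        results[rid] = handle_request(rid)

    threads = [threading.Thread(target=worker, args=(rid,)) for rid in request_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def pooled(request_ids, pool_size=1):
    """Run requests on a fixed-size pool, so threads (and their state) are reused."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return dict(zip(request_ids, pool.map(handle_request, request_ids)))
```

With a thread per request, `thread_per_request([1, 2, 3])` returns `{1: [1], 2: [2], 3: [3]}`. With a single pooled worker, `pooled([1, 2, 3])` returns `{1: [1], 2: [1, 2], 3: [1, 2, 3]}`: later requests see leftovers from earlier ones. In C, stale per-thread pointers of this kind could plausibly lead to a crash rather than just wrong data.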

Now, if you're interested in performance, I'd suggest you try the option "-experimental". It's a bad name; it's not experimental anymore (it was, about 2 years ago). The option should be renamed to "-mongoc". What it does is tell the broker to use the new implementation instead of the old one.

The old implementation (which is still default) uses the old deprecated C++ driver for mongo, and the old service routines, also in C++.

I reimplemented all of that in C and with the new C driver for mongo (mongoc). I expect the performance to gain quite a lot with the change.

So, I invite you to try that out (the -experimental option).

I'll have a look at that old stuff (about -reqPoolSize cancelling out MHD_USE_THREAD_PER_CONNECTION), as it looks incorrect to me. Just need some time to test it out.

efntallaris commented 1 year ago

@kzangeli Thanks for mentioning that! But I'm still having the same issue; this time I do not get any output in the log file about why it crashed.

I'm running two different processes at the same time, each doing patches. If I run orion-ld without the -experimental parameter everything works fine. I even ran it multiple times to make sure.

kzangeli commented 1 year ago

Yeah, I need to remove that option. It can't be used with Orion-LD, as Orion-LD makes heavy use of thread variables. Actually, not remove it, but make it not cancel out "thread per connection"; that's what I need to do.

I will try to fix this. It's a bit of a mess right now, to tell you the truth!

For now, don't use it, and do try out -experimental for performance. A heavy concurrency test is actually something -experimental hasn't had much of yet (and that is the exact reason why it's still called experimental and is not the default).

efntallaris commented 1 year ago

@kzangeli I have tried the -experimental option and I still have the same issue where the process crashes. I did not mention that in my previous post; I apologize.

kzangeli commented 1 year ago

Yeah, I know, it's all because of the -reqPoolSize. You can't use it the way it is right now, as it cancels out the "thread per connection" option. I'll fix it and let you know.

efntallaris commented 1 year ago

Hello @kzangeli, after running with only -experimental (without -reqPoolSize) I still have the same issue. I will attach a debugger and try to see if there is anything there that can help you. If you want me to send you a sample of the code I am using that reproduces the bug, let me know.
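A reproduction along these lines might look like the sketch below. It is not the actual client code from this report. The broker address (`localhost:1026`), the entity IDs, the attribute name and the helper names (`build_patch`, `client`, `run_clients`) are all assumptions for illustration; the only part taken from the NGSI-LD API is the `PATCH /ngsi-ld/v1/entities/{entityId}/attrs` operation. It assumes the entities already exist in the broker.

```python
import json
import threading
import urllib.parse
import urllib.request

BROKER = "http://localhost:1026"  # assumption: broker reachable on this host and port


def build_patch(entity_id, attr, value, broker=BROKER):
    """Build (but do not send) a PATCH that updates one Property of one entity."""
    path = urllib.parse.quote(entity_id, safe=":")
    url = f"{broker}/ngsi-ld/v1/entities/{path}/attrs"
    body = json.dumps({attr: {"type": "Property", "value": value}}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


def client(entity_ids, attr, rounds, errors):
    """Send `rounds` updates to every entity; stop at the first failure."""
    for i in range(rounds):
        for eid in entity_ids:
            try:
                with urllib.request.urlopen(build_patch(eid, attr, i), timeout=5):
                    pass
            except OSError as exc:  # covers URLError, HTTPError, resets, timeouts
                errors.append((eid, i, repr(exc)))
                return


def run_clients(n_clients=2, rounds=1000):
    """Run several clients concurrently and return the errors they collected."""
    entity_ids = ["urn:ngsi-ld:Sensor:001", "urn:ngsi-ld:Sensor:002"]  # hypothetical
    errors = []
    threads = [
        threading.Thread(target=client, args=(entity_ids, "temperature", rounds, errors))
        for _ in range(n_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Calling `run_clients()` against a running broker and checking whether the returned list contains connection errors (together with the broker's own log) would show at which request the broker stopped answering. The sketch uses threads inside one process for brevity; two separate processes, as in the report, may behave differently.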

kzangeli commented 1 year ago

Sorry, busy days ... Yes, if you have found a crash, I'm very much interested! The first step for me in fixing it is always being able to reproduce it.