Status: Open. kiranpradeep opened this issue 7 years ago.
Are you able to kill it without -9?
@zafields Yes. I am able to kill it without -9, and the same crash repeats. Terminal output is copied below:
```
username@machine_name:~/kiran/azure-iot-gateway-sdk/build$ ./samples/proxy_sample/proxy_sample ./launch_sample_lin.json
gateway successfully created from JSON
gateway shall run until ENTER is pressed
Error: Time:Fri Jun 30 10:35:44 2017 File:/home/862537/kiran/azure-iot-gateway-sdk/proxy/outprocess/src/module_loaders/outprocess_module.c Func:Outprocess_Destroy Line:955 unable to send destroy control message [0x7fb89bd6afc0], continuing with module destroy
username@machine_name:~/kiran/azure-iot-gateway-sdk/build$
```
I will need to investigate, and I will get back to you.
I'm still looking into this... I'm setting up a repro environment. I've also been looking into #320 at the same time to see if there is some sort of relationship.
@darobs I've added you to this issue for a morning brainstorming session about the best resolution to this behavior. @kiranpradeep I have confirmed there is no relationship to #320, and I should have this fixed before long.
@kiranpradeep After researching the behavior and discussion with the team, we have determined the behavior you are experiencing is by design.
When activation.type: "launch" is specified, the gateway is assumed to have full control of the module's life cycle, exactly as if the module were running in the same process. Signaling an out-of-process module that has an activation type of launch is treated the same as an in-process module receiving a signal or encountering an error (e.g. a bad memory access). When activation.type: "none" is specified, you are assumed to be in control of the out-of-process module's life cycle, and you can signal, kill, or exit the program as you wish.
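To make the two life-cycle rules concrete, here is a minimal POSIX sketch, assuming a Linux host. It is not the SDK's Outprocess code: the names activation_type, gateway_must_stop, and spawn_and_kill_module are hypothetical. A forked child stands in for proxy_sample_remote, the parent kills it the way kill -9 would, and a small policy function decides whether that death should take the gateway down ("launch") or be ignored ("none").

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical names for illustration; not part of the azure-iot-gateway-sdk API. */
typedef enum { ACTIVATION_LAUNCH, ACTIVATION_NONE } activation_type;

/* Decide whether the gateway should tear itself down after a module process
 * has exited with the given wait status. Under "launch" the gateway owns the
 * module's life cycle, so this sketch treats death by signal or a non-zero
 * exit code like an in-process crash. Under "none" the user owns the module's
 * life cycle, so the gateway never stops on the module's behalf. */
static bool gateway_must_stop(activation_type type, int wait_status)
{
    if (type == ACTIVATION_NONE)
        return false;
    if (WIFSIGNALED(wait_status))
        return true;
    return WIFEXITED(wait_status) && WEXITSTATUS(wait_status) != 0;
}

/* Fork a stand-in "module" process, send it SIGKILL (what `kill -9 pid`
 * does), reap it, and return the wait status the parent observed.
 * Returns -1 if fork, kill, or waitpid fails. */
static int spawn_and_kill_module(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: idle until killed, like a module waiting for messages. */
        for (;;)
            pause();
    }
    if (kill(pid, SIGKILL) != 0)
        return -1;
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

With this policy, the same SIGKILL status stops a "launch" gateway but is ignored under "none", which mirrors the behavior described in this comment.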
I believe the real problem is the unhelpful error message you are receiving, and I have created issue #339 to track it.
We acknowledge there is a lot of room for richer functionality in this area, and we would be interested to hear your specific use case and needs to see if we can model your feedback into a future enhancement.
@zafields Thanks. Is there any advantage to the decision that activation.type: "launch" must match in-proc module behavior? Why not let the user decide (via a parameter or callback) whether to abort or continue when an out-of-proc module fails?
Use case: I have multiple data acquisition modules (GPS, BLE, etc.) on the gateway. I didn't want a crash in one of the modules to bring down all the others, so I chose to run them out-of-proc. It felt cleaner to have a single gateway/controller start all the data acquisition modules. Maybe I am not using "launch" for the functionality it was intended for.
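The parameter/callback idea could look roughly like the sketch below. Everything here is hypothetical (module_exit_policy, module_slot, on_module_exit, keep_going are invented for illustration and are not an existing SDK API): the gateway marks the dead module as stopped and then asks a user-supplied callback whether to abort or keep running, falling back to today's "launch" behavior (abort) when no callback is installed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types illustrating the proposal; not an existing SDK API. */
typedef enum { MODULE_EXIT_ABORT_GATEWAY, MODULE_EXIT_CONTINUE } module_exit_action;

/* User-supplied callback invoked when an out-of-proc module dies. */
typedef module_exit_action (*module_exit_policy)(const char *module_name,
                                                 void *context);

typedef struct {
    const char *name;
    bool running;
} module_slot;

/* Mark the named module as stopped, then consult the policy.
 * Returns true if the gateway should keep running. With no policy
 * installed, fall back to the current "launch" semantics: abort. */
static bool on_module_exit(module_slot *modules, size_t count,
                           const char *name,
                           module_exit_policy policy, void *context)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(modules[i].name, name) == 0)
            modules[i].running = false;
    }
    if (policy == NULL)
        return false;
    return policy(name, context) == MODULE_EXIT_CONTINUE;
}

/* Example policy: count module deaths and always keep the gateway up,
 * so a failed BLE module does not take the GPS module down with it. */
static module_exit_action keep_going(const char *module_name, void *context)
{
    (void)module_name;
    int *deaths = context;
    (*deaths)++;
    return MODULE_EXIT_CONTINUE;
}
```

Keeping the decision in a single callback, rather than a set of configuration flags, is one way to avoid the parameter explosion mentioned in the reply that follows; the trade-off is that every application must write its own policy.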
You are the consumer, so you ARE definitely using it correctly! 😄 In fact, you have identified a use case that we want to support, but we have had difficulty deciding how to expose this functionality to users. Once you start exposing it, it seems to cascade into an enormous set of parameters to cover every use case.
I understand your scenario; you did a great job of describing it. However, I have a few clarifying questions to help us understand your needs. In this scenario, how would your gateway be impacted if a module dropped and you chose to continue rather than abort? Also, if you were notified that an out-of-proc module had died and you elected to have the gateway continue, what would your next steps be? Given your example, how does a gateway carry on without a module? How would you respond to this information? Would you want to restart the module? What tools do you need from us to make your gateway achieve its purpose?
@zafields
@zafields On second thought, the request for notification on out-of-proc module death (point 2) was specific to my application's needs. I now think iot-edge is not a process monitoring system, so notification is not its responsibility. iot-edge already gives us a way to talk among ourselves (publish/receive), and with that we (library users) could build our own mechanisms to see who is alive or dead.
But what we expect is:
1) Don't kill all of us because one of us was bad. Let us live and decide for ourselves what to do. If you shut us down on the very first out-of-proc module death, we cannot do anything.
2) "activation.type": "launch" should launch all good out-of-proc modules, ignoring any that cannot load themselves within the "grace.period.ms" time. No restarting. No monitoring. Ignore dead or non-responding modules.
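A rough sketch of the two mechanisms in this comment, under stated assumptions: it treats grace.period.ms as a startup deadline, as the comment proposes (whether the SDK uses that setting the same way is not confirmed here), and it assumes the application sends its own heartbeat messages over the normal publish/receive path. The struct, the NOT_YET marker, and all function names are hypothetical. Modules that never connect in time are dropped without restarting anything, and liveness is judged purely from the last heartbeat seen.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for the behavior requested above; the field
 * names and the heartbeat scheme are illustrative, not SDK features. */
#define NOT_YET (-1)

typedef struct {
    const char *name;
    int64_t connected_at_ms;   /* NOT_YET if the module never connected */
    int64_t last_heartbeat_ms; /* time of the last heartbeat message received */
} module_state;

/* Keep only modules that connected within grace_period_ms of launch;
 * the rest are simply ignored (no restart, no monitoring). Survivors are
 * compacted to the front of the array; returns how many were kept. */
static size_t keep_modules_within_grace(module_state *mods, size_t count,
                                        int64_t launched_at_ms,
                                        int64_t grace_period_ms)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        int64_t t = mods[i].connected_at_ms;
        if (t != NOT_YET && t - launched_at_ms <= grace_period_ms)
            mods[kept++] = mods[i];
    }
    return kept;
}

/* Record a heartbeat that the application received as an ordinary
 * published message from the module. */
static void record_heartbeat(module_state *m, int64_t now_ms)
{
    m->last_heartbeat_ms = now_ms;
}

/* A module counts as alive if its last heartbeat is within timeout_ms. */
static bool module_is_alive(const module_state *m, int64_t now_ms,
                            int64_t timeout_ms)
{
    return now_ms - m->last_heartbeat_ms <= timeout_ms;
}
```

For example, with a 2000 ms grace period, a module that connects at 500 ms is kept while one that connects at 2500 ms (or never) is dropped; after that, each application decides for itself what a missed heartbeat means.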
@kiranpradeep Great insight! Let me run it around the yard and get back to you. Cheers!
On Ubuntu 14.04 LTS, the samples/proxy_sample application crashes if we manually kill (kill -9 pid) the launched process (proxy_sample_remote). This happens only when we use the launch option (activation.type: launch).
If we try with activation.type: none and kill the proxy_sample_remote process, the gateway process (proxy_sample) continues without impact.