GENIVI / rvi_core

Specify, design, plan and build a reference implementation of the open source infrastructure that drives next generation's connected vehicle services.
Mozilla Public License 2.0

Deadlock. #16

Open · magnusfeuer opened 9 years ago

magnusfeuer commented 9 years ago

Under load, RVI deadlocks in several places where component A makes a synchronous call to component B while component B is making a synchronous call to component A.

For example, protocol_rpc calls service_edge_rpc's handle_remote_message(). That call can block if service_edge_rpc is, at the same moment, processing a handle_local_message gen_server call that is indirectly calling protocol_rpc. In that case the two call chains block each other.

The solution is to replace synchronous calls (gen_server:call()) with asynchronous notifications (gen_server:cast()), which do not wait for a return value before continuing.
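The following is a minimal Python model of this failure mode, not rvi_core code. Each hypothetical `Actor` stands in for a gen_server (one mailbox, one message handled at a time). `call` mimics gen_server:call with its timeout (5000 ms by default in Erlang/OTP, 0.5 s here), and `cast` mimics gen_server:cast. The names `Actor`, `run`, `"forward"` and `"ping"` are invented for illustration. When both actors are busy handling a message that synchronously calls the other, neither can service its mailbox, and at least one call times out. With casts, both notifications are delivered.

```python
import queue
import threading


class Actor:
    """Hypothetical stand-in for an Erlang gen_server: one mailbox and one
    worker thread, so messages are handled strictly one at a time."""

    def __init__(self, mode, barrier, timeout):
        self.mode = mode          # "call" (synchronous) or "cast" (asynchronous)
        self.barrier = barrier    # makes both actors enter their handlers together
        self.timeout = timeout    # plays the role of gen_server:call/3's Timeout
        self.peer = None
        self.mailbox = queue.Queue()
        self.results = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def call(self, msg):
        """Synchronous request: wait for a reply or give up after the timeout."""
        reply_to = queue.Queue(maxsize=1)
        self.mailbox.put((msg, reply_to))
        try:
            return reply_to.get(timeout=self.timeout)
        except queue.Empty:
            return "timeout"

    def cast(self, msg):
        """Asynchronous notification: enqueue the message and return at once."""
        self.mailbox.put((msg, None))

    def _loop(self):
        while True:
            msg, reply_to = self.mailbox.get()
            if msg == "forward":
                self.barrier.wait()           # both actors are now busy
                if self.mode == "call":
                    # Blocks this actor's loop while the peer is blocked the same way.
                    self.results.put(self.peer.call("ping"))
                else:
                    self.peer.cast("ping")    # returns at once; loop keeps running
            elif msg == "ping":
                if reply_to is not None:
                    reply_to.put("pong")      # reply to a synchronous call
                else:
                    self.results.put("pong")  # handle an asynchronous notification


def run(mode, timeout=0.5):
    """Trigger A and B at the same moment and collect what each one saw."""
    barrier = threading.Barrier(2)
    a, b = Actor(mode, barrier, timeout), Actor(mode, barrier, timeout)
    a.peer, b.peer = b, a
    a.cast("forward")   # e.g. a local message arriving at component A...
    b.cast("forward")   # ...while a remote message arrives at component B
    return sorted(r.results.get(timeout=5) for r in (a, b))


if __name__ == "__main__":
    print("call:", run("call"))   # at least one side reports "timeout"
    print("cast:", run("cast"))   # ['pong', 'pong']
```

In real Erlang, a gen_server:call that times out exits the calling process unless the exit is caught. Once one side gives up, the other may still receive its reply, which is why the sketch only guarantees that at least one call times out.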

Will be fixed in the gen_server_fix feature branch and in 0.3.2.

magnusfeuer commented 9 years ago

Another symptom of this bug is file descriptor exhaustion.

This eventually leads to:

    14:09:08.197 [info] data_link_bert:receive_data(): Failed to send component request: {error,emfile}
    ...
    14:09:08.197 [error] gen_server <0.4978.0> terminated with reason: emfile
    ...
    14:09:08.200 [error] CRASH REPORT Process <0.4978.0> with 0 neighbours crashed with reason: maximum number of file descriptors exhausted, check ulimit -n

The cause is that the components make JSON-RPC calls to each other and end up in the deadlock described above. Each pending JSON-RPC call holds one file descriptor open, and the process is limited to 1024 descriptors (the ulimit -n value the crash report refers to). Under load, all of them are consumed.
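To see the {error,emfile} failure in isolation, the following minimal Python sketch (Unix only, not rvi_core code) lowers the process's soft RLIMIT_NOFILE, which is the limit ulimit -n reports. It then holds descriptors open until the OS refuses with EMFILE. Descriptors opened on os.devnull stand in for the sockets of pending JSON-RPC calls, since sockets and files draw from the same per-process table. The function name and the demo limit of 64 are assumptions.

```python
import errno
import os
import resource


def descriptors_until_emfile(limit=64):
    """Hypothetical demo: cap this process at `limit` open descriptors, then
    hold descriptors open (one per simulated pending JSON-RPC call) until the
    OS refuses with EMFILE. Returns how many were opened; the original limit
    is restored before returning."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = limit if hard == resource.RLIM_INFINITY else min(limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    held = []
    try:
        while True:
            try:
                held.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                if exc.errno == errno.EMFILE:   # "Too many open files"
                    return len(held)
                raise
    finally:
        for fd in held:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    n = descriptors_until_emfile(64)
    print(n, "descriptors opened before EMFILE")
```

The count comes out slightly below the limit because descriptors the process already holds, such as stdin, stdout and stderr, count against the same cap.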

magnusfeuer commented 9 years ago

The deadlock can be reproduced by adding network delay with the tc command provided by iproute2.

On the backend server (rvi-test1.nginfotpdx.net, 38.129.64.31), issue a tc command that delays outgoing packets on eth0 by 500 ms ± 20 ms, with 25% correlation between successive random delay values:

tc qdisc add dev eth0 root netem delay 500ms 20ms 25%
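The hypothetical Python helper below assembles the command above from its netem parameters (tc-netem syntax: `delay TIME [JITTER [CORRELATION]]`). It also builds the standard command that removes the qdisc again after testing. It only builds the strings; running them requires root on the backend server.

```python
import shlex


def netem_commands(dev="eth0", delay_ms=500, jitter_ms=20, correlation_pct=25):
    """Build the tc command lines that add, and later remove, a netem delay.

    Outgoing packets on `dev` are delayed by delay_ms +/- jitter_ms, and each
    random jitter value is correlated correlation_pct percent with the last.
    """
    add = ["tc", "qdisc", "add", "dev", dev, "root", "netem",
           "delay", f"{delay_ms}ms", f"{jitter_ms}ms", f"{correlation_pct}%"]
    remove = ["tc", "qdisc", "del", "dev", dev, "root"]
    return shlex.join(add), shlex.join(remove)


if __name__ == "__main__":
    add, remove = netem_commands()
    print(add)     # tc qdisc add dev eth0 root netem delay 500ms 20ms 25%
    print(remove)  # tc qdisc del dev eth0 root
```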

Check out and build branch 0.3.1 on rvi-test1, then set up and start the backend node:

ssh -p1066 rvi@rvi-test1.nginfotpdx.net
cd rvi
git pull origin 0.3.1
git checkout 0.3.1
make clean
rm -rf backend
make 
./scripts/setup_rvi_node.sh -d -n backend -c ~/rvi_backend_0_3_x.config
./scripts/rvi_node.sh -n backend

Start the mobile HVAC interface and make sure it connects to rvi-test1.nginfotpdx.net:8808/websession by checking the rvi.connect statement in its js/main.js file.

Install the RVI 0.3.1 RPM on an IVI box.

Edit /opt/rvi-0.3.1/sys.config and set the static node entry to look like this:

       {static_nodes,[{"jlr.com/backend/","38.129.64.31:8807"}]},

Edit the node_service_prefix entry to look like this:

       {node_service_prefix,"jlr.com/vin/mfeuer"},

(Replace mfeuer with a suitable unique string)

Reboot the RVI box.

Launch the mobile HVAC interface.

Drag the left temperature slider on the mobile HVAC interface quickly up and down for 10 seconds.

The RVI node on rvi-test1.nginfotpdx.net will freeze, and requests to it will time out.

amcgee7 commented 9 years ago

A quick fix would be to pace our requests. For example, the slider should send at most one update per second.
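A minimal sketch of this pacing idea, assuming a throttle that forwards at most one value per second and keeps only the newest value that arrives in between, so the final slider position is not lost. The mobile HVAC interface itself is JavaScript (js/main.js). This Python version only shows the logic, and `Throttle`, `update` and `flush` are hypothetical names. Times are integer milliseconds to avoid floating-point comparisons.

```python
class Throttle:
    """Hypothetical pacing helper: forward at most one value per interval,
    coalescing values that arrive sooner into the single newest one."""

    def __init__(self, send, interval_ms=1000):
        self.send = send
        self.interval_ms = interval_ms
        self.last_sent_ms = None
        self.pending = None

    def _due(self, now_ms):
        return self.last_sent_ms is None or now_ms - self.last_sent_ms >= self.interval_ms

    def update(self, value, now_ms):
        if self._due(now_ms):
            self.send(value)
            self.last_sent_ms = now_ms
            self.pending = None
        else:
            self.pending = value          # overwrite: older values are dropped

    def flush(self, now_ms):
        """Call from a periodic timer to deliver a value that was held back."""
        if self.pending is not None and self._due(now_ms):
            self.send(self.pending)
            self.last_sent_ms = now_ms
            self.pending = None


if __name__ == "__main__":
    sent = []
    t = Throttle(sent.append)
    for i in range(200):                  # 10 s of dragging, one event every 50 ms
        t.update(i, now_ms=i * 50)
    t.flush(now_ms=10_000)
    print(len(sent), "messages instead of 200")   # 11 messages instead of 200
```

Pacing reduces the request rate that exposes the deadlock. It does not remove the cycle of synchronous calls between components.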

Art

In reply to Magnus Feuer's comment of 24 March 2015, 16:18.

Art McGee
Infotainment Engineer
Jaguar Land Rover North America, LLC
1419 NW 14th Ave, Portland, Oregon, 97209
JaguarUSA.com | LandRoverUSA.com