Running multiple instances of OLSRD??? to handle different latency cutoffs, on RF vs ISP hop links.

OLSR / olsrd

OLSR.org main repository - olsrd v1 - maintained by Freifunk Berlin

Running multiple instances of OLSRD??? to handle different latency cutoffs, on RF vs ISP hop links. #80

Closed mathisono closed 2 years ago

mathisono commented 4 years ago

I would like to address the recent OLSRD flooding events in the SoCal and SF mesh networks. The cause is still being studied. A few ideas: latency between tunnel servers connecting over the Comcast network; network outages or network saturation during disaster events. TC message floods (across tunnels ---> RF hub nodes) cause nodes to crash. RF nodes adjacent to the affected node are only slightly affected and don't crash. Networks whose topology relies on tunnel links that pass through an ISP or cellular system see OLSRD overload the system's memory. The node running a tunnel may stall or crash. This doesn't affect the raw RF <---> RF nodes.

Q: Might running a separate OLSRD instance for each interface, each with a different timing configuration, help limit the overload? Or is it time to migrate to OLSRD2 and then tune it?

HRogge commented 4 years ago

I don't think multiple Olsr instances would help, because they would not route between the different interfaces anymore.

I am not sure I get the problem you are experiencing.

HRogge commented 4 years ago

Is this issue still relevant to you?

mathisono commented 4 years ago

Yes...

The RF side of the mesh using OLSRD is fine. There are a few occurrences of "mesh storms" in which TC messages overload the tunnel servers on the mesh. It could be that TC messages coming through the tunnel are delayed or dropped, and that delayed or multiple responses coming back across different routes overload OLSRD. Does this sound like it could happen?

Why do you think that multiple OLSR instances would not help to control flooding from an interface attached to its own OLSRD instance? Could you elaborate?

Do you think this might be a shortfall of v1, or something that could be solved by using v2 in our distribution?

thanks

HRogge commented 4 years ago

Are you sure it is even olsrd that is overloaded and not the input queue of the tunnel server? Handling TC duplicates (coming from different sources) is trivial for olsrd.
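To illustrate why duplicate handling is cheap, here is a minimal Python sketch of an OLSR-style duplicate set keyed on originator address and message sequence number. It is a toy model, not olsrd's actual code: the class name, the method names, and the injectable clock are hypothetical, and the 30 second hold time is an assumed default in the spirit of the duplicate hold time in RFC 3626.

```python
import time


class DuplicateSet:
    """Toy OLSR-style duplicate set (illustrative, not olsrd's implementation)."""

    def __init__(self, hold_time=30.0, clock=time.monotonic):
        self.hold_time = hold_time  # seconds an entry stays valid (assumed value)
        self.clock = clock          # injectable so the sketch can be tested
        self._seen = {}             # (originator, seqno) -> expiry timestamp

    def is_duplicate(self, originator, seqno):
        """Return True if the message was seen recently; otherwise record it."""
        now = self.clock()
        key = (originator, seqno)
        expiry = self._seen.get(key)
        if expiry is not None and expiry > now:
            return True  # same message arrived again, e.g. via another route
        self._seen[key] = now + self.hold_time
        return False

    def purge(self):
        """Drop expired entries so the table does not grow without bound."""
        now = self.clock()
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
```

Each incoming message costs one dictionary lookup, so copies of the same TC arriving over several tunnels should not be expensive by themselves.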

Multiple OLSR instances would mean that no routing would be done between the interfaces, only within each interface, unless you have some kind of point-to-point tunnel between the two olsrd instances.
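The partitioning can be pictured with a small Python sketch. It models each olsrd instance on the gateway as its own node and checks reachability with a breadth-first search. All node names are hypothetical, and the model ignores everything about OLSR except which links each instance can see.

```python
from collections import deque


def reachable(links, src, dst):
    """Breadth-first search over a list of undirected (a, b) links."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical gateway running two separate instances: "gw-rf" and "gw-tun".
rf_links = [("rf-node", "gw-rf")]            # links seen by the RF-side instance
tunnel_links = [("gw-tun", "tunnel-peer")]   # links seen by the tunnel-side instance
p2p_link = [("gw-rf", "gw-tun")]             # optional link joining the two instances

split = reachable(rf_links + tunnel_links, "rf-node", "tunnel-peer")
joined = reachable(rf_links + tunnel_links + p2p_link, "rf-node", "tunnel-peer")
print(split, joined)
```

Without the joining link the RF side and the tunnel side form two separate meshes; with it, the topologies merge again and so does the flooding.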

Still, I don't think multiple instances will help at all. Do you see CPU load spikes for olsrd that suggest "something is lost"?
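One way to check for such spikes on a Linux node is to sample the CPU time of the olsrd process from `/proc/<pid>/stat`. The sketch below is only a starting point: the helper names are hypothetical, it assumes the Linux procfs layout (utime and stime as fields 14 and 15), and finding the pid and reading the file is left to the caller.

```python
import os


def cpu_ticks(stat_line):
    """Return utime + stime in clock ticks from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces,
    # so split after the last ')'.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the process state (field 3); utime and stime are
    # fields 14 and 15, which land at indexes 11 and 12 here.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s=None):
    """CPU usage of the process over the sampling interval, in percent of one core."""
    if ticks_per_s is None:
        ticks_per_s = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)
```

Reading the stat file twice, a known interval apart, and passing both tick counts to `cpu_percent` gives a rough load figure that can be logged during a suspected mesh storm.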

I don't know if olsrv2 will solve your issues, because the picture of the source of the issues is not clear.

storchi commented 4 years ago

Hi mathisono, can you say a few words about your topology? Is there any non-olsrd connection between the tunnel servers? We in Weimar have layer 2 tinc connections between the VPN servers, and my gut feeling says this could be a problem. The issues occur too rarely to figure out whether layer 2 switching is the reason. Christian

mathiashro commented 2 years ago

Hello @mathisono, may we close this case or do you have any more insights?