Avnu / OpenAvnu

OpenAvnu - an Avnu sponsored repository for Time Sensitive Network (TSN and AVB) technology
470 stars 290 forks source link

mrpd - M_RP messages out of order #322

Closed davidcemin closed 8 years ago

davidcemin commented 8 years ago

Im sending from a client application ( similar to whats done on one of our examples) MVRP and MSRP messages (using the V++ and S+L respectively). Mrpd receives the messages on the same order I sent them, but these messages go to the wire in a different order, as you can see below (under FULL DUMP) on the logs I got from mrpd.

1) MRPD 945.595265 CMD:V++:I=0123 from CLNT 54214 2) MRPD 945.595374 [008] CLT MSG 54214:VNE 0123 R=000000000000 VN/MT 3) MRPD 945.595492 CMD:S+L:L=10203001a0101,D=2 from CLNT 54214 4) MRPD 945.595583 [009] CLT MSG 54214:SNE L:D=2,S=00010203001a0101 R=000000000000

The messages go out as in: 1) MRPD 945.686578 MSRP send PDU MSRP DATA DUMP 0 1 2 3 4 5 6 7 8 9 a b c d e f
00: 01 80 c2 00 00 0e e2 d7 ed 58 8a 5e 22 ea 00 03
01: 08 00 0e 00 01 00 01 02 03 00 1a 01 01 00 80 00
02: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

2) MRPD 945.695457 MVRP send PDU 26 MVRP DATA DUMP 0 1 2 3 4 5 6 7 8 9 a b c d e f
00: 01 80 c2 00 00 21 e2 d7 ed 58 8a 5e 88 f5 00 01
01: 02 00 01 01 23 00 00 00 00 00 00 00 00 00 00 00

This is happening because, on msrp_event when the MRP_EVENT_NEW command is received, the timer is already running. On MVRP the timer wasnt, so it was started, as you can see per messages below:

1) MRPD 945.595299 MVRP event New! MRPD 945.595314 MRP join timer start

2) MRPD 945.595466 == EVENT recv_ctl_msg == MRPD 945.595492 CMD:S+L:L=10203001a0101,D=2 from CLNT 54214 MRPD 945.595542 MRP join timer running

So, as a consequence, the msrp timers expire first, and the msrp (listener declaration on this case) hits the wire before the mvrp declaration. One way to solve this would be restarting the mrp join timer for msrp upon receiving MRP_EVENT_NEW messages, but I dont know if that is the correct approach (and if that would cause other related problems)

Any ideas or suggestions?

FULL DUMP

MRPD 945.586559 == EVENT DONE == MRPD 945.595222 == EVENT recv_ctl_msg == MRPD 945.595265 CMD:V++:I=0123 from CLNT 54214 MRPD 945.595299 MVRP event New! MRPD 945.595314 MRP join timer start MRPD 945.595335 MVRP event New! MRPD 945.595374 [008] CLT MSG 54214:VNE 0123 R=000000000000 VN/MT MRPD 945.595434 == EVENT DONE == MRPD 945.595466 == EVENT recv_ctl_msg == MRPD 945.595492 CMD:S+L:L=10203001a0101,D=2 from CLNT 54214 MRPD 945.595542 MRP join timer running MRPD 945.595583 [009] CLT MSG 54214:SNE L:D=2,S=00010203001a0101 R=000000000000 VN/MT MRPD 945.595648 == EVENT DONE == MRPD 945.686470 == EVENT MSRP join timer expires == MRPD 945.686503 MRP stop join timer MRPD 945.686527 MSRP Talker event tx!, VO/MT->IN MRPD 945.686578 MSRP send PDU MSRP DATA DUMP 0 1 2 3 4 5 6 7 8 9 a b c d e f
00: 01 80 c2 00 00 0e e2 d7 ed 58 8a 5e 22 ea 00 03
01: 08 00 0e 00 01 00 01 02 03 00 1a 01 01 00 80 00
02: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
MRPD 945.686708 MRP join timer start MRPD 945.686725 == EVENT DONE == MRPD 945.695355 == EVENT MVRP join timer expires == MRPD 945.695375 MVRP event tx! MRPD 945.695387 MRP stop join timer MRPD 945.695405 MVRP event tx!, VN->AN/MT MRPD 945.695422 MVRP -> mvrp_emit_vidvectors() send NEW, pdu null LVA | NEW MRPD 945.695457 MVRP send PDU 26 MVRP DATA DUMP 0 1 2 3 4 5 6 7 8 9 a b c d e f
00: 01 80 c2 00 00 21 e2 d7 ed 58 8a 5e 88 f5 00 01
01: 02 00 01 01 23 00 00 00 00 00 00 00 00 00 00 00
MRPD 945.695554 MRP join timer start MRPD 945.695570 == EVENT DONE == MRPD 945.786751 == EVENT MSRP join timer expires == MRPD 945.786779 MRP stop join timer MRPD 945.786799 MSRP Talker event tx!, VO/MT->IN

intel-ethernet commented 8 years ago

Hmm … the other alternative is for the thread to wait between the VLAN join request and the listener request … resetting one state machine in response to another state machines’ event seems dangerous (if that is what is being proposed).

ekm

From: David Cemin [mailto:notifications@github.com] Sent: Tuesday, November 17, 2015 7:50 AM To: AVnu/Open-AVB Open-AVB@noreply.github.com Subject: [Open-AVB] mrpd - M_RP messages out of order (#322)

Im sending from a client application ( similar to whats done on one of our examples) MVRP and MSRP messages (using the V++ and S+L respectively). Mrpd receives the messages on the same order I sent them, but these messages go to the wire in a different order, as you can see below (under FULL DUMP) on the logs I got from mrpd.

1) MRPD 945.595265 CMD:V++:I=0123 from CLNT 54214 2) MRPD 945.595374 [008] CLT MSG 54214:VNE 0123 R=000000000000 VN/MT 3) MRPD 945.595492 CMD:S+L:L=10203001a0101,D=2 from CLNT 54214 4) MRPD 945.595583 [009] CLT MSG 54214:SNE L:D=2,S=00010203001a0101 R=000000000000

The messages go out as in: 1) MRPD 945.686578 MSRP send PDU MSRP DATA DUMP 0 1 2 3 4 5 6 7 8 9 a b c d e f

00: 01 80 c2 00 00 0e e2 d7 ed 58 8a 5e 22 ea 00 03

01: 08 00 0e 00 01 00 01 02 03 00 1a 01 01 00 80 00

02: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

2) MRPD 945.695457 MVRP send PDU 26 MVRP DATA DUMP 0 1 2 3 4 5 6 7 8 9 a b c d e f

00: 01 80 c2 00 00 21 e2 d7 ed 58 8a 5e 88 f5 00 01

01: 02 00 01 01 23 00 00 00 00 00 00 00 00 00 00 00

This is happening because, on msrp_event when the MRP_EVENT_NEW command is received, the timer is already running. On MVRP the timer wasnt, so it was started, as you can see per messages below:

1) MRPD 945.595299 MVRP event New! MRPD 945.595314 MRP join timer start

2) MRPD 945.595466 == EVENT recv_ctl_msg == MRPD 945.595492 CMD:S+L:L=10203001a0101,D=2 from CLNT 54214 MRPD 945.595542 MRP join timer running

So, as a consequence, the msrp timers expire first, and the msrp (listener declaration on this case) hits the wire before the mvrp declaration. One way to solve this would be restarting the mrp join timer for msrp upon receiving MRP_EVENT_NEW messages, but I dont know if that is the correct approach (and if that would cause other related problems)

Any ideas or suggestions?

FULL DUMP

MRPD 945.586559 == EVENT DONE == MRPD 945.595222 == EVENT recv_ctl_msg == MRPD 945.595265 CMD:V++:I=0123 from CLNT 54214 MRPD 945.595299 MVRP event New! MRPD 945.595314 MRP join timer start MRPD 945.595335 MVRP event New! MRPD 945.595374 [008] CLT MSG 54214:VNE 0123 R=000000000000 VN/MT MRPD 945.595434 == EVENT DONE == MRPD 945.595466 == EVENT recv_ctl_msg == MRPD 945.595492 CMD:S+L:L=10203001a0101,D=2 from CLNT 54214 MRPD 945.595542 MRP join timer running MRPD 945.595583 [009] CLT MSG 54214:SNE L:D=2,S=00010203001a0101 R=000000000000 VN/MT MRPD 945.595648 == EVENT DONE == MRPD 945.686470 == EVENT MSRP join timer expires == MRPD 945.686503 MRP stop join timer MRPD 945.686527 MSRP Talker event tx!, VO/MT->IN MRPD 945.686578 MSRP send PDU MSRP DATA DUMP 0 1 2 3 4 5 6 7 8 9 a b c d e f

00: 01 80 c2 00 00 0e e2 d7 ed 58 8a 5e 22 ea 00 03

01: 08 00 0e 00 01 00 01 02 03 00 1a 01 01 00 80 00

02: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

MRPD 945.686708 MRP join timer start MRPD 945.686725 == EVENT DONE == MRPD 945.695355 == EVENT MVRP join timer expires == MRPD 945.695375 MVRP event tx! MRPD 945.695387 MRP stop join timer MRPD 945.695405 MVRP event tx!, VN->AN/MT MRPD 945.695422 MVRP -> mvrp_emit_vidvectors() send NEW, pdu null LVA | NEW MRPD 945.695457 MVRP send PDU 26 MVRP DATA DUMP 0 1 2 3 4 5 6 7 8 9 a b c d e f

00: 01 80 c2 00 00 21 e2 d7 ed 58 8a 5e 88 f5 00 01

01: 02 00 01 01 23 00 00 00 00 00 00 00 00 00 00 00

MRPD 945.695554 MRP join timer start MRPD 945.695570 == EVENT DONE == MRPD 945.786751 == EVENT MSRP join timer expires == MRPD 945.786779 MRP stop join timer MRPD 945.786799 MSRP Talker event tx!, VO/MT->IN

— Reply to this email directly or view it on GitHubhttps://github.com/AVnu/Open-AVB/issues/322.

davidcemin commented 8 years ago

The suggestion I made on the first comment was just a test to prove the timer theory, and Im not willing to use that as a final solution.

Ok, im not gonna pretend that I fully understand this code because I don't :) Im running it in a linux machine, and as far as I can tell, it is a single threaded process.

on mrpd.c we essentially have 1) mrp, local_ctl, mmrp, mvrp and msrp init (on this order). The m_rp inits will initialize a socket for each one of the protocols. 2) create a pool of FDs with mmrp, mvrp, msrp fds and their timers 3) sit on a select waiting for a valid fd.

On the logs i posted above, I can see that the event that is triggering the msrp_send is the msrp join timer expiration (thats why I ended up just restarting the timer as a test to prove the timer theory). So i dont think i quite understood what you meant on your suggestion for the thread to wait between vlan and listener join requests. Could you explain it a little more please?

I could just delay the msrp message on the client application(and so no changes on mrpd), but that might end up violating the 802.1Q section 10.7.11 requirements.

andrew-elder commented 8 years ago

@davidcemin - the logic and log information you present describing the issue seem logical to me. I don't think we can restart the MSRP join timer when a NEW is received. What say we number S+L NEW's? The join timer could potentially be delayed a while. I checked the MVRP code and there is no NOTIFY string sent back to the client when the the JOIN timer completes and MRP_EVENT_TX is executed.

Upon first reading of my code, I couldn't see any special handling for the case you describe, but then I found a state called INSTREAM_STATE_WAITING_FOR_VLAN_REGISTRATION which gives away how I solved the issue!

intel-ethernet commented 8 years ago

I meant the client could join the domain, wait 500 msec, join the vlan, wait 500 mec, then join the listener stream.

ekm

From: David Cemin [mailto:notifications@github.com] Sent: Tuesday, November 17, 2015 8:42 AM To: AVnu/Open-AVB Open-AVB@noreply.github.com Cc: Mann, Eric eric.mann@intel.com Subject: Re: [Open-AVB] mrpd - M_RP messages out of order (#322)

The suggestion I made on the first comment was just a test to prove the timer theory, and Im not willing to use that as a final solution.

Ok, im not gonna pretend that I fully understand this code because I don't :) Im running it in a linux machine, and as far as I can tell, it is a single threaded process.

on mrpd.c we essentially have 1) mrp, local_ctl, mmrp, mvrp and msrp init (on this order). The m_rp inits will initialize a socket for each one of the protocols. 2) create a pool of FDs with mmrp, mvrp, msrp fds and their timers 3) sit on a select waiting for a valid fd.

On the logs i posted above, I can see that the event that is triggering the msrp_send is the msrp join timer expiration (thats why I ended up just restarting the timer as a test to prove the timer theory). So i dont think i quite understood what you meant on your suggestion for the thread to wait between vlan and listener join requests. Could you explain it a little more please?

I could just delay the msrp message on the client application(and so no changes on mrpd), but that might end up violating the 802.1Q section 10.7.11 requirements.

— Reply to this email directly or view it on GitHubhttps://github.com/AVnu/Open-AVB/issues/322#issuecomment-157426351.

davidcemin commented 8 years ago

@andrew-elder Yes i do not think it is safe to restart the join timer when the NEW is received. Again, this was just a test to prove my theory of timer expiration triggering the messages going to the wire. So, you solved this on the client application right? I found that 100ms is the minimum wait time that makes the meessages go out in the right order.

davidcemin commented 8 years ago

@intel-ethernet Thank you.

andrew-elder commented 8 years ago

@davidcemin - MRP timeouts are in mrp.h / * Join timer may only "fire" 3 times in 300 msec. Default timeout is 200 msec. \ Here we use 100 msec, or 300/3 msec. _/

define MRP_JOINTIMERVAL 100 / join timeout in msec _/

#define MRP_LVTIMER_VAL     1000    /_ leave timeout in msec _/
#define MRP_LVATIMER_VAL    10000   /_ leaveall timeout in msec _/
#define MRP_PERIODTIMER_VAL 1000    /_ periodic timeout in msec */
andrew-elder commented 8 years ago

Another possible longer term (more difficult) solution might be to have a single join timer that works across all MRP applications...

davidcemin commented 8 years ago

Thank you all, i think my initial question is already answered and im closing this issue.