PelionIoT / sal-stack-nanostack-slip

SLIP protocol support for Nanostack

SLIP TX does not appear to ever be started #11

Open ryankurte opened 6 years ago

ryankurte commented 6 years ago

Hi there,

Sorry if I'm missing something. I've been digging for a few hours now and can't see where the SlipMACDriver actually initiates a transmission.

In practice this presents as the error "[ERR ][slip]: Ran out of TX Buffers.", and no SLIP packet output appears on the serial port (the serial send method is also never called, as confirmed in the debugger).

The transmission process as far as I can see:

  1. SlipMACDriver::slip_if_tx() is called to send a SLIP packet
    • copies the transmission data into one of the available buffers
    • binds txIrq() for the current instance (this is commented out with a TODO)
    • binds a callback for the current slip instance
  2. SlipMACDriver::txIrq() gets called on the completion of the underlying serial tx
    • loads a buffer if there is not currently one set to pCurSlipTxBuffer
    • outputs one character
    • returns the buffer and clears pCurSlipTxBuffer once it is empty
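One way to see why the pool runs dry is to model the steps above. The sketch below is a hypothetical, simplified C++ model, not the real driver: the class SlipTxModel, its pool size and the simulated run_uart() are assumptions. The simulated UART only raises its TX interrupt after a byte it was given finishes. Nothing in the described flow writes that first byte, so queued packets never leave and later calls fail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <deque>
#include <vector>

// Hypothetical, simplified model of the TX path described above.
// Names echo SlipMACDriver, but this is an illustration, not driver code.
class SlipTxModel {
public:
    explicit SlipTxModel(std::size_t num_buffers) : free_buffers_(num_buffers) {}

    // Like slip_if_tx(): copy the packet into a free buffer and "attach" txIrq.
    // Returns false when the pool is exhausted (empty packets are rejected
    // just to keep the sketch simple).
    bool slip_if_tx(const std::vector<uint8_t>& packet) {
        if (packet.empty()) return false;
        if (free_buffers_ == 0) {
            std::puts("[ERR ][slip]: Ran out of TX Buffers.");
            return false;
        }
        --free_buffers_;
        queue_.push_back(packet);
        irq_attached_ = true;  // stands in for attaching txIrq() to the serial TX IRQ
        return true;
    }

    // Like txIrq(): load a buffer if none is current, send one character,
    // and return the buffer to the pool once it has been fully sent.
    void txIrq() {
        if (current_.empty()) {
            if (queue_.empty()) return;  // nothing left: the chain stops here
            current_ = queue_.front();
            queue_.pop_front();
            pos_ = 0;
        }
        wire_.push_back(current_[pos_++]);
        byte_in_flight_ = true;  // the UART will signal when this byte is done
        if (pos_ == current_.size()) {
            current_.clear();
            ++free_buffers_;
        }
    }

    // Simulated UART: raises the TX interrupt only after a byte it was given
    // finishes shifting out. Nothing here ever sends the *first* byte.
    void run_uart() {
        while (irq_attached_ && byte_in_flight_) {
            byte_in_flight_ = false;
            txIrq();
        }
    }

    const std::vector<uint8_t>& wire() const { return wire_; }

private:
    std::size_t free_buffers_;
    std::deque<std::vector<uint8_t>> queue_;
    std::vector<uint8_t> current_;
    std::size_t pos_ = 0;
    std::vector<uint8_t> wire_;
    bool irq_attached_ = false;
    bool byte_in_flight_ = false;
};
```

Under this model, queuing as many packets as there are buffers succeeds, run_uart() puts nothing on the wire, and the next call reports the out-of-buffers error, which matches the symptom described in the report.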

So once a transmission has been started, it keeps going, provided the underlying serial interrupt is being called (which also appears to be a problem on my platform; par for the embedded course). But for the first packet, or whenever no transmission is already in flight, it seems nothing ever begins a new transmission?

My suggestion would be that txIrq() should be fired from the slip_if_tx() method after registering the callback, or alternatively via a signal to the SLIP_IRQ_Thread, but I am unsure how this was intended to work.
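The first suggestion, calling txIrq() from slip_if_tx() when no transmission is in flight, can be sketched in the same kind of simplified model. KickedSlipTx, tx_active_ and run_uart() are hypothetical names. On real hardware the direct call would need the UART IRQ masked, or be deferred to a thread such as the SLIP_IRQ_Thread, so that it cannot race the ISR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical model of "kick-start TX from slip_if_tx() when idle".
class KickedSlipTx {
public:
    explicit KickedSlipTx(std::size_t num_buffers) : free_buffers_(num_buffers) {}

    bool slip_if_tx(const std::vector<uint8_t>& packet) {
        if (packet.empty() || free_buffers_ == 0) return false;
        --free_buffers_;
        queue_.push_back(packet);
        if (!tx_active_) {
            // Nothing is in flight, so no TX interrupt is coming:
            // start the chain by hand by sending the first byte ourselves.
            txIrq();
        }
        return true;
    }

    // Same one-byte-per-interrupt logic as before, plus an "active" flag.
    void txIrq() {
        if (current_.empty()) {
            if (queue_.empty()) {
                tx_active_ = false;  // chain finished; the next packet must kick again
                return;
            }
            current_ = queue_.front();
            queue_.pop_front();
            pos_ = 0;
        }
        tx_active_ = true;
        wire_.push_back(current_[pos_++]);
        byte_in_flight_ = true;
        if (pos_ == current_.size()) {
            current_.clear();
            ++free_buffers_;
        }
    }

    // Simulated UART: each completed byte triggers the TX interrupt.
    void run_uart() {
        while (byte_in_flight_) {
            byte_in_flight_ = false;
            txIrq();
        }
    }

    const std::vector<uint8_t>& wire() const { return wire_; }

private:
    std::size_t free_buffers_;
    std::deque<std::vector<uint8_t>> queue_;
    std::vector<uint8_t> current_;
    std::size_t pos_ = 0;
    std::vector<uint8_t> wire_;
    bool tx_active_ = false;
    bool byte_in_flight_ = false;
};
```

The tx_active_ flag keeps the kick from interleaving with a packet already being sent: once the chain is running, later packets are only queued and the interrupt chain picks them up.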

Thanks for the help (again!),

Ryan

ryankurte commented 6 years ago

Attempted a fix here, which seems to work for packet TX (board -> PC) with slattach. I still need to set up the rest of the networking stuff to finish testing it.

TuomoHautamaki commented 6 years ago

Hi @ryankurte, we are deeply sorry there's been no progress on this issue. We will start looking into it as soon as possible. We've also noticed a BR issue; this may be a duplicate.

TuomoHautamaki commented 6 years ago

The team is looking into this now; meanwhile, you can try reverting the latest commit in this repository. Sorry for the inconvenience. An official fix will follow later.

ryankurte commented 6 years ago

Had a go at reverting to the previous commit but that also appears to be missing the tx start (and doesn't work), and earlier versions don't compile.

I suspect there is some crossover with the SLIP issue: I see packets leaving the sl0 interface when pinging etc., but the OS TX counter does not increase, suggesting the outgoing packets aren't getting ACKed. Still working to understand how it all fits together.

The screenshot has /a lot/ going on, but it shows the above.

[Screenshot, 2017-12-15 4:07:38 PM]
terhei commented 6 years ago

Hi @ryankurte

I made a PR (https://github.com/ARMmbed/sal-stack-nanostack-slip/pull/12) to fix the "[ERR ][slip]: Ran out of TX Buffers." error, and it is now merged to the master branch. I used the mbed access point to verify the functionality, and the border router now gets the backbone address from the Raspberry Pi.

Would you please try with the latest fix? If there is still an issue, please give us a description of your setup so we can try to reproduce it.

Tero

ryankurte commented 6 years ago

Hi @terhei, thanks for looking at this.

PR #12 doesn't work for me with slattach (the SLIP connector under Debian). I will have a go at running it with the mbed-access-point.

I still can't see where the TX is being started, though I might be missing some underlying functionality of the network stack (which I could re-investigate now that the source is available).

My (working in at least one direction) changes to get data from device -> pc are here.

My current setup is the border router running on the EFR32FG12_BRD4254A, using my branch of mbed-os for EFR32 support with the flag patch applied.

That is a bit of a nightmare to support. I also need to pull updates to basically every component at the moment, so one of the fixes in another component may turn out to be what makes it all work.

ryankurte commented 6 years ago

Confirming that with my patch, the mbed-access-point receives BR -> AP packets. With the latest master and PR #12 applied, the "[ERR ][slip]: Ran out of TX Buffers." error occurs. AP -> BR packets never seem to succeed.

Still to update everything.

urutva commented 6 years ago

Hi @ryankurte,

The TX function in the SLIP driver (slip_if_tx) is called internally by Nanostack when a packet needs to be sent out. During SLIP driver initialization (Slip_Init), the slip_if_tx function is registered with Nanostack.
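The registration described here amounts to handing the stack a function pointer at init time, after which the stack (not the driver) decides when to call it. The sketch below uses hypothetical stand-in types (PhyDriverSketch, StackSketch, slip_if_tx_sketch). Nanostack's real driver structure and registration call have different names and signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a driver descriptor: just a TX callback.
// Return 0 on success, negative on failure (assumed convention).
struct PhyDriverSketch {
    int (*tx)(const uint8_t* data, uint16_t len);
};

// Hypothetical stand-in for the stack side of the registration.
class StackSketch {
public:
    int register_driver(const PhyDriverSketch* drv) {
        if (drv == nullptr || drv->tx == nullptr) return -1;
        driver_ = drv;
        return 0;
    }
    // The stack decides when a packet goes out and calls the driver's tx().
    int send_packet(const std::vector<uint8_t>& pkt) {
        if (driver_ == nullptr) return -1;
        return driver_->tx(pkt.data(), static_cast<uint16_t>(pkt.size()));
    }

private:
    const PhyDriverSketch* driver_ = nullptr;
};

// Driver side: roughly the role Slip_Init plays by registering slip_if_tx.
static int g_tx_calls = 0;
static int slip_if_tx_sketch(const uint8_t* data, uint16_t len) {
    (void)data;  // a real driver would copy this into a TX buffer
    ++g_tx_calls;
    return len > 0 ? 0 : -1;
}
static const PhyDriverSketch kSlipDriver = {slip_if_tx_sketch};
```

Registration only guarantees that slip_if_tx() gets called; whether bytes then reach the UART is the interrupt question debated in the rest of the thread.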

PR #12 enables the serial tx_irq, which should fix the "[ERR ][slip]: Ran out of TX Buffers." issue. If you are still seeing this issue, then I suspect hardware flow control. Have you enabled or disabled hardware flow control in the border router? What baud rate are you using?

How is the border router running on EFR32FG12_BRD4254A connected to the pc?

Also, make sure that the serial device on the PC is configured with the same settings as the border router, using stty (https://linux.die.net/man/1/stty).
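For reference, a programmatic equivalent of matching the PC side with stty is sketched below using termios. The 115200 baud, 8N1 and no-hardware-flow-control values are assumptions taken from this thread, the function name make_slip_termios is hypothetical, and the result would be applied with tcsetattr(fd, TCSANOW, &tio) on an open serial port. cfmakeraw() and CRTSCTS are common Linux/BSD extensions rather than strict POSIX.

```cpp
#include <cassert>
#include <cstring>
#include <termios.h>

// Sketch: fill a termios struct for raw 115200 8N1 with no hardware flow
// control, roughly what `stty -F <dev> 115200 raw -crtscts` sets up.
bool make_slip_termios(termios& tio) {
    std::memset(&tio, 0, sizeof(tio));
    cfmakeraw(&tio);                       // no echo, no line editing, 8-bit chars
    tio.c_cflag &= ~(PARENB | CSTOPB | CSIZE);
    tio.c_cflag |= CS8 | CLOCAL | CREAD;   // 8N1, ignore modem lines, enable RX
    tio.c_cflag &= ~CRTSCTS;               // hardware flow control off
    tio.c_cc[VMIN] = 1;                    // block until at least one byte
    tio.c_cc[VTIME] = 0;                   // no inter-byte timeout
    return cfsetispeed(&tio, B115200) == 0 && cfsetospeed(&tio, B115200) == 0;
}
```

Whatever tool is used, the point is the same: both ends must agree on baud rate and flow control, or bytes are lost or never sent.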

ryankurte commented 6 years ago

@devran01 thanks for the response. I'm stuck on an mbed-os version prior to the Nanostack source release while waiting on a couple of issues, so it's a bit hard to see inside Nanostack at the moment.

Unfortunately I don't have a currently supported board on which to run it, and PR #12 definitely doesn't fix it for this platform.

I have hw flow control turned off, and can interact with the device over the serial port fine. Baud is 115200 and it's connected via a USB serial cable (though I have also tried mbed-access-point with a modified /etc/init.d/network to change the baud rate).

I can see (and catch in the debugger) the call to SlipMACDriver::slip_if_tx from Nanostack, which puts the data in the buffer to be sent. I can also see the transmit interrupt being bound (and have manually tested that the underlying serial interrupts are working).

What I can't see is where a transmission is actually started. Something needs to send the first byte which will then trigger SlipMACDriver::txIrq and thus SlipMACDriver::tx_one_byte to continue the transmission. I've looked through a few times and can't find anything that does this, and have confirmed with the debugger that these are never being called on this platform.

With my additions, which create a TX event when SlipMACDriver::slip_if_tx is called and then manually start the transfer, this IRQ chaining works and packets are sent (and received on the PC).

I still don't have a good grasp of SLIP, so I am probably just missing something. But, I don't understand how the serial transmission can occur without ever being started.

urutva commented 6 years ago

@ryankurte If you have manually validated that the serial interrupts are working then I think attach function is not working as expected. It should set the rx/tx ISR passed to it and also enable the rx/tx interrupts depending on the IrqType.

When Nanostack calls slip_if_tx to send a packet out, we set the tx ISR by calling attach function which will internally enable the TX serial interrupt. For K64F: https://github.com/ARMmbed/mbed-os/blob/8a870a66c0d3cbdeda4d73c22a9fe12e0a571007/drivers/SerialBase.cpp#L71

https://github.com/ARMmbed/mbed-os/blob/6cf0c8673be1d9c7b336d72610b75a22281324b6/targets/TARGET_Freescale/TARGET_MCUXpresso_MCUS/TARGET_MCU_K64F/serial_api.c#L193

When the serial TX buffer is empty, a serial TX interrupt is fired, which leads to calling the ISR SlipMACDriver::txIrq to send the packet out on the serial link.
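The mechanism described here works if the TX-empty interrupt is level sensitive, that is, asserted for as long as the transmit data register is empty. The hypothetical LevelTxUart model below assumes exactly that and does not describe any specific MCU. Attaching the ISR to an idle UART fires it at once, so the ISR itself writes the first byte.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical UART whose TX-empty IRQ is level sensitive (an assumption).
class LevelTxUart {
public:
    // Enabling the TX IRQ while the data register is empty asserts it
    // immediately, so the ISR runs right away and can load the first byte.
    void attach_tx(std::function<void()> isr) {
        isr_ = std::move(isr);
        enabled_ = true;
        service();
    }
    void detach_tx() { enabled_ = false; }

    // Called from the ISR to load one byte into the data register.
    void write(uint8_t b) { data_ = b; full_ = true; }

    std::vector<uint8_t> wire;  // bytes that reached the line

private:
    // Assumes the data register is empty on entry (UART idle).
    void service() {
        while (enabled_) {
            isr_();                 // register empty + IRQ enabled => ISR runs
            if (!full_) break;      // ISR sent nothing; in this model it must
                                    // detach, or a real level IRQ would re-fire
            wire.push_back(data_);  // shifter drains the register...
            full_ = false;          // ...which re-asserts the IRQ
        }
    }

    std::function<void()> isr_;
    bool enabled_ = false;
    bool full_ = false;
    uint8_t data_ = 0;
};
```

In this model a minimal ISR writes the next byte while data remains and detaches the TX interrupt when the packet is done; no separate "first byte" is needed.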

Hope this helps.

ryankurte commented 6 years ago

Attach is working OK as far as I can tell. The TX interrupt will only fire when the buffer transitions from full -> empty; usually something needs to put the first byte in the buffer to initiate the transfer, and I can't see where that's happening.

I will note that the interrupt flag does not appear to be cleared when binding in the K64F code. AFAIK this may cause the interrupt handler to be called immediately after the NVIC_EnableIRQ call if there's already a pending interrupt flag from initialisation or previous sending, or from some K64F-related interrupt oddity.

The equivalent implementation for the SiliconLabs devices clears the pending interrupt flag before enabling the interrupt, so binding never causes an interrupt to occur immediately (which is the deterministic way of handling interrupt binding).
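The alternative behaviour described here can be modelled as an edge-latched flag. The TX-complete flag is set only when a written byte drains, and attach clears any stale flag before enabling the interrupt. In the hypothetical EdgeTxUart model below, which is an assumption and not a description of the EFR32 or K64F peripherals, attaching to an idle UART calls nothing. The chain only starts once something outside the ISR writes a first byte.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical UART with an edge-latched "TX complete" flag (an assumption).
class EdgeTxUart {
public:
    void attach_tx(std::function<void()> isr) {
        isr_ = std::move(isr);
        pending_ = false;  // clear any stale flag before enabling
        enabled_ = true;
        run();             // nothing pending, so the ISR is not called
    }
    void detach_tx() { enabled_ = false; }
    void write(uint8_t b) { data_ = b; full_ = true; }

    // Simulate the hardware: a full register drains, which latches the flag,
    // which (if the IRQ is enabled) calls the ISR.
    void run() {
        for (;;) {
            if (full_) {
                wire.push_back(data_);
                full_ = false;
                pending_ = true;  // full -> empty transition latches the flag
            }
            if (!(enabled_ && pending_)) return;
            pending_ = false;     // ISR entry acknowledges the flag
            isr_();
        }
    }

    std::vector<uint8_t> wire;  // bytes that reached the line

private:
    std::function<void()> isr_;
    bool enabled_ = false;
    bool pending_ = false;
    bool full_ = false;
    uint8_t data_ = 0;
};
```

Under these two models both positions in the thread are consistent: with level semantics, attach() alone starts the transfer, while with a cleared edge-latched flag an explicit first write (a "kick") is required.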

I don't have a K64F to check whether that is what is occurring, but if it is, I would view it as a bug in the K64F implementation.

urutva commented 6 years ago

If attach() is working as expected then serial tx IRQ should be fired when the tx buffer becomes empty which leads to calling the ISR SlipMACDriver::txIrq. The ISR then calls SlipMACDriver::tx_one_byte which will process the data according to present SLIP state and send it out on serial port.
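For reference, the SLIP framing itself (RFC 1055) is simple. END (0xC0) terminates a packet, and any END or ESC (0xDB) byte in the payload is sent as ESC followed by ESC_END (0xDC) or ESC_ESC (0xDD). The hypothetical SlipByteEncoder below hands out one encoded byte per call, roughly the shape a tx_one_byte()-style ISR helper needs. Its state names and the leading END (optional in RFC 1055) are assumptions, not the driver's actual states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// RFC 1055 special characters.
constexpr uint8_t SLIP_END = 0xC0;
constexpr uint8_t SLIP_ESC = 0xDB;
constexpr uint8_t SLIP_ESC_END = 0xDC;
constexpr uint8_t SLIP_ESC_ESC = 0xDD;

// Hypothetical one-byte-per-call SLIP encoder: each call returns the next
// byte to load into the UART, or std::nullopt when the frame is complete.
class SlipByteEncoder {
public:
    explicit SlipByteEncoder(std::vector<uint8_t> payload) : payload_(std::move(payload)) {}

    std::optional<uint8_t> next() {
        switch (state_) {
        case State::Start:
            state_ = State::Data;
            return SLIP_END;  // leading END flushes line noise (optional per RFC 1055)
        case State::Data: {
            if (pos_ == payload_.size()) {
                state_ = State::Done;
                return SLIP_END;  // closing END marks the end of the packet
            }
            uint8_t b = payload_[pos_++];
            if (b == SLIP_END) { escaped_ = SLIP_ESC_END; state_ = State::Escaped; return SLIP_ESC; }
            if (b == SLIP_ESC) { escaped_ = SLIP_ESC_ESC; state_ = State::Escaped; return SLIP_ESC; }
            return b;  // ordinary byte goes out unchanged
        }
        case State::Escaped:
            state_ = State::Data;
            return escaped_;  // second half of an escape sequence
        case State::Done:
            return std::nullopt;
        }
        return std::nullopt;
    }

private:
    enum class State { Start, Data, Escaped, Done };
    std::vector<uint8_t> payload_;
    std::size_t pos_ = 0;
    uint8_t escaped_ = 0;
    State state_ = State::Start;
};
```

Because the escape needs two output bytes for one input byte, the encoder has to remember state between calls, which is why a per-interrupt sender carries a SLIP state rather than just a buffer index.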

Adding @0xc0170 to comment on "clearing the pending interrupt in case of K64F".

urutva commented 6 years ago

Closing the issue due to inactivity. Please re-open if the issue reappears.

ryankurte commented 6 years ago

The issue has not yet gone away. Sorry it went quiet; I took some time off over Christmas.

urutva commented 6 years ago

No worries. Can you try disabling the clearing of pending interrupts for the SiliconLabs device and see if it works?

It looks like the SiliconLabs serial hardware is not firing the TX-empty IRQ after the pending interrupt is cleared, possibly because the IRQ was already triggered when the TX buffer became empty.

TuomoHautamaki commented 6 years ago

@ryankurte , any update on this?

ryankurte commented 6 years ago

@devran01 I don't think it makes sense not to clear interrupts on init and once they've fired; otherwise the initialisation state is undefined.

@TuomoHautamaki I might still be missing something, but I still can't see how this ever works without at least starting a transfer (there appears to be no code to send if there is not already an interrupt pending). After that point, the ISR works fine (as my fork demonstrates).

The only other thought I have had is that perhaps both RX and TX IRQs should prompt sending, but it's not clear to me that this is the case.

Do you know if SLIP has been demonstrated working on any ARM platforms? AFAIK the Nucleo 429 uses Ethernet instead.

urutva commented 6 years ago

@ryankurte Can you please raise an issue on mbed-os repo (https://github.com/ARMmbed/mbed-os) describing the difference in behavior related to K64F and SiLabs implementation of the serial attach method.

We can update the SLIP driver depending on the outcome of the discussion.