For the most part I imagine that this makes the implementation for some devices easier and, as you have shown, the handling of received packets faster. However, I don't like that this requires an extra buffer and thus potentially breaks the copy-once paradigm a network stack might want to ensure. Moreover, for link layers with smaller packet sizes this might be fine, but e.g. all Ethernet devices would require an extra 1500 B of RAM with that. This is IMHO a no-go, especially if it is not necessarily required.
Yeah, allocations are slow. So are context switches. Maybe a network stack should avoid those. ;)
This doesn't need to be changed in netdev. The (admittedly somewhat complicated) recv logic especially allows leaving the buffer handling to the stack. Somewhere up, gnrc should maybe use static input buffers instead of dynamic packet snips. Netdev and the drivers support both dynamic and static packet allocation. I'd rather not change the logic to force static buffers.
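To make that concrete, here is a minimal sketch of the two options from the stack's point of view (the handle_rx name and the 127-byte buffer are illustrative, and recv() is shown with its trailing info argument, which this thread otherwise abbreviates away):

```c
#include <stdint.h>
#include "net/netdev.h"

/* A fixed input buffer owned by the stack; netdev does not require this,
 * it is simply one of the two options the recv() contract allows. */
static uint8_t rx_static[127];

static void handle_rx(netdev_t *dev)
{
    /* first call: query the frame size without dropping the frame */
    int size = dev->driver->recv(dev, NULL, 0, NULL);
    if (size <= 0) {
        return;
    }
    /* option A: read into the static buffer, no allocation at all */
    dev->driver->recv(dev, rx_static, sizeof(rx_static), NULL);

    /* option B (instead of A): allocate exactly `size` bytes dynamically,
     * e.g. as a pktbuf snip, and pass that buffer to the second recv() */
}
```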
My impression here is that these problems are caused in great part by not being able to start I/O operations (i.e. SPI transfers) from interrupts.
If that was possible and if a fixed size buffer was already available at reception time, then:
What am I missing?
> If that was possible and if a fixed size buffer was already available at reception time, then:

That is the catch you are missing ;-). We can't have a fixed size buffer, guarantee zero copy, and be able to handle multiple packets without massive memory constraints.
> My impression here is that these problems are caused in great part by not being able to start I/O operations (i.e. SPI transfers) from interrupts.

Why? A context switch to userspace takes ~10us. That is the amount of time one spi byte takes on the wire at 100k bits/s. There's not much to gain by not switching to userspace before starting the transfer (which can then be synchronous or asynchronous).
Also, in an ideal world, all network devices are connected using something supporting asynchronous IO. Practically, which actually does? The answer is probably some devices on some of the RIOT-supported platforms in some configurations.
The current architecture works with DMA/non-DMA UART/SPI/I2C/register based/posix read/write/whatever, there's no constraint on whether asynchronous IO is possible.
And the buffer used doesn't need to be available at interrupt time.
> Upon completion, the upper layer is notified (via a message).

For the record, messages (as in, core/msg) are unsuitable for ISR->thread communication because messages get (silently) dropped when the receiver thread is either not waiting or the message queue is full.
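A minimal sketch of that failure mode (the event type, PID variable and function name are only illustrative):

```c
#include "msg.h"

#define MSG_TYPE_RX_DONE    (0x4242)    /* illustrative event type */

static kernel_pid_t netif_pid;          /* PID of the receiving thread */

/* Called from the radio ISR: try to notify the network thread. */
static void notify_rx_from_isr(void *frame)
{
    msg_t m;
    m.type = MSG_TYPE_RX_DONE;
    m.content.ptr = frame;

    /* msg_send_int() cannot block: it fails when the target thread is neither
     * waiting nor has room left in its message queue, and unless the return
     * value is checked the notification is lost silently. */
    if (msg_send_int(&m, netif_pid) <= 0) {
        /* count the drop, set a flag for later retry, ... */
    }
}
```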
I was not concerned with switching times, but with overall complexity. If you look at both procedures described in the issue, you see several inversions of control. This logic is IMO quite hard to follow.
> For the record, messages (as in, core/msg) are unsuitable for ISR->thread communication because messages get (silently) dropped when the receiver thread is either not waiting or the message queue is full.

It's not me who wrote that.
> For the record, messages (as in, core/msg) are unsuitable for ISR->thread communication because messages get (silently) dropped when the receiver thread is either not waiting or the message queue is full.

That's what #9326 is for ;-).
> I was not concerned with switching times, but with overall complexity. If you look at both procedures described in the issue, you see several inversions of control. This logic is IMO quite hard to follow.

Prove me wrong, but overall complexity will increase substantially as soon as portable asynchronous IO on ISR level is added.
Also, please come up with a logic that is "easier to follow" but a) does not prescribe the method of user space notification and b) does not prescribe the kind of memory allocation.
> It's not me who wrote that.

Then I misunderstood your suggestion to use a message for notification.
> Moreover, for link layers with smaller packet sizes this might be fine, but e.g. all Ethernet devices would require an extra 1500 B of RAM with that. This is IMHO a no-go, especially if it is not necessarily required.

Indeed, Ethernet is a different story. In fact, most OSs allocate memory for LLs with a huge MTU. This is usually not the case for LLs like IEEE 802.15.4 (in fact, some OSs receive the packet into stack variables and then directly call the phy_indication function). See for instance this and this.
> Yeah, allocations are slow. ;)

Yes, indeed, but the graphs don't show the effect of allocation, because in both setups I'm calling gnrc_pktbuf_add. The only difference is an extra rx_buf I defined in the second case, so there's an extra memcpy (but still faster than calling netdev->driver->recv twice).
I have the impression if I do all the MAC logic without allocation, the time difference would be even bigger (I might try to find some time to show some results there).
> This doesn't need to be changed in netdev. The (admittedly somewhat complicated) recv logic especially allows leaving the buffer handling to the stack. Somewhere up, gnrc should maybe use static input buffers instead of dynamic packet snips.

I agree it's possible to do both with netdev.
(Sorry for the long explanation, but I need to present some context)
Slightly out of the scope of this issue, but while working for some months on the PHY/MAC layer rework, I've noticed that a generic network abstraction with a southbound API to network devices might not be the best idea:
A concrete example is the NETOPT_PRELOADING option. It was added to keep a frame in the transceiver's framebuffer before calling tx_exec. This is a native operation of an IEEE 802.15.4 transceiver, and an IEEE 802.15.4 PHY layer already knows how to "prepare" and "transmit" without needing to add this logic.
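For reference, a rough sketch of how that prepare/transmit split currently looks through the generic option interface (iolist construction and error handling omitted):

```c
#include "net/netdev.h"
#include "net/netopt.h"

/* With NETOPT_PRELOADING enabled, send() only loads the frame into the
 * transceiver's frame buffer; the transmission is triggered afterwards by
 * switching the device state to NETOPT_STATE_TX (i.e. tx_exec). */
static void preload_then_send(netdev_t *dev, const iolist_t *frame)
{
    netopt_enable_t enable = NETOPT_ENABLE;
    dev->driver->set(dev, NETOPT_PRELOADING, &enable, sizeof(enable));

    dev->driver->send(dev, frame);      /* "prepare": fill the framebuffer */

    netopt_state_t state = NETOPT_STATE_TX;
    dev->driver->set(dev, NETOPT_STATE, &state, sizeof(state));  /* "transmit" */
}
```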
Another example: transceivers are not aware of PAN ID, short address, long address, etc. Some devices (at86rf2xx) include some helpers to deal with common MAC layer operations (hw address filtering, etc.) and have some functions to load the hw address. But still, the driver is not aware of the addresses. This means there's a specific MAC<->device interface for setting hw filter addresses that won't be present in other PHY/MAC layers.
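A sketch of how that configuration is pushed down today (the function name is made up and byte-order handling is omitted):

```c
#include <stdint.h>
#include "net/netdev.h"
#include "net/netopt.h"

/* The MAC layer pushes its addresses down so the transceiver can do hardware
 * address filtering; the driver stores them in device registers, but has no
 * further notion of what a PAN ID or a short address means. */
static void configure_hw_filter(netdev_t *dev, uint16_t panid,
                                uint16_t short_addr)
{
    dev->driver->set(dev, NETOPT_NID, &panid, sizeof(panid));
    dev->driver->set(dev, NETOPT_ADDRESS, &short_addr, sizeof(short_addr));
}
```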
Usually the convergence layer goes on top of the MAC. See how Linux "net_dev" diverges on top of mac80211 and the Ethernet driver (from Linux.com):
This lack of interfaces between MAC, PHY and transceivers is one of the reasons why we have MAC/PHY components scattered everywhere (RF, netdev, gnrc_netif) and why we don't have more complex MAC layers (although we can send IEEE 802.15.4 frames, we don't have an IEEE 802.15.4 MAC layer with security, commissioning, coordinators, beacons, etc.).
The netdev interface can work OK with a southbound API to MAC layers (or directly on top of devices with a very simple MAC/PHY). I drew this picture some time ago to show how the netdev interface can work on top of a SoftMAC (software MAC) and a FullMAC (MAC implemented in the device, e.g. CA8210):
There are well-defined and stable northbound and southbound APIs for MAC/PHY. Check for instance https://github.com/RIOT-OS/RIOT/issues/11324. We are reinventing the wheel each time we try to run glue code there.
In practice, the API on top of the network device is not sufficient for properly sending packets, doing CSMA, etc. In order to send data, it's required to be on top of the MAC layer, not on top of the device. Since we use netdev directly on top of the network device, we have to write part of the MAC layer each time we want to send data (see [gnrc_netif](https://github.com/RIOT-OS/RIOT/blob/master/sys/net/gnrc/netif/ieee802154/gnrc_netif_ieee802154.c#L232) and the AT86RF2xx test driver). The semantics of "send", "recv", "get" and "set" are not the best for an API for a transceiver (but can be on top of a MAC layer).
Maybe it's not the smartest idea to open the discussion here, I can open an issue if you want.
Going back to the original discussion and the points mentioned above, it could be possible to have different receive procedures depending on the MAC layer (it makes a lot of sense to allocate 1500 bytes for Ethernet, but maybe not 127 bytes for IEEE 802.15.4).
Opinions?
> The only difference is an extra rx_buf I defined in the second case, so there's an extra memcpy (but still faster than calling netdev->driver->recv twice).

If calling the receive function twice is slower, it means that accessing the device twice (via SPI) is the bottleneck here. To mitigate this, the "next packet size" could be cached, or L2 could just assume a maximum packet size (as it does for Ethernet) and thus save the first call.
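A hedged sketch of the caching idea; the driver struct and field names are hypothetical:

```c
#include "net/netdev.h"

/* Hypothetical driver state: the RX interrupt already reads the PHR (the
 * frame length byte) once, so it can be remembered for the later recv(). */
typedef struct {
    netdev_t netdev;        /* base netdev struct, as RIOT drivers embed it */
    uint8_t cached_len;     /* frame length noted in the RX ISR */
} my_radio_t;

static int my_radio_recv(netdev_t *netdev, void *buf, size_t len, void *info)
{
    (void)info;
    my_radio_t *dev = (my_radio_t *)netdev;

    if (buf == NULL && len == 0) {
        /* size query answered from the cache: no SPI transfer needed */
        return dev->cached_len;
    }
    /* ... read (or drop) the frame via SPI as before ... */
    return dev->cached_len;
}
```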
@kaspar030 yes, caching the next packet size would save some time.
What do you mean with "L2 could just assume a maximum packet size (as it does for ethernet)"? Having a fixed static buffer there?
> What do you mean with "L2 could just assume a maximum packet size (as it does for ethernet)"? Having a fixed static buffer there?

IIUC you've shown that the actual allocation is not as bad as the extra "recv()" call, since the latter touches the device over SPI. For netdev_tap (sorry, not Ethernet in general), as the size of a packet cannot be figured out in advance, GNRC always allocates a whole link-layer MTU (1500 B), retrieves the packet, then shrinks the allocation to what has actually been used. GNRC could do the same for e.g. 802.15.4: always allocate 127 B, then shrink the allocation in pktbuf after the packet has been read (and thus the size is known).
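Roughly what that could look like with the pktbuf API (a sketch; error handling shortened, and recv() shown with its trailing info argument):

```c
#include "net/gnrc/pktbuf.h"
#include "net/netdev.h"

#define IEEE802154_FRAME_LEN_MAX (127U)

static gnrc_pktsnip_t *recv_with_shrink(netdev_t *dev)
{
    /* allocate the worst case up front, skipping the size-query recv() */
    gnrc_pktsnip_t *pkt = gnrc_pktbuf_add(NULL, NULL, IEEE802154_FRAME_LEN_MAX,
                                          GNRC_NETTYPE_UNDEF);
    if (pkt == NULL) {
        return NULL;
    }

    int nread = dev->driver->recv(dev, pkt->data, pkt->size, NULL);
    if (nread <= 0) {
        gnrc_pktbuf_release(pkt);
        return NULL;
    }

    /* give the unused tail back to the packet buffer */
    gnrc_pktbuf_realloc_data(pkt, (size_t)nread);
    return pkt;
}
```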
This may lead to virtual packet loss, because (even if only for a short while) you are allocating more memory than actually needed: if the whole MTU does not fit into the packet buffer (but the actual packet would), we can't do anything but abort the receive, even though it might have been possible.
> we can't do anything but abort the receive

The "recv()" call determining the actual size could be a fallback option:
```c
/* fallback: try the worst-case allocation first and only query the device
 * for the exact size if the packet buffer cannot hold a full MTU */
pktsnip_t *pkt = allocate(MTU);
if (!pkt) {
    size_t pktsize = recv(dev, NULL, 0);  /* the extra recv() call */
    pkt = allocate(pktsize);
}
if (!pkt) {
    die("pktbuf full");
}
```
If that still exists I would be fine with that :-).
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.
@jia200x just to confirm this issue is not relevant anymore?
@fjmolinas the discussion of the netif rework is going in the direction of having specific HALs for each technology (as described in https://github.com/RIOT-OS/RIOT/issues/12469). So I would say it's fine to close this one
Description
While working on #11483, I started evaluating some alternative receive procedures that could help to improve the performance of transceivers and maintainability.
Our current receive procedure using the netdev interface is the following:
1. dev->driver->isr(dev) is called from thread context.
2. dev->event_callback(NETDEV_EVENT_RX_COMPLETE) is called from the dev->driver->isr(dev) function.
3. dev->driver->recv(dev, NULL, 0) is called to get the size of the received packet.
4. dev->driver->recv(dev, buffer, pkt_size) is called to copy the packet into the allocated buffer.
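A condensed sketch of that flow from the stack side (hedged; the handler name is made up and allocation is shown with plain malloc to stay stack-agnostic):

```c
#include <stdint.h>
#include <stdlib.h>
#include "net/netdev.h"

/* Registered as dev->event_callback; sketches steps 1-4 above. */
static void event_cb(netdev_t *dev, netdev_event_t event)
{
    if (event == NETDEV_EVENT_ISR) {
        /* step 1: wake the network thread, which calls dev->driver->isr(dev) */
        return;
    }
    if (event == NETDEV_EVENT_RX_COMPLETE) {
        /* step 3: the first recv() only reports the frame size */
        int size = dev->driver->recv(dev, NULL, 0, NULL);
        if (size <= 0) {
            return;
        }
        /* step 4: allocate a matching buffer and fetch the frame */
        uint8_t *buf = malloc(size);
        if (buf != NULL) {
            dev->driver->recv(dev, buf, size, NULL);
            /* ... hand the frame to the upper layer ... */
            free(buf);
        }
    }
}
```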
However, there are some disadvantages of this procedure:

- The dev->driver->recv function is tricky to implement (usually radios don't understand the concept of "dropping" a packet, so there are blocks and chains of if-else in order to meet the requirements of buf != NULL, len != 0, etc.). Also, some extra care is needed in order to avoid corrupting the frame buffer when reading the first received byte (PHY layer, Physical Service Data Unit length) with dev->driver->recv(dev, NULL, 0).
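The case analysis a driver has to get right looks roughly like this (a hedged sketch of the contract as described; the actual register access is left out):

```c
#include <errno.h>
#include <string.h>
#include "net/netdev.h"

static uint8_t framebuf[127];   /* stands in for the radio's frame buffer */
static size_t frame_len;        /* length of the pending frame */

/* Sketch of the if/else chains the text refers to. */
static int sketch_recv(netdev_t *dev, void *buf, size_t len, void *info)
{
    (void)dev; (void)info;

    if (buf == NULL) {
        if (len == 0) {
            /* size query: must not drop or corrupt the pending frame */
            return (int)frame_len;
        }
        /* buf == NULL, len > 0: drop the frame without copying it out */
        return 0;
    }
    if (len < frame_len) {
        /* supplied buffer too small: the frame is dropped */
        return -ENOBUFS;
    }
    memcpy(buf, framebuf, frame_len);
    return (int)frame_len;
}
```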
Proposed procedure
I changed the receive procedure to indicate the rx of a packet, and modified the _recv function to simply write to a frame buffer (a.k.a. rx_buf, 127 bytes for IEEE 802.15.4). I also added a PHY indication function as a callback for the upper layer when a packet is received, and a receive function (see the sketch after the list below).

The new procedure looks like:
1. dev->driver->isr(dev) is called from thread context.
2. dev->event_callback(NETDEV_EVENT_RX_COMPLETE) is called from the dev->driver->isr(dev) function.
3. The dev->driver->recv() function writes directly into a fixed-size frame buffer (static or stack allocated), without the drop and packet-size logic, and saves the packet size in a psdu_len variable.
4. phy_indication(dev, rx_buf, psdu_len) is called to indicate to the upper layer that a packet was received.
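A minimal sketch of what steps 3 and 4 could look like (not the actual patch from #11483; buffer size and names follow the description above, everything else is an assumption):

```c
#include <stddef.h>
#include <stdint.h>
#include "net/netdev.h"

#define RX_BUF_SIZE (127U)              /* IEEE 802.15.4 PSDU maximum */

static uint8_t rx_buf[RX_BUF_SIZE];     /* fixed frame buffer */
static size_t psdu_len;                 /* size of the received PSDU */

/* upper-layer callback from the proposal */
extern void phy_indication(netdev_t *dev, uint8_t *buf, size_t len);

/* RX-complete path: no drop/size logic, just dump the frame into rx_buf
 * and notify the upper layer */
static void rx_complete(netdev_t *dev)
{
    int res = dev->driver->recv(dev, rx_buf, sizeof(rx_buf), NULL);
    if (res <= 0) {
        return;
    }
    psdu_len = (size_t)res;
    phy_indication(dev, rx_buf, psdu_len);
}
```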
Results

For both setups (current dev->driver->recv and PHY indication), I'm using an AT86RF233 radio, a logic analyzer and GPIO pins to measure time. All times include the gnrc_pktbuf allocation (which could be reduced for the PHY indication setup if the MAC layer is processed directly in rx_buf).

The logic for the measurement is:
1. Set a GPIO pin when entering the dev->driver->isr(dev) function.
2. Clear the pin after netif->ops->recv() returns.
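The instrumentation could look like this (a sketch assuming the periph/gpio API; the probe pin is arbitrary and platform-dependent):

```c
#include "periph/gpio.h"

/* arbitrary probe pin for the logic analyzer */
static const gpio_t probe = GPIO_PIN(0, 7);

void measure_init(void)
{
    gpio_init(probe, GPIO_OUT);
}

/* at the start of dev->driver->isr(dev) */
void measure_start(void)
{
    gpio_set(probe);
}

/* after netif->ops->recv() has returned */
void measure_stop(void)
{
    gpio_clear(probe);
}
```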
For the PHY indication setup, I modified the dev->driver->recv function to (sort of) preserve the original behavior.

Here is a graph comparing our current dev->driver->recv and the phy_indication mechanism for packets with different PSDU sizes:

This is the time overhead (%) of the current dev->driver->recv() procedure:

Note there's data duplication between rx_buf and the packet buffer, but it could be reduced if the MAC layer is processed directly from the frame buffer.
I've seen this pattern at least in OpenThread, OpenWSN and Linux (local buffers on the stack). Contiki also has a similar approach.
Useful links