Mellvik / TLVC

Tiny Linux for Vintage Computers

[net] First cut of working driver for Intel EtherExpress 16 #60

Closed Mellvik closed 4 months ago

Mellvik commented 5 months ago

This PR introduces a 4th working Ethernet driver to TLVC - for the Intel EtherExpress16.

A rather interesting piece of hardware with many capabilities and considerable complexity. The TLVC driver reuses significant parts of the eexp driver from Linux - a necessity, given that no low-level documentation is available for the hardware other than for the Intel 82586 chip. The 82586 documentation is very useful, however, covering the lowest level and explaining the internal data structures and the 'ethernet processor' workings in detail.

As is, the driver is working and stable but slow, not yet taking advantage of the significant on-chip buffering. As with other net interfaces, it is enabled via menuconfig, with defaults specified in ports.h and overridden by settings in /bootopts under the ee0= label.

Specifically, incoming telnet is not working at this point; everything else seems stable and robust. The telnet problem may not be driver related - indications point toward the select system call (again), possibly telnetd itself (see https://github.com/ghaerr/elks/issues/1048).

Speed-wise, this preliminary release checks in at a 1.7 ms ping round trip, 17 kBps incoming and 50 kBps outgoing, both measured via FTP. When tuned, the speed could exceed 100 kBps in both directions (40MHz 386SX).

Netstat from a test run, ee0 (ee16):

tlvc17# netstat
----- Received ---------  ----- Sent -------------
TCP Packets        76742  TCP Packets        78041
TCP Dropped           15  TCP Retransmits        0
TCP Bad Checksum       0  TCP Retrans Memory   496
IP Packets         95332  IP Packets         96410
IP Bad Checksum        0  IP Bad Headers         0
ICMP Packets       18369  ICMP Packets       18369
ETH Packets        95539  ETH Packets        96618
ARP Reqs Sent          1  ARP Replies Rcvd       1
ARP Reqs Rcvd        206  ARP Replies Sent     206
ARP Cache Adds         2

 No        State    RTT lport        raddress  rport
-----------------------------------------------------
  1  ESTABLISHED 1000ms  1038         0.0.0.0      2
  2  ESTABLISHED   62ms  1030   192.168.10.97  54068
  3  ESTABLISHED  250ms    21   192.168.10.97  54065
  4       LISTEN 1000ms    23         0.0.0.0      0
tlvc17# 
Mellvik commented 5 months ago

Update: The incoming telnet problem turns out to be unrelated to the ee16 driver, something else is broken in the system.

Edit: ttyp* and ptyp* devices on the system had wrong (old) device numbers. Incoming telnet now works fine.

Mellvik commented 5 months ago

With this update, the ee16 driver is robust, reliable and stable - still inefficient and somewhat bloated, but very much usable. The development/testing platform has been a 40MHz 386SX single-board computer attached to an ISA bus. File transfer (FTP) speed currently clocks in at approx. 50 kbytes per second in and out (which isn't half bad considering… the ne2k card on the same machine runs at 29k outgoing, 65k incoming - a reminder to look into xmit speed in the ne2k driver).

A few problems remain before the driver can be categorized as finished - i.e. of TLVC quality on par with the other ethernet drivers.

Developer Notes

Background - the Command Unit

The 82586 chip has two 'processing units': the RU (Receive Unit) and the CU (Command Unit). The RU handles incoming packets, see below. After being programmed at initialization it pretty much runs on its own. The CU does everything else: receiving commands from the CPU (i.e. the driver) and acting on them. Most of the time these transactions are about acking interrupts and transmitting packets.

The CU is either suspended (commanded to stop), idle (awaiting the next command), or running a command queue - a linked list of commands in NIC memory. The typical (and quite efficient) approach is to put every xmit command into the command queue, each with a link (pointer) to the next command - and so on.

Obviously we will rarely be able to keep the CU continuously busy with transmits; the remedy is NOPs - null commands that just loop until the CPU (driver) changes their next-link to point to something else, like the next XMIT command.

Normally this works fine: the CU jumps from a NOP loop to an xmit command, which sends it into a new NOP loop when finished - or possibly into the next xmit if we're really busy.

This works until the next-link in a command somehow gets mangled. This is what's known as 'CU wedged': the link to the next command was wrong, the CU took off into nowhere, and it must be restarted.

In 8-bit mode, according to comments in the Linux driver, this is the rule. What may be happening is that since updating the next-link in a command (16 bits wide) now requires two operations over the 8-bit bus, the CU - in a tight NOP loop - will use the new link before the 2nd half is in place. Hence the need to stop the CU between transfers. This should not be a problem on a 16-bit system, but in our case it is - most likely a synchronization issue, currently under investigation. In the meanwhile, the stop-and-restart regime works.

Memory Mapping vs. PIO

The driver currently uses PIO to access the NIC buffers - simply because that's what the Linux driver this driver is based on did. The comments in the Linux driver sources indicate that memory mapping is less stable, in particular if the NIC buffer is small, and that it's another parameter (complication) the user needs to keep track of.

The stability argument may have some merit on 386 and higher systems running in virtual mode. In our much simpler (IA16) case, memory mapping is likely to be both stable and clean, and quite possibly faster than PIO. To be investigated. For the record, the NetBSD driver supports both modes.

That said, the PIO mechanisms offered by the card are quite capable. Access to NIC memory goes via a data-IO register; the NIC memory address to read or write is set via two other registers, one for read, one for write. For each access, the NIC-internal address is incremented automatically, so PIO via insw or outsw is fine - and quite fast.

This mechanism is clean but requires synchronization if used by different parts of the driver. That is not a problem in the Linux driver, since data is passed to and from OS buffers and PIO is taken care of in the interrupt handler. Our case is more complicated because we're (at least in this version) passing data to/from the user process (which means ktcp) and doing data IO from 3 different 'locations'. Lean and very efficient, but not very flexible - and with some obvious drawbacks.

Fortunately - and very usefully - the NIC has a shadow memory read mechanism which allows NIC memory to be peeked into without touching the main data transfer registers.

Memory and data structures

The 82586 chip is quite sophisticated given its age, with its dual processing units and very flexible memory structures. It's also (in)famous for being hard to deal with for hardware designers and software engineers alike - all the way from the Sun2 to the early Pentium systems.

As is often the case, many of the features are just beyond the patience of the developers and remain unused. The Linux driver is a good example: it avoids complexity wherever it can, such as not taking advantage of the processor's ability to manage small, linked buffer segments to maximize buffer usage. Instead, Linux (and this driver) allocates traditional max-ethernet-packet-size buffers and ignores the wasted resource.

For example, TLVC - as configured (ktcp) - restricts outgoing network packets to a 512 byte payload, which means we could easily triple the number of xmit buffers in the NIC, or allow more receive buffers. OTOH, it's not obvious that more xmit buffers will affect performance, so it may just not be worth the effort.

The NIC memory size - typically 32k bytes, but sometimes 16k, other times 64k - is determined at boot/initialization time, currently by brute force (writing and reading back), later by just reading the size out of the NIC eprom. The first 0x100 bytes are used for configuration and admin purposes, followed by an automatically determined number of xmit buffers (like 4 if 32k RAM); the rest is receive buffers. The data structures in memory (linked lists) are initialized at boot time, which is particularly important for the RU, which operates on the receive buffers.

After initialization, the RU runs pretty much on its own - receiving packets according to the configuration and interrupting as required. Other than acking the interrupts and unloading the data, there isn't much for the CPU (and thus the driver) to do to keep receive going.

Transmit is a different story, refer to the chapter on the Command Unit above.

The NOP_REGIME

The NOP_REGIME ifdefs introduce a different way of keeping the CU busy with NOPs in between outgoing packets. It does the same thing technically as the original Linux way, but in a more logical (that is, easier to understand) way. The 'original method' uses 3 unused words in each XMIT command to store a NOP command which is jumped to after the XMIT has completed. It works, but it is confusing.

The alternate way, adopted from NetBSD, places one NOP command (3 words each) per XMIT buffer at the beginning of the NIC buffer area, before the XMIT 'memory section'. Each XMIT command links to its corresponding NOP, which links to itself. When a new XMIT is ready to go, we (the driver) change the link field in the preceding packet's NOP to point to the new one, and off we go.

This alternative consumes more memory (4x6 bytes) and may be marginally safer, because the NOPs are less in harm's way should the driver wander off while creating an XMIT packet. It also has the potential to help with the 8-bit 'CU wedged' problem - to be explained at a later time. Most importantly, though, it's easier to debug because it's easier to understand. As checked in in this PR, the 'new' NOP_REGIME is not used.

Mellvik commented 4 months ago

With this update - a few minor fixes - the ee0 driver is at version 1: stable, moderately efficient and very usable.

The NIC on-board buffer is very underutilized and the packet transmit mechanism is in permanent stop/start mode (which means there is no real xmit buffering). It turns out that the driver needs substantial rework to take better advantage of the NIC's potential. Like other NICs, the design assumes that an interrupt will always be serviced completely, i.e. the NIC buffers emptied by the OS at the time of interrupt or as soon as possible thereafter. TLVC does not have the buffer space to handle this, and will instead attempt to use the NIC buffers as temporary storage. This has been at least partly possible with other supported NICs and is certainly possible with this one too, but it requires quite a bit of work (and complicates the driver significantly).

That said, the potential is interesting. Theoretically, when taking advantage of the 82586 chip's data management functions and the NIC buffer, the NIC can easily buffer 80 small (<256b) packets before overrun. With shared memory and the 82586 working smoothly, this NIC should deliver in excess of 100kBps in both directions.

For performance, this version delivers (via FTP) 10 kBps outgoing, 44 kBps incoming (file storage is solid state, system is a 44MHz 386SX). Interestingly, activating debug output (NET_DEBUG = 2 or even 4) increases the outgoing speed significantly (to ca. 27 kBps) - an indication that we have severe timing issues. Activating debug does the opposite for receive speed, which drops to around 20 kBps.

Finally, this version of the driver does not do overrun reporting; overruns (as in flood pings) are silently ignored.

Work in progress for a next version:

As is, any ee0 bootopts setting will override EEPROM settings, consistent with other net drivers.