[net/ee16] Added shared mem support and fixes to driver

With this PR the EtherExpress16 driver enters a completed state. Several serious bugs have been weeded out, shared memory support has been added and the code has been significantly cleaned up. The new version is fast, reliable and supports important bootopts configuration flags. 8-bit bus support is still missing - a relatively easy addition if and when the need arises, because the scaffolding is already in place.

Summary of key enhancements since the previous PR:

Full shared memory and pio support. If a valid (nonzero) shared memory address is provided in /bootopts, shared memory is used, otherwise pio. With more practical experience it may be desirable to force pio mode when the buffer is 16k, like other drivers do. However, at this point there is little evidence for the claimed instability of the 16k/shmem combination.
The performance difference between shmem and pio is minimal on fast machines; shmem has a 4% advantage on my 40MHz 386sx. The advantage is probably larger on slower machines.
The io address in /bootopts MUST match the card setting. Other parameters are taken from bootopts if available, regardless of card settings. In verbose mode, the card settings are reported at boot time. The card's shared memory settings are read but currently ignored.
The card/driver runs at approx. 80k bytes per second (ftp), ktcp permitting. A real kludge, a 600ms delay, has been added to the readselect routine in the driver. The delay speeds up outgoing file transfers by 300% on average by avoiding a read select_wait call. A similar method may speed up other drivers as well. Looking into whether ktcp can be adjusted to avoid this kludge is on the list. See discussion below.
Bootopts flags supported are (forced) 16k buffer size, verbose mode and cable type selection.
The driver allocates transmit buffers like this: ((bufsize in k)>>3)&7, which means 2@16k, 4@32/64k. The current implementation of ktcp does not enable practical use of more than 2 tx-buffers, so the latter case is a waste - for now. Total number of NIC packet buffers is 10/20/40 @ 16/32/64k.
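
The selection rules in the list above can be sketched roughly as follows. This is only an illustration: all names are hypothetical and nothing here is copied from the driver source. Note that the quoted expression, taken literally, evaluates to 0 for a 64k buffer, so the sketch assumes a fallback to 4 to match the stated 4@64k. The total-buffer helper is simply a fit to the 10/20/40 figures given above.

```c
/* Hypothetical names throughout; not taken from the driver source. */
enum ee16_mode { EE16_PIO, EE16_SHMEM };

/* A zero shared memory address in /bootopts selects pio. */
static enum ee16_mode ee16_pick_mode(unsigned int shmem_addr)
{
    return shmem_addr ? EE16_SHMEM : EE16_PIO;
}

/* Number of transmit buffers for a buffer size given in kilobytes. */
static unsigned int ee16_tx_buffers(unsigned int bufsize_kb)
{
    unsigned int n = (bufsize_kb >> 3) & 7;   /* expression quoted in the text */

    /* Assumption: for 64k the expression above yields 0, while the text
     * states 4 tx buffers, so this sketch falls back to 4 in that case. */
    return n ? n : 4;
}

/* Total NIC packet buffers, fitted to the stated 10/20/40 @ 16/32/64k. */
static unsigned int ee16_total_buffers(unsigned int bufsize_kb)
{
    return (bufsize_kb / 16) * 10;
}
```

With these helpers, a 32k card configured with a nonzero shared memory address would come out as shmem mode with 4 tx buffers out of 20 packet buffers in total.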

The driver will (still) happily and silently overwrite unprocessed packets in the rx-queue under heavy load. Changing the code to handle this differently has been deemed not worthwhile thus far - possibly because most testing has been done on a 32k buffer NIC (iow ample buffer space). The only case in which this issue may (!) become visible is when launching a flood ping with packets of significant size while running a listing in an outgoing telnet connection. Flood pings and ftp transfers work well. As a matter of fact, ftp transfers into TLVC are just barely affected by a running flood ping, indicating that there is more performance to be gained from tuning/optimizing ktcp.

Which brings us back to the

TCP speed kludge

This 'trick' was accidentally discovered when removing printks (actually kputchars) turned out to kill the performance of outgoing FTPs completely, from 70+k to ~25k. Keeping the kputchars in lasttxstatus(), which is part of tx interrupt processing, brought the performance back. The kputchars were eventually replaced by udelay() calls and moved to the _select function, where quite a bit of experimenting led to what seems to be a reasonable delay value.

This value will be different on a machine of a different speed, so - while in use - it should be calculated using a speed index, like the machine's BOGOMIPS value - which currently does not exist, but may be coming.
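
One possible shape for such a calculation is sketched below. The function and its parameters are hypothetical, since no speed index exists yet. The sketch assumes the tuned value should scale linearly with the index, as it would if the delay were a loop count that a faster CPU burns through in less time. If the delay turns out to be calibrated wall-clock time, the scaling direction would need to be revisited.

```c
/* Hypothetical helper: scale a delay tuned on a reference machine.
 *   ref_delay  - value found by experiment on the reference machine
 *   ref_index  - speed index of that reference machine
 *   cur_index  - speed index of the machine we are running on
 * Assumption: the required value grows linearly with the speed index. */
static unsigned int scaled_delay(unsigned int ref_delay,
                                 unsigned int ref_index,
                                 unsigned int cur_index)
{
    /* No usable index available: keep the experimentally tuned value. */
    if (ref_index == 0 || cur_index == 0)
        return ref_delay;

    /* Widen before multiplying so a 16-bit int target does not overflow. */
    return (unsigned int)(((unsigned long)ref_delay * cur_index) / ref_index);
}
```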

As to why this delay helps: What seems to be the case is that in the course of pushing packets and ticking off incoming ACKs, the delay is just enough to avoid a select_wait()/wake_up() cycle when receiving the ACK immediately following a transmitted data packet. The delay optimizes the rhythm of the exchange so to speak. It is not obvious that there is an easy fix for this in ktcp, but given the size of the improvement, it seems worth a discussion.
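
The suspected mechanism can be illustrated with a toy model. This is not driver or ktcp code: the struct, the function and the tick units are all made up for illustration, and the model only captures the idea that a short wait before checking for data can replace a full sleep/wake-up cycle.

```c
/* Toy model of the suspected effect; all names are hypothetical. */
struct ack_model {
    int now;      /* current time in arbitrary ticks */
    int ack_at;   /* tick at which the ACK becomes readable */
    int sleeps;   /* simulated select_wait()/wake_up() cycles */
};

/* Returns 1 if data is ready after the delay, 0 if the caller must sleep. */
static int model_read_select(struct ack_model *m, int delay_ticks)
{
    m->now += delay_ticks;      /* stands in for the udelay() call */
    if (m->now >= m->ack_at)
        return 1;               /* ACK already arrived: no sleep needed */
    m->sleeps++;                /* stands in for select_wait() and wake_up() */
    return 0;
}
```

In this model, an ACK that arrives at tick 5 costs one sleep cycle with no delay, and none with a delay of 6 ticks, which matches the observation that a small, well-chosen delay removes the extra cycle.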

Mellvik / TLVC

[net/ee16] Added shared mem support and fixes to driver #64
