durgadas311 / cpnet-z80


CP/Net data dropouts on Windows. #18

Open jayacotton opened 3 years ago

jayacotton commented 3 years ago

I have not fully nailed down the symptoms, but if I have a YouTube video running and attempt to copy a file from the server to the client, I get data errors:

```
C>pip c:=k:mac.com

DISK READ ERROR: =K:MAC.COM

C>pip c:=k:mac.com

NO FILE: =K:MAC.COM

C>pip c:=k:mac.com

NO FILE: =K:MAC.COM

INVALID FORMAT:

C>pip c:=k:mac.com

DISK READ ERROR: =K:MAC.COM
```

After stopping the YouTube session, the copy works just fine. Or rather, almost fine; see below for more details.

Again, I'm not certain of the parameters here (lots of variables), but I think we should at least document an odd behavior like this.

durgadas311 commented 3 years ago

I assume this is a serial port connection, not the WizNet? There are some new parameters for CpnetSerialServer to fine-tune the timeouts it uses for receiving data, although I'm not sure which side is timing out. It could easily be that the Windows system is not responding fast enough and the CP/NET SNIOS is the one timing out.

jayacotton commented 3 years ago

The machine configuration is: Z80 at 7.8 MHz, KIO chip, RomWBW, CF card, RTC card, MT011 card (under test), and rc-usb. The data port is the rc-usb (serial over USB) to the serial server.

I am assuming that, if the server is busy doing something else, then we get breaks in the data flow, and a timeout occurs.

jc

durgadas311 commented 3 years ago

We may need to implement a way to fine-tune timeouts in the SNIOS for serial ports. We could establish some well-known location for timeout values, say relative to the configuration table, so that a utility could adjust them as we experiment with new values.
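
For example (purely a sketch; these labels and the layout are made up, not the current SNIOS source), the timeout words could be published at fixed offsets from the configuration table so a small utility could poke new values without a rebuild:

```
; Sketch of the idea only -- labels and offsets are hypothetical.
cfgtbl:	db	0		; existing configuration table starts here
	; ... existing configuration fields ...

; documented as fixed offsets from cfgtbl so a patch utility
; can locate and adjust them while we experiment:
tochr1:	dw	0		; first-character timeout (0 = 65536 maximum)
tochrn:	dw	1000		; inter-character timeout (iteration count)
```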

jayacotton commented 3 years ago

I think that once we establish a connection, the timeout value can be raised to maybe a second. That would give plenty of time for Windows to lollygag before the SNIOS times out.

Another thought: we could have the server send over timing cues based on some kind of priority guess, though that is admittedly vague.

The problem on the Z80 side is that I can only get accurate timing down to 20 ms; less than that and no joy.

Really, we have no idea of the actual dropout time. I might be able to get an idea by watching the select line on the DLP-USB chip and looking for idle states during the data transfer. But since the analyzer will be running on the same computer as the server, that might compound the bug.

Does the server have a mechanism to resend busted packets? Maybe a resend on data dropout would work, although that kind of buries the bug. Hmm, temporal packet serial numbers... ick.

jc

durgadas311 commented 3 years ago

I think one of the problems is that, especially for things like streaming video, the system has little choice but to take all the cycles it needs. I'm not sure we can ever guarantee that the server will respond within a given time under those conditions.

With the DRI protocol, CpnetSerialServer does retry. It more-or-less does the same as the SNIOS, but I've never done any in-depth analysis of the protocol to see how reliable or robust it is. There's also some odd code in the SNIOS receive-message routine that might not behave the way it should. The DRI protocol is very chatty, which means there are more chances for things to time out. If the server could just load up a buffer and let the hardware blast it out the serial port, it might do better. As it is, the protocol requires that the program get CPU time frequently for the interactions (it is not a particularly heavy user of CPU cycles, it just needs them often). We're really talking about designing another protocol, which is certainly possible.

jayacotton commented 3 years ago

I'm not certain of the depth of the I/O buffer on the Windows machine. The USB chip has a large buffer for incoming data, but I don't recall just how big it is (now looking for the info). The chip is the DLP-USB1232H.

I'm using the 245 FIFO transport, so there is no practical baud rate; the data sheet says 480 Mbit/s. For the FIFO sizes it lists a 384-byte FIFO transmit buffer and a 128-byte FIFO receive buffer, for high data throughput.

I am certain that the USB control logic is throttling the data rate. Now wondering if I am botching the bits somehow.

In DRI mode, is there less data per transfer? Maybe I should move to that mode. Also, does the serial server support DRI mode? I have set the flags but I don't see any difference. Maybe I need to go DRI on both ends?

jc

durgadas311 commented 3 years ago

Oh, so you've flashed this device to run in the FT245 Asynchronous FIFO mode? The FTDI FT2232H datasheet talks about a 4K buffer for each channel, in each direction. If that is the case, there should never be any buffer issues for CP/NET (and you might even be able to use Z80 block I/O instructions).
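
For what it's worth, a block-I/O read from an FT245-style FIFO could look something like this sketch (DATA, STAT and RXF are placeholder names; the real ports and bits depend on how the board decodes the FIFO):

```
; Sketch only: read B bytes into (HL) from an FT245-style FIFO
; using Z80 block I/O.
DATA	equ	0a0h		; hypothetical FIFO data port
STAT	equ	0a1h		; hypothetical FIFO status port
RXF	equ	0		; hypothetical "byte available" bit (0 = ready)

rdblk:	ld	c,DATA		; C = data port for ini
rdlp:	in	a,(STAT)
	bit	RXF,a		; FIFO empty?
	jr	nz,rdlp		; yes: keep polling (no timeout in this sketch)
	ini			; (HL) <- port(C), HL=HL+1, B=B-1
	jr	nz,rdlp		; repeat until B bytes transferred
	ret
```

With 4K buffers you could probably use INIR for the bulk of a message, but INIR does not check the status bit, so the polled form above is the safer starting point.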

CpnetSerialServer must be configured for whatever protocol was built into the SNIOS. I'm assuming that was the SIMPLE ASCII protocol?

The maximum CP/NET message size in SIMPLE ASCII is 526 bytes. Typical file I/O messages are 344 bytes. If the FTDI buffers are really 4K, you should never encounter a buffer-full situation doing the transfers. From what I understand, if the Rx buffer does fill, the USB protocol will prevent the host from sending. Since CP/NET is strictly a request/response message protocol, there will never be any large bursts of data (between any two nodes). And since the serial connection is strictly one-to-one, there will be no other traffic (nodes) over the interface.

The SIMPLE ASCII SNIOS does use timeouts on receive, similar to the ones used for the DRI protocol (long timeout on the first character). There are no retries, though. Also, there are no ACKs. What this means is that there are no "send errors", only receive errors. I suspect if you did a NETSTAT after a failure you'd see the status byte showing 12H, but that doesn't really give us any new information. We'd have to enable some debug on the CpnetSerialServer side to get more information. Either the SNIOS send is not getting received on the host, or the SNIOS is timing out before the host can respond. I can't imagine how the send is failing to reach the host; even if the host is swamped, the I/O should never be lost, just delayed. And the server never times out waiting for a message; it has nothing else to do.
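
To spell out the shape of it, here is a rough sketch (not the actual SNIOS code; STAT, DATA and RXF are the same kind of placeholder port/bit names as in the sketch above) of a polled receive with a 16-bit timeout and no retry:

```
; Rough shape of a polled receive with a 16-bit timeout -- sketch only.
; In:  DE = timeout count (0 means the 65536 maximum)
; Out: carry clear, A = character, or carry set on timeout.
recvto:	in	a,(STAT)
	bit	RXF,a		; character waiting?
	jr	z,gotchr
	dec	de
	ld	a,d
	or	e
	jr	nz,recvto	; keep polling until the count expires
	scf			; timed out: no retry, no ACK, just an error
	ret
gotchr:	in	a,(DATA)	; fetch the character
	or	a		; clear carry = success
	ret
```

The first character of a response gets the big count; later characters get a much smaller one, and on a timeout the SNIOS simply reports the receive error upward.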

Regarding throttling, this should not be like the USB-UART cases where there is an actual baud rate. The only throttling I can think of is the USB transfer rate and the ability of either end to process the message. I'm seeing talk on the web claiming USB 2.0 effective throughput of around 40 Mbps, so likely much faster than the Z80. It really comes down to the two ends being able to respond promptly.

One unfortunate reality with CP/NET is the way DRI designed the NDOS. It uses the same message buffer for the request and the response, and it is really bad at handling errors at that level (as seen in the odd results from STAT.COM output on a severed network connection). This means that the SNIOS RCVMSG routine can't initiate a re-send on failure (by then the request may already have been overwritten), which is ideally the way it should be handled. It really limits the protocols, I think.

Sorry for the novel.

jayacotton commented 3 years ago

I have been messing about with the USB CP/NET version of the serial SNIOS for a few days. I find that it's a bit unstable on the Z80 at 7.8 MHz. On the Z180 at 18 MHz it's unusable.

Same data dropouts; they just happen way more often. This seems to point to a receive-byte timeout issue. Since the SNIOS does not know the CPU clock speed, the timeout calculations are busted on the Z180. Perhaps we should add a parameter for the clock speed in the config.lib file, or make it a build-time define, as sketched below.
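
Something along these lines is what I have in mind (CPUKHZ is just a placeholder name, not anything that exists in config.lib today):

```
; Placeholder sketch for config.lib -- the name is hypothetical:
CPUKHZ	equ	7800		; CPU clock in kHz (7800 = 7.8MHz Z80, 18432 = 18.432MHz Z180)

; the receive-timeout counts could then be derived from CPUKHZ at
; assembly time instead of being fixed iteration counts.
```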

BTW, what is the correct (expected) timeout supposed to be?

durgadas311 commented 3 years ago

Really, the criterion for the timeout values is "long enough". It is going to depend on a lot of factors, not least of which is the CPU speed. The timeout values are configured at build time, but I currently have them in the src/ser-dri/config.lib "NIC" file and they probably should be moved to each of the "HBA" files. Note that the timeout for the first receive character is already at the maximum (0000h = 65536); however, we could define those values to be "relative" and let each HBA implement a timeout loop that takes CPU speed, etc., into account.
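
As a ballpark only (assuming the polling loop costs somewhere around 50 T-states per iteration, which is a guess rather than a measured number): 65536 iterations works out to roughly 0.8s on a 4MHz Z80, about 0.4s at 7.8MHz, and under 0.2s at 18.432MHz. That is why a count that seems fine on a slower Z80 can be marginal on the Z180.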

Have you determined whether the timeout you're getting is on the first received character of the response message, or somewhere in the middle of the response data? If this is still the "youtube" video situation, it's really hard to tell how long the server code may be blocked from running. This may not be a situation we can fix.

I don't know how Linux and the Mac handle this, but I'm not surprised that Microsoft would make streaming video the top priority. They would view "glitchless" viewing of programs as some sort of requirement. They have not always shown the best judgment or innovation in dealing with things like this; rather, they just use the big hammer, and could be denying CPU to all or most other programs for many minutes, even the duration of the video. Windows does have priority schemes, but I don't know how to use them. It may be possible to raise the priority of the server program and mitigate the losses.

durgadas311 commented 3 years ago

While I'm not sure we can ever fix a server that is being preempted by things like streaming video, we can make the client SNIOS timeouts more adaptable to CPU speed. Since there is no universal "real time" clock (not necessarily talking about RTC "time of day" chips), we can't automate this. But we could add a "timeout factor" variable to the HBA config.lib files and calibrate it to 4 MHz (i.e. "1" is for 4 MHz Z80s). So an 8 MHz system would use "2", 16 MHz would be "4", etc. We just have to "fudge" that factor for odd frequencies like 18.432 MHz and 7.8 MHz, but it should be "close enough". It does require that the timeout loop burn another register for this factor, but I think it is doable.
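
Very roughly, the loop could look like this (sketch only; the register use and the STAT/DATA/RXF port and bit names are just placeholders for illustration):

```
; Sketch: 16-bit timeout loop scaled by a per-HBA factor.
; TOFACT would live in each HBA's config.lib: 1 = 4MHz, 2 = 8MHz,
; 4 = 16MHz, fudged for clocks like 7.8MHz or 18.432MHz.
TOFACT	equ	2		; e.g. "2" for a 7.8MHz Z80
DATA	equ	0a0h		; placeholder data port
STAT	equ	0a1h		; placeholder status port
RXF	equ	0		; placeholder "byte available" bit (0 = ready)

; In:  DE = base timeout count (calibrated for 4MHz)
; Out: carry clear, A = character, or carry set if all passes expire.
waitch:	ld	b,TOFACT	; B burns a register for the scale factor
wpass:	push	de		; rerun the base count TOFACT times
wpoll:	in	a,(STAT)
	bit	RXF,a
	jr	z,wgot		; character is ready
	dec	de
	ld	a,d
	or	e
	jr	nz,wpoll
	pop	de		; this pass expired; restore the base count
	djnz	wpass		; ...and run another pass if any remain
	scf			; all TOFACT passes used up
	ret
wgot:	pop	de		; unwind the saved count
	in	a,(DATA)
	or	a		; success, carry clear
	ret
```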