Ack received for block <Z>, but we were expecting an ack for block <X>.

btb commented 1 year ago

I'm using ADTPro over audio, and having a hard time receiving disks. It seems like any time there is a single failed ACK, the whole transfer will be aborted.

Looking at the trace, everything goes fine, until some random point, we are sending blocks, in this case beginning with 007C:

10/18/22, 12:55:50 PM CommsThread.sendDiskWide() sending packet starting at block: 124 with a block count (BAOCNT) of: 4
10/18/22, 12:55:50 PM CommsThread.sendPacketWide() entry; block to send: 007C
10/18/22, 12:55:50 PM CommsThread.sendPacketWide() looping until successful to send block: 007C
10/18/22, 12:55:50 PM AudioTransport.pushBuffer() entry, pushing 1930 bytes.
10/18/22, 12:55:50 PM AudioTransport.pushBuffer() pushing data:
C1 00 08 D3 1A 7C 00 03 40 40 80 7E 40 FF 30 52 B6 49 AF 1A D6 80 E0 00 11 40 10 00 14 20 40 C0 
...
F2 0E F1 02 03 FB 00 00 16 52 
10/18/22, 12:55:50 PM BytesToWav.encode() entry, encoding 1930 bytes.
10/18/22, 12:55:50 PM AudioTransport.flushReceiveBuffer() entry.
10/18/22, 12:55:50 PM AudioTransport.flushReceiveBuffer() exit.
10/18/22, 12:55:50 PM PlaybackThread.play() entry.
10/18/22, 12:55:50 PM PlaybackThread.play() payload size: 525952
PlaybackThread.play() Bytes written: 5259 136734 268209 399684
10/18/22, 12:56:03 PM PlaybackThread.play() Done playing.
10/18/22, 12:56:03 PM AudioTransport.pushBuffer() exit.
10/18/22, 12:56:03 PM CommsThread.sendPacketWide() calculated CRC: 21014

Then, instead of the client ACK, we get what looks like garbage, so the server re-sends those blocks:

10/18/22, 12:56:03 PM CommsThread.pullEnvelopeWide() entry.
10/18/22, 12:56:03 PM AudioTransport.pullBuffer() pulled data:
04 00 64 48 20 00 40 
10/18/22, 12:56:03 PM AudioTransport.pullBuffer() exit; _inPacketLen = 7
10/18/22, 12:56:18 PM CommsThread.pullEnvelopeWide() aborting due to timeout.
10/18/22, 12:56:18 PM CommsThread.pullEnvelopeWide() exit.
10/18/22, 12:56:18 PM CommsThread.sendPacketWide() Well, ack envelope was unexpected: 00
10/18/22, 12:56:18 PM CommsThread.sendPacketWide() ACK from client: 15
10/18/22, 12:56:18 PM CommsThread.sendPacketWide() didn't work; will retry #1.
10/18/22, 12:56:18 PM CommsThread.sendPacketWide() block: 124.
10/18/22, 12:56:18 PM CommsThread.sendPacketWide() backoff sleeping for 2 seconds (or 1 second, whichever is shorter).
10/18/22, 12:56:19 PM AudioTransport.flushReceiveBuffer() entry.
10/18/22, 12:56:19 PM AudioTransport.flushReceiveBuffer() exit.
10/18/22, 12:56:19 PM CommsThread.sendPacketWide() looping until successful to send block: 007C
10/18/22, 12:56:19 PM AudioTransport.pushBuffer() entry, pushing 1930 bytes.
10/18/22, 12:56:19 PM AudioTransport.pushBuffer() pushing data:
C1 00 08 D3 1A 7C 00 03 40 40 80 7E 40 FF 30 52 B6 49 AF 1A D6 80 E0 00 11 40 10 00 14 20 40 C0 
...
F2 0E F1 02 03 FB 00 00 16 52 
10/18/22, 12:56:19 PM BytesToWav.encode() entry, encoding 1930 bytes.
10/18/22, 12:56:19 PM AudioTransport.flushReceiveBuffer() entry.
10/18/22, 12:56:19 PM AudioTransport.flushReceiveBuffer() exit.
10/18/22, 12:56:19 PM PlaybackThread.play() entry.
10/18/22, 12:56:19 PM PlaybackThread.play() payload size: 525952
PlaybackThread.play() Bytes written: 5259 136734 268209 399684
10/18/22, 12:56:32 PM PlaybackThread.play() Done playing.
10/18/22, 12:56:32 PM AudioTransport.pushBuffer() exit.
10/18/22, 12:56:32 PM CommsThread.sendPacketWide() calculated CRC: 21014

Now, we get an ACK from the client, but it seems to have no idea we're sending an old block. After a couple retries, the host aborts:

10/18/22, 12:56:32 PM CommsThread.pullEnvelopeWide() entry.
10/18/22, 12:56:32 PM AudioTransport.pullBuffer() pulled data:
C1 03 00 CB 09 06 84 00 82 40 BF 
10/18/22, 12:56:32 PM AudioTransport.pullBuffer() exit; _inPacketLen = 11
10/18/22, 12:56:32 PM Waiting for byteslo...
10/18/22, 12:56:32 PM  received byteslo: 03
10/18/22, 12:56:32 PM Waiting for byteshi...
10/18/22, 12:56:32 PM  received byteshi: 00
10/18/22, 12:56:32 PM Waiting for command...
10/18/22, 12:56:32 PM  received envelope command: CB
10/18/22, 12:56:32 PM Waiting for check byte...
10/18/22, 12:56:32 PM  received checkbyte: 09
10/18/22, 12:56:32 PM  calculated checkbyte: 09
10/18/22, 12:56:32 PM CommsThread.pullEnvelopeWide() exit.
10/18/22, 12:56:32 PM CommsThread.pullPayloadWide() entry.
10/18/22, 12:56:32 PM CommsThread.pullPayloadWide() payload byte [0]: 06
10/18/22, 12:56:32 PM CommsThread.pullPayloadWide() payload byte [1]: 84
10/18/22, 12:56:32 PM CommsThread.pullPayloadWide() payload byte [2]: 00
10/18/22, 12:56:32 PM CommsThread.pullPayloadWide() received   checkbyte: 82
10/18/22, 12:56:32 PM CommsThread.pullPayloadWide() calculated checkbyte: 82
10/18/22, 12:56:32 PM CommsThread.pullPayloadWide() checkbyte on payload matched.
10/18/22, 12:56:32 PM CommsThread.sendPacketWide() Ack received for block 132, but we were expecting an ack for block 128.
10/18/22, 12:56:32 PM CommsThread.sendPacketWide() ACK from client: 15
10/18/22, 12:56:32 PM CommsThread.sendPacketWide() didn't work; will retry #2.
10/18/22, 12:56:32 PM CommsThread.sendPacketWide() block: 124.
10/18/22, 12:56:32 PM CommsThread.sendPacketWide() backoff sleeping for 4 seconds (or 1 second, whichever is shorter).
10/18/22, 12:56:33 PM AudioTransport.flushReceiveBuffer() entry.
10/18/22, 12:56:33 PM AudioTransport.flushReceiveBuffer() exit.
10/18/22, 12:56:33 PM CommsThread.sendPacketWide() looping until successful to send block: 007C
10/18/22, 12:56:33 PM AudioTransport.pushBuffer() entry, pushing 1930 bytes.
10/18/22, 12:56:33 PM AudioTransport.pushBuffer() pushing data:
C1 00 08 D3 1A 7C 00 03 40 40 80 7E 40 FF 30 52 B6 49 AF 1A D6 80 E0 00 11 40 10 00 14 20 40 C0 
...
F2 0E F1 02 03 FB 00 00 16 52 
10/18/22, 12:56:33 PM BytesToWav.encode() entry, encoding 1930 bytes.
10/18/22, 12:56:33 PM AudioTransport.flushReceiveBuffer() entry.
10/18/22, 12:56:33 PM AudioTransport.flushReceiveBuffer() exit.
10/18/22, 12:56:33 PM PlaybackThread.play() entry.
10/18/22, 12:56:33 PM PlaybackThread.play() payload size: 525952
PlaybackThread.play() Bytes written: 5259 136734 268209 399684
10/18/22, 12:56:45 PM PlaybackThread.play() Done playing.
10/18/22, 12:56:45 PM AudioTransport.pushBuffer() exit.
10/18/22, 12:56:45 PM CommsThread.sendPacketWide() calculated CRC: 21014
10/18/22, 12:56:45 PM CommsThread.pullEnvelopeWide() entry.
10/18/22, 12:56:46 PM AudioTransport.pullBuffer() pulled data:
C1 03 00 CB 09 06 88 00 8E 40 BF 
10/18/22, 12:56:46 PM AudioTransport.pullBuffer() exit; _inPacketLen = 11
10/18/22, 12:56:46 PM Waiting for byteslo...
10/18/22, 12:56:46 PM  received byteslo: 03
10/18/22, 12:56:46 PM Waiting for byteshi...
10/18/22, 12:56:46 PM  received byteshi: 00
10/18/22, 12:56:46 PM Waiting for command...
10/18/22, 12:56:46 PM  received envelope command: CB
10/18/22, 12:56:46 PM Waiting for check byte...
10/18/22, 12:56:46 PM  received checkbyte: 09
10/18/22, 12:56:46 PM  calculated checkbyte: 09
10/18/22, 12:56:46 PM CommsThread.pullEnvelopeWide() exit.
10/18/22, 12:56:46 PM CommsThread.pullPayloadWide() entry.
10/18/22, 12:56:46 PM CommsThread.pullPayloadWide() payload byte [0]: 06
10/18/22, 12:56:46 PM CommsThread.pullPayloadWide() payload byte [1]: 88
10/18/22, 12:56:46 PM CommsThread.pullPayloadWide() payload byte [2]: 00
10/18/22, 12:56:46 PM CommsThread.pullPayloadWide() received   checkbyte: 8E
10/18/22, 12:56:46 PM CommsThread.pullPayloadWide() calculated checkbyte: 8E
10/18/22, 12:56:46 PM CommsThread.pullPayloadWide() checkbyte on payload matched.
10/18/22, 12:56:46 PM CommsThread.sendPacketWide() Ack received for block 136, but we were expecting an ack for block 128.
10/18/22, 12:56:46 PM CommsThread.sendPacketWide() ACK from client: 15
10/18/22, 12:56:46 PM CommsThread.sendPacketWide() didn't work; will retry #3.
10/18/22, 12:56:46 PM CommsThread.sendPacketWide() block: 124.
10/18/22, 12:56:46 PM CommsThread.sendPacketWide() backoff sleeping for 6 seconds (or 1 second, whichever is shorter).
10/18/22, 12:56:47 PM AudioTransport.flushReceiveBuffer() entry.
10/18/22, 12:56:47 PM AudioTransport.flushReceiveBuffer() exit.
10/18/22, 12:56:47 PM CommsThread.sendPacketWide() exit, rc = false
10/18/22, 12:56:47 PM Image transfer aborted.
10/18/22, 12:56:47 PM AudioTransport.flushReceiveBuffer() entry.
10/18/22, 12:56:47 PM AudioTransport.flushReceiveBuffer() exit.
10/18/22, 12:56:47 PM AudioTransport.flushSendBuffer() entry.
10/18/22, 12:56:47 PM AudioTransport.flushSendBuffer() exit.
10/18/22, 12:56:47 PM CommsThread.sendDiskWide() exit.
10/18/22, 12:56:47 PM CommsThread.dispatchCommand() exit.
10/18/22, 12:56:47 PM CommsThread.commandLoop() Waiting for command from Apple.
10/18/22, 12:56:47 PM CommsThread.commandLoop() Received data.
10/18/22, 12:56:47 PM CommsThread.commandLoop() Received a byte: 40
10/18/22, 12:56:47 PM CommsThread.commandLoop() Received unknown command: 40
10/18/22, 12:56:47 PM CommsThread.commandLoop() Waiting for command from Apple.
10/18/22, 12:56:47 PM CommsThread.commandLoop() Received data.
10/18/22, 12:56:47 PM CommsThread.commandLoop() Received a byte: BF
10/18/22, 12:56:47 PM CommsThread.commandLoop() Received unknown command: BF
10/18/22, 12:56:47 PM CommsThread.commandLoop() Waiting for command from Apple.

So it seems like the client is completely unaware there was ever a problem?

I'm very new to 6502 assembly, but I think maybe part of the problem is in RECVWIDE of client/src/prodos/audio/audproto.asm, - where the block number as sent by the host is put into HOSTBLX, but I think this is never used for actually controlling where the client is in the receive process.

david-schmidt commented 1 year ago

It's worth looking harder at the protocol reaction here - the idea is supposed to be that the client only advances to the next block when it knows the "current" one is successful. So it may be the case that it's not getting it right in this situation (it's been ages since I looked at it). I'm not in a position to dig into it now, so if you want to use this as an opportunity to sharpen your assembly skills, feel free to dive in!

btb commented 1 year ago

Yeah, I tried a few hacks by modifying the program in the machine language monitor, but didn't hit on anything that really worked. Basically as far as the client is concerned, it was successful, so there's no mechanism for back-tracking once the ACK is sent. Unfortunately the server never gets the message and treats it the same as a NACK

Eventually I worked around it by changing the server to treat the garbage/timeout the same as a successful ACK.

Very weird that my setup is doing this so consistently (bad ACK every hundred kilobytes or so). Could be a hardware problem with my Apple II+, maybe power supply going bad and the cassette output driver is the first thing to start dropping out?

Fortunate problem to have if you want to catch bugs like this I suppose!

david-schmidt commented 1 year ago

There are some interesting interactions with audio (feedback?) that maybe sets up some sort of waveform that confuses everyone, so we get into these failure modes based on some particular data patterns. Sometimes changing the block count (i.e. shifting down to one) will change the outcome, even if it makes it overall slower.

david-schmidt commented 1 year ago

Speaking of reproducibility - do you have an image you can attach that causes the problem? That would be nice to have for posterity.

btb commented 1 year ago

I think the one I was trying at the time of the trace was Oregon Trail from here: http://www.mecc.co/software/apple-ii/the-oregon-trail---a-157/ OT.zip

david-schmidt commented 1 year ago

Yikes, that site is scary-bad. Here are the images so no one else has to go there: testdisks.zip

david-schmidt commented 1 year ago

Another user has run into the same problem in the IP transport, so I started digging. It does look like there's a bug in the protocol implementation that is common to all, and it's that a NAK leaves us with an incremented block count even though it shouldn't. I've got a test client available if you want to give it a shot. ADTPRO-v.r.m.DSK.zip

Also, we're discussing this issue in slack if you want to join (https://apple2.gs/slack) and the channel is https://apple2infinitum.slack.com/archives/C048MF00ZQX .

ADTPro / adtpro

Ack received for block <Z>, but we were expecting an ack for block <X>. #162