23kbps is possible - Githubissues

raphlinus commented 6 years ago

Apologies if this issue is not the right place to post notice of the explorations I've made. I have a prototype of 23kbps transfer to an Apple IIe (work done at the Recurse Center), and think it might be practical to port this into c2t. However, I have concerns about repeatability. I would also be quite willing to push my prototype to a branch if people are interested in experimenting with it.

The basic plan for my work is to try to encode 4 bits of information per audio cycle. I encode 2 bits in the top half of the pulse, 2 in the bottom. Each pulse width is (44 + 27 * sym) µs where sym is 0 through 3, and the pulses are generated in a band-limited way. This image should illustrate the technique:

To decode, my code senses top and bottom half pulse widths separately, and each shifts 2 bits into the accumulator. Two of those cycles produce a byte. The full asm code is a shade over 128 bytes. Assuming an average symbol of 1.5, 2 bits per "pulse", 2 / (44 + 27 * 1.5) µs = ~23.6kbps. In testing, I've been transferring images to the hi-res buffer (8kB), and that takes around 2.5s, as expected.
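The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers in the post, not the prototype's code: the function names are made up here, and the MSB-first bit order within a byte is an assumption.

```python
def half_pulse_width_us(sym):
    """Width of one half pulse in microseconds, per the formula in the post."""
    return 44 + 27 * sym

def bytes_to_symbols(data):
    """Split each byte into four 2-bit symbols (MSB-first order is an assumption)."""
    return [(byte >> shift) & 3 for byte in data for shift in (6, 4, 2, 0)]

def symbols_to_bytes(symbols):
    """Shift 2 bits at a time into an accumulator; four symbols make one byte."""
    out = bytearray()
    for i in range(0, len(symbols) - 3, 4):
        acc = 0
        for sym in symbols[i:i + 4]:
            acc = ((acc << 2) | sym) & 0xFF
        out.append(acc)
    return bytes(out)

def transfer_time_s(data):
    """Total time on the wire for a payload, ignoring any preamble or trailer."""
    return sum(half_pulse_width_us(s) for s in bytes_to_symbols(data)) * 1e-6

def average_rate_bps(mean_symbol=1.5):
    """Each half pulse carries 2 bits, so the rate is 2 bits per mean pulse width."""
    return 2 / (half_pulse_width_us(mean_symbol) * 1e-6)

print(bytes_to_symbols(b"\x1b"))
print(round(average_rate_bps()))
```

Because the pulse width depends on the symbol value, the real transfer time depends on the data: a payload with many zero bit pairs finishes sooner than the average-symbol estimate.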

I've prototyped my waveform generation code in Python, but it's not actually all that complicated, and I'd be happy to redo it in C. I'm particularly proud of the band-limited pulse generation approach, as I believe that it's much less sensitive to the sampling rate of the audio path (in fairness, I've only tested at 48 kHz).

Now for the part I'm most uncertain about: in exploring the analog pathway with an oscilloscope, I find that the comparator output is biased pretty strongly to "1". This oscilloscope photo should show what I mean:

img_20170926_184209

I've studied the schematic and info about the 741 op-amp, and don't understand why this is happening - the effect is much larger than the DC offset suggested in the datasheet. I also suspect that the amount is not consistent. If the waveform requires calibration to a specific device, it makes it much less appealing.

Is it worth pursuing this? Would other Apple 2 owners be willing to run tests on their machines, and maybe poke with a scope? Would you be open to having this land in c2t?

datajerk commented 6 years ago

Sorry for the very late reply.

I'm not getting notifications from github (will try to fix after I write this). I would be more than happy to test for you. And yes I would be very open to having this land in c2t.

BTW about two weeks ago another developer broke the 14K barrier using Manchester encoding (binary phase-shift keying). I've asked him to jump in here as well. His project is here: https://github.com/xk/Turbo-Cassette-for-the-Apple-II.

Re: calibration. That could be an issue. However if there were a way to perform bidirectional communications then perhaps it could be automated?

There's another issue that I've had with anything other than full symmetrical cycles. Some players will just mess it up. The only BPS rate I've been able to have completely work on all Apple IIs and all players is 8000. My 9600 scheme using 1/2 cycles works on all Apple IIs, but fails with some audio players. The Manchester solution above I suspect will also fail with some players. All 3 machines on which my 9600 BPS scheme fails were manufactured in 2012 (2 Macs, one Lenovo).

I've asked the author of the Manchester code to jump in here as well.

raphlinus commented 6 years ago

Thanks for the encouraging reply. Maybe the best thing to do is open a pull request with my prototype (in as rough form as it is), and we can refine from there? For procedural reasons it's easier for me to patch existing projects than create a new repo, though the latter is possible.

Looking at your waveforms, I suspect one reason why the asymmetry might be a problem is that if you have a narrow + half pulse followed by a wide - half, with the same peak amplitude, this will create an overall - DC bias, which in turn will be eaten by the highpass filter in front of the op-amp. The net effect of that is to reduce the asymmetry. I deal with this by keeping the product of height and width constant, so that the + and - balance exactly, giving the HPF nothing to eat.
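A minimal sketch of that balancing rule, assuming idealized rectangular half pulses and the (44 + 27 * sym) µs widths from the first post. The function names and the normalization (the narrowest pulse gets amplitude 1.0) are illustrative choices, not taken from the prototype.

```python
def half_pulse_width_us(sym):
    """Half-pulse width in microseconds, per the formula earlier in the thread."""
    return 44 + 27 * sym

def balanced_amplitude(sym, area=44.0):
    """Amplitude chosen so that height * width is the same for every symbol.

    area=44.0 is an arbitrary normalization: the narrowest pulse gets height 1.0.
    """
    return area / half_pulse_width_us(sym)

def cycle_dc(top_sym, bottom_sym, balanced):
    """Mean level of one full cycle made of a + half pulse and a - half pulse."""
    wt = half_pulse_width_us(top_sym)
    wb = half_pulse_width_us(bottom_sym)
    ht = balanced_amplitude(top_sym) if balanced else 1.0
    hb = balanced_amplitude(bottom_sym) if balanced else 1.0
    return (ht * wt - hb * wb) / (wt + wb)

# Narrow + half followed by a wide - half:
print(cycle_dc(0, 3, balanced=False))  # negative: constant amplitude leaves a DC bias
print(cycle_dc(0, 3, balanced=True))   # approximately zero: areas cancel
```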

I'm sure the calibration could be automated, but there is something appealing about being able to record to a static sound file and expect playback to just work.

david-schmidt commented 6 years ago

I would be interested in 'scoping different models with different 741s from over the years. As we found with Egan's project, various Apple IIs seemed to behave differently. I have Apples, but not oscopes.

xk commented 6 years ago

Hi datajerk, here I am.

That's pretty cool! A minor drawback I see is that an Apple II can't generate a signal like that because it can't modulate the amplitude, so if that's a must I fear this serves only in the PC to Apple II direction?

All my code and findings are in the url datajerk has given and in some c.s.a2 threads. There's also a demo page here https://apple2.duckdns.org/turbodemo/ . From there you can go to the repository @ github and to the c.s.a2 thread.

I am very "interested in experimenting with it" buuut... where's your code @raphlinus ?

raphlinus commented 6 years ago

@xk Code coming soon. I'm a bit busy with other stuff right at the moment, and what I have needs some cleaning up.

Yes, part of the parameters I set is that the sender would be able to do as sophisticated waveform generation as possible to make the receiver happy. If you have just a pulse generator with constant amplitude, then avoiding DC bias is much harder.

I think you can do even higher speed transfer (60kbps) in the Apple -> PC direction using the same ideas, but the details would be different, ie, the signal wouldn't roundtrip.

Also, to address a question brought up in the forum, at this speed transfer seems to be quite reliable. HGR images can easily survive single bit errors, but I've used this to transfer single-load binary games and haven't seen an error.

datajerk commented 6 years ago

@raphlinus yes please open a PR, and please do it against the c2t-96h code and also make it optional with an option flag (preserving the existing known working methods).

c2t and c2t-96h are in a real messy state. I wrote most of c2t in about 4 days over the US TG 2011 holidays. I then later added the disk support. Some of the encoding methods didn't work on all Apple IIs as @david-schmidt can attest to. I have yet to remove that code. Years later while writing an article on c2t (requested by a publication, but we could not agree on the terms (I wanted the article to be open source)) I discovered that with minimal effort I could get 9600 for free with just a macro change. It works on all Apple IIs, but not all players. Because it was a quick hack I kept both c2t and c2t-96h around, but only c2t-96h got minor enhancements and patches that I never rolled back to c2t. I really need to merge and clean all this code up. Now with @xk's and your new methods, I'd like to add them as well as optional methods, including the possibility of bi-directional transfers. Bi-directional would mean that I would not need to emulate the decompression time and write time to estimate the amount of padding time. However I still think I need to keep the fire-and-forget methods since they work and work well and require no modification to the Apple II.

I need to write a test payload as well that any user can use and it will test each method and tell them what is expected to work.

@raphlinus I'd like to do as you suggest to avoid the HPF issue with my current 9600 bps method, can you please provide a bit more detail (I'm only a software guy). Is it really as simple as HxW? I think I could test this next weekend if it's a 5 min patch. I actually have a failing machine to test with.

@xk re: Apple II cannot generate that signal. True but we could use 23k+.. down and 14k+ up, right?

Gents where were you in 2011? :-)

raphlinus commented 6 years ago

@datajerk Ok, there are two conflicting requirements there: getting code up quickly so that others can experiment and validate its reliability etc, and having the code as a patch against c2t-96h. No problem with that as an end state, but let me propose opening a PR with the prototype, then gradually merging the work into c2t-96h either as commits in that PR or in successive PRs.

It is as simple as HxW. Actually it's area under curve, but for the shapes we're talking about it amounts to the same thing. Of course, the waveform generation I'm doing is band-limited, so it's a smooth wiggly rectangular pulse, but that also checks out (band limitation can be modeled as a linear time-invariant convolution kernel, so if there's no DC in the source, there's no DC in the band-limited version either).
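The convolution argument can be checked numerically with a toy example. The sketch below is not the prototype's generator: the pulse widths are in arbitrary sample units, and a short raised-cosine kernel stands in for whatever band-limiting filter is actually used. The point is only that the sum of a convolution equals the product of the two sums, so a zero-sum source stays zero-sum.

```python
import math

def balanced_cycle(top_width, bottom_width, area=10.0):
    """One cycle of rectangular half pulses with equal area above and below zero."""
    return [area / top_width] * top_width + [-area / bottom_width] * bottom_width

def convolve(signal, kernel):
    """Full linear convolution, written out with plain loops."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

# Hypothetical smoothing kernel (raised cosine, 9 taps); any FIR kernel would do.
kernel = [0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / 10) for i in range(9)]

source = balanced_cycle(10, 25) * 4    # narrow + half, wide - half, repeated
filtered = convolve(source, kernel)

print(sum(source))     # approximately zero: no DC in the source
print(sum(filtered))   # approximately zero: no DC after filtering either
```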

Thinking about this a bit more, my work really divides into two halves:

Band-limited waveform generation.
An encoding that squeezes 4 bits out of each full cycle.

The first, I think, would be an improvement for all encodings. I believe the higher slope at zero-crossings (than a sine-derived shape) will drive a sharper signal through the 741 and multiplexer, which will especially help at lower volumes. It will also make it so you can produce almost exactly the same waveforms, with exactly the same timing at 44.1 and 48. This can be adapted to your existing encodings and would not require any changes on the 6502 side. I suggest that this is the first priority. Note that all my waveform generation code is currently in Python for ease of experimentation. See this Stack Overflow thread for more details of the BLEP approach.

One other point, I ran across this scope trace on the apple2 forum, and I'm encouraged that it matches what I've observed. (Note that the image above is sampled from the 741 output, while the linked trace appears to be sampled at the multiplexer input after a 12k resistor). Both traces show an approx 20us peak-to-peak slew (closely matching the 0.5V/us spec), and also show an approx 15us delay after the low-to-high zero crossing, but not the high-to-low. I have no real explanation for this asymmetrical behavior, and it doesn't show up in the circuit simulators I tried. Yet, if it's consistent from copy to copy of the Apple, we should be in good shape.

datajerk commented 6 years ago

@raphlinus yes, a standalone PR would be great too, to quickly get it out there for testing.

I'll try to change my code to adjust the amplitude with 1/2 cycles and see if that fixes the issues with older machines. I'll probably have time around the US TG holidays to test. Thanks.

raphlinus commented 6 years ago

I thought I'd update this issue with my latest experimentation and thinking. There are basically three challenges: polarity inversion, 6502 code size, and speed tuning.

Polarity inversion

Based on experimentation, it's clear polarity inversion is a major problem. Some audio outputs invert, some don't. It's unreasonable to expect users to know or care.

The existing c2t only senses one half of a pulse, so in theory should be insensitive to polarity inversion. That's only true though if the generated signal is symmetric, which is only true for some of the variants. I also wouldn't be surprised if there were some subtle issues. Certainly sensing only half of the pulse makes the entire system sensitive to the duty cycle of the comparator pipeline output, which is hard to calibrate.

What I'd like to do is make the 6502 code detect inverted polarity, then flip the sense loop if that's found (ie switch bmi and bpl instructions inside the sense loop). Having thought this through, I want to prepend [2 2 2 ... 2 0 2 0 2 0 2 0 2 4 2] inverted, then [2 2 2 .. 2 0 2 0 2 0 2 0 2] to actually start the data, and 5 to stop. The sync waits for [0 2 0 2 0 2 0 2] then goes into the main data loop. For the first symbol of a byte in the data loop, any value 4 or above breaks it. So a value of 4 flips the sense loop, and a value of 5 triggers data end.
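One possible reading of this scheme, modeled at the symbol level in Python. Several things here are assumptions made for the sake of a runnable sketch: a symbol arriving with the wrong net polarity is modeled as simply unreadable (`None`), the receiver returns to waiting for sync after a flip, the lead-in length is arbitrary, and bytes are packed MSB-first. None of this is the actual 6502 code.

```python
SYNC = [0, 2, 0, 2, 0, 2, 0, 2]
FLIP, END = 4, 5

def build_stream(payload, lead=8):
    """Return (symbol, sent_inverted) pairs: inverted preamble, normal preamble, data, END."""
    pre_inverted = [2] * lead + SYNC + [FLIP, 2]
    pre_normal = [2] * lead + SYNC
    data = [(b >> shift) & 3 for b in payload for shift in (6, 4, 2, 0)]
    return ([(s, True) for s in pre_inverted]
            + [(s, False) for s in pre_normal + data + [END]])

def receive(stream, channel_inverts):
    """Toy receiver: wait for SYNC, then decode bytes; 4 flips the sense, 5 ends."""
    sense_flipped = False
    in_data = False
    recent = []
    acc, nsym = 0, 0
    out = bytearray()
    for sym, sent_inverted in stream:
        # Toy channel model: a symbol is readable only when the net polarity is normal.
        net_inverted = sent_inverted ^ channel_inverts ^ sense_flipped
        seen = None if net_inverted else sym
        if not in_data:
            recent = (recent + [seen])[-len(SYNC):]
            if recent == SYNC:
                in_data, recent, acc, nsym = True, [], 0, 0
            continue
        if seen is None:
            in_data = False          # lost the signal: go back to waiting for sync
            continue
        if nsym == 0 and seen >= FLIP:
            if seen == END:
                break
            sense_flipped = not sense_flipped   # value 4: patch the sense loop
            in_data = False
            continue
        acc = (acc << 2) | seen
        nsym += 1
        if nsym == 4:
            out.append(acc)
            acc, nsym = 0, 0
    return bytes(out)

print(receive(build_stream(b"A2"), channel_inverts=False))
print(receive(build_stream(b"A2"), channel_inverts=True))
```

In this model the inverted preamble is only ever decoded when the audio path inverts, in which case its trailing 4 flips the sense before the real preamble arrives; on a non-inverting path it is ignored.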

6502 code size

The existing c2t code is limited to 384 bytes (copied from the $0800 range to $BA80). My current prototype exceeds that, and doing the polarity inversion will make it worse. Here are some of the things that can be done:

Copy more than 384 bytes to high memory
Do the message printing from low memory, then copy; this eases constraints on the size of the message
Compute checksum on the fly while receiving, rather than a separate loop later
Simplify the sense loop

That last requires more detail. My current sense loop has a timing of 21 cycles, with a bunch of early exits to make the timing more precise. A simpler loop would have 9 cycle timing, and then you'd divide that by either 2 or 3 to decode the actual symbol. The simpler loop would only have one bmi and one bpl, so it's also simpler to patch for the polarity inversion above.
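A toy model of the divide step, to show how much timing slack each divisor leaves. Everything beyond the 9-cycle loop and the divisors 2 and 3 is assumed for illustration: the base count of 5 iterations, the round-to-nearest rule, and the idea of expressing jitter as whole loop iterations.

```python
LOOP_CYCLES = 9  # cycles per iteration of the simpler sense loop (from the post)

def decode_symbol(count, base_count, divisor):
    """Map a loop-iteration count to a symbol 0..3 (nearest multiple of divisor)."""
    steps = count - base_count
    sym = (2 * steps + divisor) // (2 * divisor)   # round to nearest, halves up
    return max(0, min(3, sym))

def tolerates_jitter(divisor, jitter, base_count=5):
    """True if every symbol still decodes correctly with +/- jitter loop counts."""
    for sym in range(4):
        nominal = base_count + sym * divisor
        for delta in range(-jitter, jitter + 1):
            if decode_symbol(nominal + delta, base_count, divisor) != sym:
                return False
    return True

for divisor in (2, 3):
    spacing = divisor * LOOP_CYCLES
    print(divisor, spacing, tolerates_jitter(divisor, jitter=1))
```

In this model, dividing by 3 (27 cycles between symbols) survives a one-iteration error in either direction, while dividing by 2 (18 cycles) does not, which is at least consistent with the reliability difference reported later in the thread.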

Speed tuning

This is where things get a little complex. My current waveform generation code generates the same result as a perfect pulse train low-pass filtered to be band limited. This operation does not exactly preserve the zero crossings, rather it perturbs them. In my current prototype (with the 21 cycle sense loop) there's enough margin it can still reliably detect the signal. Trying out the 9 cycle loop, dividing by 2 (so 18) is not reliable. It's perfectly reliable with a sine wave input, which strongly suggests that it's the perturbation of zero crossings caused by the band-limiting filter. I can further test this hypothesis by using a DAC with a 192kHz sample rate (a Roland Quad Capture). Dividing by 3 (so 27 cycles) is reliable, as long as the base frequency is low enough. It looks like I can hit 16kbps (maybe 18) using this approach, without having to do any more work on waveform generation.
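The zero-crossing shift can be reproduced in a toy model. The sketch below is not the prototype's generator: it samples an ideal area-balanced pulse train at one sample per microsecond, smooths it with a 21-tap Hann kernel as a stand-in for the band-limiting filter, and measures the half-pulse widths between interpolated zero crossings. The kernel, its length, and the sample rate are all assumptions.

```python
import math

def hann_kernel(n):
    """Symmetric Hann-shaped smoothing kernel, normalized to unit sum."""
    k = [0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / (n + 1)) for i in range(n)]
    total = sum(k)
    return [v / total for v in k]

def smooth(signal, kernel):
    """Same-length filtering; samples beyond the ends are treated as zero."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, kv in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):
                acc += kv * signal[idx]
        out.append(acc)
    return out

def zero_crossings(signal):
    """Sub-sample positions where the signal changes sign (linear interpolation)."""
    xs = []
    for i in range(1, len(signal)):
        a, b = signal[i - 1], signal[i]
        if (a < 0) != (b < 0):
            xs.append(i - 1 + a / (a - b))
    return xs

def measured_widths(top_us=44, bottom_us=125, cycles=4, taps=21):
    """Return (top, bottom) half-pulse widths seen by an ideal zero comparator."""
    cycle = [1.0] * top_us + [-top_us / bottom_us] * bottom_us  # equal areas
    xs = zero_crossings(smooth(cycle * cycles, hann_kernel(taps)))
    # xs alternates falling, rising, falling, ...; use the second cycle's pulses.
    return xs[2] - xs[1], xs[1] - xs[0]

top, bottom = measured_widths()
print(top, bottom)
```

In this model the narrow, tall half pulse is measured as wider than 44 µs and the wide, short one as narrower than 125 µs, because each smoothed edge crosses zero nearer to the low-amplitude side.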

The perturbation of zero crossings should, at least in theory, be perfectly predictable, and it should be possible to compensate for them by changing the timing of the pulse train. This is, suffice it to say, not trivial, though I enjoy such challenges. My experiments so far suggest that it should be possible to get back to the 20kbps speed of my current prototype (using the simpler 9 cycle loop so code size is better and it's easier to implement the polarity flipping), possibly 23 by pushing it more aggressively.

Given how much more complicated this approach is, my current thinking is to get it working at a speed around 16kbps, then try to do the fancier waveform generation as a followup.

This leads to several questions, for which I'd like input:

Is the auto polarity flipping important, or is a command line flag to set polarity manually acceptable?
Should we maximize raw speed before shipping, or do in stages as suggested above?
Should speed be parametrized from the command line? Lower speeds will always be more robust to variations like volume and non-flat frequency response.

I hope to have a bunch more time to play with this over the Thanksgiving and then Christmas holidays.

david-schmidt commented 6 years ago

For my use case - auto-ness is going to be paramount. Either it's going to need to auto-train, or I'm going to need a way to programmatically train it, because as you say - my users don't want to know or care.

Speed - slow at first is fine as is scaling up as it proves reliable. Fast and reckless just runs us into walls.

datajerk / c2t

23kbps is possible #4
