Guzunty / Pi

This repository contains resources to support the Guzunty Pi IO expansion board

Merge I2S streams #11

Open campbellsan opened 11 years ago

campbellsan commented 11 years ago

From Florian:

I want to multiplex the I2S interface, but I don't know if your CPLD is large and fast enough (I have no experience with CPLDs, only with large FPGAs and they had much more capabilities than I ever used).

The basic idea is this: There is a master clock of about 12.288 MHz. There are two bitstreams clocked with 3.072 MHz. They have to be merged word-wise into a bitstream clocked with 6.144 MHz.
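
A minimal sketch of what "merged word-wise" could look like in software terms, assuming 16-bit words and a merged stream that simply alternates one word from each input. The function name and the word width are illustrative assumptions, not something specified in the request:

```python
def merge_wordwise(stream_a, stream_b, word_bits=16):
    """Alternate fixed-size words from two equal-length lists of bits."""
    if len(stream_a) != len(stream_b) or len(stream_a) % word_bits:
        raise ValueError("streams must be equal length and hold whole words")
    merged = []
    for start in range(0, len(stream_a), word_bits):
        merged.extend(stream_a[start:start + word_bits])  # one word from A
        merged.extend(stream_b[start:start + word_bits])  # then one word from B
    return merged


# Two inputs at 3.072 MHz need twice that rate on the merged output.
IN_BIT_CLOCK_HZ = 3_072_000
OUT_BIT_CLOCK_HZ = 2 * IN_BIT_CLOCK_HZ  # 6_144_000
```

Note that a whole word from each input has to be held somewhere while the other is being sent, which is why the amount of storage matters for a hardware version.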

campbellsan commented 11 years ago

I believe there are sufficient resources in the CPLD and it should be fast enough too.

Can you point me at the relevant I2S documentation? Never mind, I found it here:

https://www.sparkfun.com/datasheets/BreakoutBoards/I2SBUS.pdf

Thanks.

campbellsan commented 11 years ago

Ok, so I had a quick look at the specification. It does look like something the CPLD could cover. Florian, can you say a bit more about the nature of the input and output streams? Is the application merging two mono streams into one stereo stream? Something else?

campbellsan commented 11 years ago

Also which pins were you planning on using? (looks like P1-12 and P1-13 would be one of the streams?) Maybe external (non Pi) pins too?

koalo commented 11 years ago

For I2S multiplexing there is no documentation, because this is out of standard.

Standard I2S can only transmit two channels per direction (i.e. stereo). There are other standards (e.g. TDM) that can transmit more channels, but there is no support for them by the Raspberry Pi.

In my project I want to have, in the best case, four input channels and six output channels. There are not many audio codecs (ICs) that support that many channels. My idea is to connect multiple codecs to the CPLD. They will transmit their data via I2S and the CPLD should output this as a single I2S stream. To the Raspberry Pi hardware this looks like a stereo audio stream with a high sampling rate, but the software will then split the stereo signal back into the multiple channels.

Also which pins you were planning on using? (looks like P1-12 and P1-13 would be one of the streams?)

This is exactly my problem. The full I2S interface is only accessible via P5.....

campbellsan commented 11 years ago

Ah ha! I now realize I misread your original issue. You wrote I2S all along and I read I2C. Sorry about that.

Ok, now I understand. There seems more to the I2S story than just getting P5 hooked up? http://www.raspberrypi.org/phpBB3/viewtopic.php?f=7&t=2631 or is that discussion fully obsoleted by the addition of P5?

Can you say which pins on P5 you would use if it was possible? None of them are explicitly labeled I2S.

Is a short ribbon cable an option?

koalo commented 11 years ago

You wrote I2S all along and I read I2C. Sorry about that.

It took me some time to understand that I2C != I2S, too. The name is very confusing, because these standards are very different (e.g. I2S is not a bus). Furthermore, I just realized that there are I2C lines at the P5 header, too....

or is that discussion fully obsoleted by the addition of P5?

Yes :-) The only thing that had to be done was to write an I2S driver, but as you can read here http://www.raspberrypi.org/phpBB3/viewtopic.php?f=44&t=8496 this driver is now ready to use :-)

Can you say which pins on P5 you would use if it was possible? None of them are explicitly labeled I2S.

They are labeled PCM, this is a generic term for I2S and related audio standards.

P5-03 - PCM_CLK - bit clock, e.g. 3 MHz (depends on sampling rate and bit depth)
P5-05 - PCM_DIN - data input
P5-04 - PCM_FS - frame sync, e.g. 48 kHz (depends on sampling rate - i.e. IS the sampling rate)
P5-06 - PCM_DOUT - data output

Is a short ribbon cable an option?

Maybe, but as these signals have a relatively high frequency this could introduce crosstalk problems. This is at least what I realized when I built my first I2S adapter.

koalo commented 11 years ago

I just realized, that I should have pointed you to the right post and not to the beginning of the thread. I am sorry! http://www.raspberrypi.org/phpBB3/viewtopic.php?p=340663#p340663

campbellsan commented 11 years ago

What sampling rate(s) and bit depth(s) were you hoping to support?

koalo commented 11 years ago

48 kHz and 16 bit. Others are optional. Why is this relevant for board design? ;-)

campbellsan commented 11 years ago

It's not relevant to the board design, but it is relevant to the design of the core (or were you planning on doing that yourself, in which case I'll stop being nosey :-) ).

Bit depth is particularly relevant, since according to my understanding the bits would have to be cached in hardware until ready for output. I assume 'merged word-wise' means alternating groups of 16 bits?

Is the two stream requirement an initial one hoping to build up to the 4 in, 6 out? Not clear on how 4 ins become 6 outs. Is this a routing thing? I'm also not yet 100% clear on what you see as being done in the CPLD and what should be done in the Raspberry Pi. One reading of your ultimate goal is that the CPLD reads 4 input channels into a single higher bit rate stream which is then sent to the RPi across the specified pins. Are you then looking to push output to the CPLD across the same pins (or rather the associated output side I2S pins) and that this should be broken up by the CPLD into 6 lower bit rate output streams?

I see where the 3.072 MHz figure comes from, BTW. It's 48 kHz x 16 bits x 4 channels, right?

Another question, are the stream clocks synchronized? I understand they are different rates, but can we assume they are phase locked?

4 x 16 bit channels is likely to be beyond the capacity of a 9572 device. 2 x 16 bit inputs is possible, I'm sure.

By the way, there is a larger device available, a 95144, which would likely be able to handle some or all of the extra demands. Please see issue #5.

koalo commented 11 years ago

It's not relevant to the board design, but it is relevant to the design of the core (or were you planning on doing that yourself, in which case I'll stop being nosey :-) ).

I would have tried myself to reactivate my VHDL and Verilog knowledge, but if you are interested in doing this - even better :-D

One reading of your ultimate goal is that......

What you wrote is exactly what is in my mind. Raspberry Pi -> I2S -> CPLD -> 6x 16 bit@48kHz AND 4x 16 bit@48kHz -> CPLD -> I2S -> Raspberry Pi

Bit depth is particularly relevant, since according to my understanding the bits would have to be cached in hardware until ready for output. I assume 'merged word-wise' means alternating groups of 16 bits?

You are right, that could be the crux. However, not all of the 10x16 bit words have to be cached, because some audio codecs also accept this serialized word format, which would only require adjusting the frame sync signal. (I don't know how to explain this... Currently I don't know which codecs I will use, so it doesn't make much sense to go into more detail, but if you are interested: http://www.wolfsonmicro.com/documents/uploads/data_sheets/en/WM8580A.pdf Figure 17 on page 26.)

Another question, are the stream clocks synchronized? I understand they are different rates, but can we assume they are phase locked?

Yes. Furthermore, as the CPLD should do the clocking (if this is possible), it can fulfil its own requirements.

4 x 16 bit channels is likely to be beyond the capacity of a 9572 device. 2 x 16 bit inputs is possible, I'm sure.

Do you mean it is possible to cache 2x16 bit at a time, but not 4x16 bit?

By the way, there is a larger device available, a 95144, which would likely be able to handle some or all of the extra demands. Please see issue #5.

Sounds very nice - soldering SMD is no problem for me.

campbellsan commented 11 years ago

If you are interested in doing this - even better :-D

At the moment I'm working on support for driving stepper motors, but after that I'll be looking around for some other cores to add to our growing collection. Supporting audio streams seems to be of pretty general interest, so yes, I'm interested.

The CPLD should do the clocking (if this is possible)

It's certainly possible for the CPLD to manage dividing down the master clock, but the standard Guzunty has no means of generating its own clock. This can be provided on GPIO4; we should be able to figure out the divider settings to get the required 12.288MHz.

Do you mean it is possible to cache 2x16 bit at a time, but not 4x16 bit?

Yes. The 9572 has 72 macrocells, there is one SR flipflop in each macrocell, which is used for any hardware registers required by the design. So there is a hard limit of 72 bits of storage. As I mention on some of the core pages, being a RAM chip is not one of the CPLD's strengths. :-) While 64 bits is less than 72, any design tends to eat bits here and there such that the 64 bit requirement will likely be too much to fit.
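
The storage budget can be put into numbers with a short sketch. It assumes one flip-flop per buffered bit and only counts the word buffers themselves. The helper name is made up for illustration, and the flip-flops a real design needs for counters and control logic are not modelled:

```python
FLIPFLOPS_9572 = 72  # one flip-flop per macrocell, as described above


def spare_flipflops(channels, bits_per_channel, available=FLIPFLOPS_9572):
    """Flip-flops left over after buffering one word per channel."""
    return available - channels * bits_per_channel


two_inputs = spare_flipflops(2, 16)   # 40 left for counters and control
four_inputs = spare_flipflops(4, 16)  # only 8 left, likely too few
```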

Sounds very nice - soldering SMD is no problem for me.

Sure thing, I can get one of these boards to you if you send me a mailing address via private mail (you can reach me using the board name at gmail), or order a standard Guzunty and I'll throw one of these SMD PCBs into the package for you. Two things to note though: a) this alternate design obviously doesn't solve the P5 issue. We'd still need to think about that, or we may have to create a new board design altogether, and b) this board is a prototype. It is fully functional and indeed is in use by one other member of the community, but it doesn't have nice silk screening labeling all the pins and other such polish.

koalo commented 11 years ago

This can be provided on GPIO4

Digital audio is very clock sensitive. At the moment I use PLLD with MASH 1 and it works, but this is not an optimal solution because of the jitter. Therefore, I am planning to use an external oscillator with 12.288 MHz and connect it to the CPLD.

As I mention on some of the core pages, being a RAM chip is not one of the CPLD's strengths. :-)

Maybe there is another solution: By shifting the frame sync and increasing the bit clock frequency it could be possible to provide the necessary data just when it is needed. In this case there is no need for the CPLD to store the data. Although, this depends on the codec (many, but not all, codecs can handle increased bit clock frequency).

I will search for good audio codecs and do some more tests next week. Another problem is how to connect the board with the audio codecs to the CPLD board. Maybe a ribbon cable with alternating ground and signal is ok, but currently I don't know....

I can get one of these boards to you

You have mail :-)

Greetings, Florian

mark222 commented 11 years ago

Hi, this is also of interest to me. In my application I want to read an ADAT (lightpipe) stream into the Pi via I2S. ADAT format is 8 channels of 24bit audio at 48K. Seems like this would fit into a single stereo I2S stream running at 192K. Software (not real time) will later divide the samples back out into 8 separate channels.

The ADAT format is complex and fast (12+MHz) but there are hardware decoders that produce (4) I2S stereo outputs. Would the Guzunty be an appropriate way to combine the 4 I2S 48K signals into a single I2S 192K signal for input to the Pi? Or could it directly decode the ADAT format into 192K stereo I2S?

And yes, it needs access to P5...

PS. I see some opencores for ADAT decoding and I2S generation...

campbellsan commented 11 years ago

I looked at the ADAT receiver resource on opencores.org. There is unfortunately no way that core will fit in a 72 macrocell device. It defines 8 x 24 bit registers which requires 192 flip-flops to start with. The CPLD we have on the Guzunty has only 72 flip-flops. Even the larger 95144 device discussed above would not be able to implement this design.

FPGAs are better for this kind of application because most of them have RAM built in. A Spartan 6 would be able to handle the requirement without difficulty, for example.

I have been toying with the idea of designing a Spartan board for the RPi for a while. Unfortunately, the Spartans only come in large, difficult to solder, surface mount packages. Also, I'd think a kit would come in at about £45 which puts it into a niche area.

ptamike commented 11 years ago

Hi,

There is an FPGA development board for the Raspberry Pi that’s soon to be released. There’s an article in the latest MagPi here:

http://www.themagpi.com/issue/issue-16/

That might help… Mike

From: campbellsan [mailto:notifications@github.com] Sent: 05 September 2013 09:36 To: Guzunty/Pi Subject: Re: [Pi] Merge I2S streams (#11)

campbellsan commented 11 years ago

Yes, I saw that. No word on availability or price though. Looking at it, I'd predict it is not going to be cheap.

The Papilio Pro is a nice price. http://papilio.cc, and the Papilio One 250 would handle this application and is even less.

mark222 commented 11 years ago

Thanks for the look, I suspected ADAT decoding would be impractical. So I can use a dedicated decoder chip (Wavefront AL1402) and there are inexpensive boards with this chip and the optical connector. That gives me 4 streams of stereo I2S at 48K sample rate. Now I want to multiplex those into a single stereo I2S at 192K which, I think, the Pi can read directly. I will need to design a custom core (my digital logic skills are a bit rusty... but what the heck...). Trick is I don't yet know the relative timing of the 4 independent I2S signals; it may be necessary to buffer, which I assume eats up space on the CPLD.

campbellsan commented 11 years ago

Yes, that is what eats the space. If the signals need to be buffered, it won't fit. Sorry.

koalo commented 11 years ago

Maybe you could try to multiplex the input bit-wise. Although, this would require some software for demultiplexing.

campbellsan commented 11 years ago

You could do that, but as I think you're saying, that would no longer be a valid I2S stream.

koalo commented 11 years ago

Depends on the definition of "valid I2S stream" ;-) If this is "8 subsequent bits build up a sample", then it is not a valid I2S stream. If this is "can be transmitted via the I2S interface of the raspberry pi", then it is a valid I2S stream

mark222 commented 11 years ago

I guess I would define it as anything the Pi can read with the stock hardware and drivers! I can write all the fancy post-processing software needed to rebuild the original 8 channels... my challenge is to get those pesky bits onto disk (well, card).

campbellsan commented 11 years ago

Depends on the definition of "valid I2S stream" ;-)

Indeed. :-)

The I2S bus specification does say that you get all the bits of one channel followed by all the bits of the next channel, but if you're building your own bespoke processing software, that will not matter.

With a bit interleaving approach, you can definitely fit it into a Guzunty. The only potential issue would be ensuring that the input streams remain synchronised.
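
To make the bit interleaving idea concrete, here is a rough software model of both ends: the merge the CPLD would perform and the demultiplexing the post-processing software would need. It assumes the inputs are perfectly synchronised and that bits are taken round-robin in a fixed stream order. The function names are hypothetical:

```python
def interleave_bits(streams):
    """Emit one bit from each synchronised input stream in turn."""
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("input streams must be the same length")
    merged = []
    for bits_at_same_clock in zip(*streams):
        merged.extend(bits_at_same_clock)
    return merged


def deinterleave_bits(merged, n_streams):
    """Software side: recover the original streams from the merged bits."""
    if len(merged) % n_streams:
        raise ValueError("merged length must be a multiple of n_streams")
    return [merged[i::n_streams] for i in range(n_streams)]


inputs = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 0]]
merged = interleave_bits(inputs)
recovered = deinterleave_bits(merged, 4)
```

Only the bits arriving on the current input clock have to be held at any moment, so the storage cost is one bit per input rather than one word per input.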

koalo commented 11 years ago

They should be synchronized, because they come from the same IC with only a single clock.

campbellsan commented 11 years ago

Looking at http://www.wavefrontsemi.com/DataSheetsFolder/WavefrontAL1402.pdf I see that there is a single BCLK for all the output channels, so that simplifies things a bit.

campbellsan commented 11 years ago

:-)

koalo commented 11 years ago

By the way: The board you sent to me works great, but currently I am hoping that I can build a board that supports a single stereo channel within this month.......

ptamike commented 11 years ago

I've just been in contact with Mike Jones from Valentfx re the LOGi-Pi FPGA boards and he's planning to run a Kickstarter project in the next couple of months to help gauge demand and is hoping to get the boards out at $99 or less - it all depends on volumes. So that could be very interesting. Mike

mark222 commented 11 years ago

So all 4 I2S streams put out bit #1 of channel #1 at the same time, and the Guzunty needs to sample all 4 of them (based on the same BCLK). Then, based on a faster BCLK (192K), it puts those 4 bits out serially on an output I2S stream as the first 4 bits of channel #1. So there is some buffering there. The slower input BCLK and the 4X faster output BCLK have to be synchronized... is that straightforward to generate in the Guzunty? The rest seems like straightforward latching and shifting bits around.

campbellsan commented 11 years ago

Getting stereo interleaving should be possible. I think you could get that working today, never mind in a month. :-)

As for the ADAT bit interleave merge, I still think there is a potential issue. BCLK is derived from the input ADAT optical stream, correct? The issue is that you need to derive an output clock that is 4 times faster than BCLK, but which is phase locked to BCLK. If it is not phase locked, and the two clocks were to drift, you'd be risking bit corruption.

This is definitely a solvable issue though. I have some ideas, if you agree this is a potential issue and wish to hear them.

koalo commented 11 years ago

There is also a 12.288 MHz clock coming out of the AL1402G that is synchronized to the bit clock.

For me there is currently no interleaving planned - just connecting a plain audio codec to the Raspberry Pi ;-) Everything is working with flying wires; it is more a problem with etching, soldering and most of all getting my ordered parts out of the procurement administration of my university. Interleaving is postponed after my stereo prototype works.

campbellsan commented 11 years ago

Yes, I missed that. Thanks for pointing it out.

Then there is no synchronisation issue, you just use DVCO to clock the 4 input channel values out.

Easy Peasy!

mark222 commented 11 years ago

Yes, a synchronized 4X clock (with respect to the input BCLK) is required. The 12.288MHz clock from the AL1402 is too fast but I wonder if there is a simple divider that would do the trick?

campbellsan commented 11 years ago

Well, Guzunty can divide it down, if need be.

However, I'm not so sure it's too fast. BCLK is 64x WDCLK and DVCO is 256x WDCLK, which is 4x BCLK. Am I missing something?

koalo commented 11 years ago

Seems to be ok - BCLK @ 192 kHz is 12.288 MHz. No need for dividing DVCO down.

mark222 commented 11 years ago

Assuming 32 bits/sample, yes (192k * 32 bits/sample * 2 channels = 12.288M). But ADAT is defined as 24 bits/channel: 192k * 24 * 2 = 9.216MHz. I guess I could pad out to 32 bits and even use the extra bits for metadata or encoding the original (1-8) channel number. The slower this runs the more reliable and less CPU burden to read it into the Pi, but if it significantly complicates the hardware it may not be worth it.

koalo commented 11 years ago

The number of bits per sample and the samples per frame clock interval are not necessarily equal. According to the datasheet, the bit clock is 64*Fs.
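
The clock figures from the last few comments can be checked with a few lines of arithmetic. The 64x and 256x ratios are the ones quoted in this thread for the AL1402, the 48 kHz word clock is assumed, and the helper function is only for illustration:

```python
WDCLK_HZ = 48_000            # word clock (sample rate) of each input stream
BCLK_HZ = 64 * WDCLK_HZ      # input bit clock
DVCO_HZ = 256 * WDCLK_HZ     # faster synchronised clock from the decoder


def stereo_bit_clock(sample_rate_hz, bits_per_slot):
    """Bit clock of a two-channel stream with the given slot width."""
    return sample_rate_hz * bits_per_slot * 2


padded_out_hz = stereo_bit_clock(192_000, 32)  # 24-bit samples in 32-bit slots
packed_out_hz = stereo_bit_clock(192_000, 24)  # 24-bit samples, no padding
```

With 32-bit slots the merged 192 kHz stream needs exactly the DVCO rate, which is four times the input BCLK. The unpadded 24-bit variant would need a 9.216 MHz clock that is not an integer division of DVCO.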

campbellsan commented 11 years ago

Isn't it just going to be 'left justified'?

koalo commented 11 years ago

Left justified, then shifted one bit to the right = I2S
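
As a rough illustration of that one-bit difference, the sketch below lays out a single channel slot, MSB first, relative to the word clock edge. It is a simplified model with made-up names: unused bit positions are shown as zeros, and what the transmitter actually drives in those positions is not modelled.

```python
def slot_bits(sample, sample_bits, slot_width, i2s=True):
    """Bits clocked out in one channel slot, starting at the word clock edge."""
    delay = 1 if i2s else 0  # I2S: MSB appears one bit clock after the edge
    if sample_bits + delay > slot_width:
        raise ValueError("sample does not fit in the slot")
    msb_first = [(sample >> i) & 1 for i in range(sample_bits - 1, -1, -1)]
    padding = slot_width - sample_bits - delay
    return [0] * delay + msb_first + [0] * padding


left_justified = slot_bits(0b101, 3, 8, i2s=False)
i2s_format = slot_bits(0b101, 3, 8, i2s=True)
```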

By the way: Keep in mind that you have to do level shifting.

campbellsan commented 11 years ago

Why? Guzunty can handle 5v inputs directly.

koalo commented 11 years ago

And outputs?

koalo commented 11 years ago

Ahh sorry you are right - no need for outputs!

campbellsan commented 11 years ago

Outputs to where? :-)

mark222 commented 11 years ago

"The number of bits per sample and the samples per frame clock interval are not necessarily equal."

I am confused by this comment. The I2S spec implies there are exactly the number of sample bits in a word clock frame (starting with the LSB of the N-1 sample) - https://www.sparkfun.com/datasheets/BreakoutBoards/I2SBUS.pdf.

campbellsan commented 11 years ago

"The number of bits per sample and the samples per frame clock interval are not necessarily equal."

I think this refers to the Wavefront device specification, not I2S... and as you said, on the I2S side of things you're not concerned about sticking with the standard format, so some empty bits would not be an issue, right?

koalo commented 11 years ago

I cannot find your statement in the spec. It even says that the word lengths of sender and receiver are not necessarily equal (3.1 Serial Data). For me this implies that, in this case, the word clock period cannot match the word length on both sides. Most ICs that claim to support I2S only mean that the first bit is shifted (in contrast to left-justified, right-justified or TDM, where the first bit comes with the edge of the word clock).

mark222 commented 11 years ago

Correct, this data will be written to disk and post-processed later (not real-time) so any multiplexing and padding can be handled in the software. I was only thinking about slowing the bit rate down by 25% to improve reliability (less jitter, less CPU work to write to disk, etc). KISS may prevail :-)

campbellsan commented 11 years ago

Reliability at that bit rate shouldn't be an issue. The LCD driver core works faultlessly at 32MHz. 12.288MHz will not be a problem.

I'm doing some work with enabling DMA support for SPI, ATM, so even if the CPU should struggle (which I doubt) there is always the possibility of using that. Really depends on what else you need the Pi to be doing at the same time. If the answer is 'nothing' then I think you're home free.

mark222 commented 11 years ago

As I read section 3.1 of the I2S spec, it still implies the word clock is exactly long enough to contain the transmitter's number of sample bits. If the receiver wants fewer bits, it ignores the extra ones (but they are still transmitted in the word clock time). If the receiver wants more bits "the missing bits are set to zero internally" - to me that means they are not transmitted nor part of the word clock time; the receiver just fills its extra bits with zero.
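
The receiver behaviour described above can be sketched as follows. This assumes MSB-first transmission, so the MSB always lines up on both sides. The function name is hypothetical and the code only models the word length handling, not the timing:

```python
def receive_word(tx_word, tx_bits, rx_bits):
    """Model a receiver whose word length differs from the transmitter's."""
    if tx_bits >= rx_bits:
        # Receiver wants fewer bits: the extra low-order bits are ignored.
        return tx_word >> (tx_bits - rx_bits)
    # Receiver wants more bits: the missing low-order bits are set to zero.
    return tx_word << (rx_bits - tx_bits)


truncated = receive_word(0xABCDEF, 24, 16)  # 0xABCD
zero_filled = receive_word(0xABCD, 16, 24)  # 0xABCD00
```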

A moot point really since in this case I have control over sender and receiver.

On reliability I was not concerned about the Guzunty, but about the Pi's reading of the 192K I2S stream. In my case the only thing the Pi has to do is read the bits and get them written to disk (card, or external USB drive). From what I have read this should not be a problem.

Thanks for all the help and advice, it is exciting to finally make progress on this project (been thinking about this for a long while...).

Some of the early posts on this thread imply I might need a modified Guzunty board to access the Pi's I2S signals?