MiSTer-devel / Minimig-AGA_MiSTer

137 stars 53 forks source link

Amiga audio filters (OKK to Sorgelig) #104

Open olekkarlsen opened 3 years ago

olekkarlsen commented 3 years ago

Hey Sorg! Great job with cleaning the code and making the filters come alive! :)

However, i have a few comments and adjustments to suggest.

First of all, the filters needs to run at a higher speed than 1.5MHz. Minimum is 3.5MHz due to the pwm pulsetrain at that speed. The filters need to catch every pulse in the pulse train to be able to form the audio correctly. Impulse response! That's also why it is bad to feed the second filter with only 48KHz output from the first filter. Also feeding the second filter with integer truncated values from the first is bad. The sample should not leave the float domain in the whole total filter chain before it's ready to be sampled to integer at 48 / 96 KHz.

Also... to run to separate filter modules like this, is not the best to do, reasons above and even other reasons.

You'we done a great job, you did what i thought to do (almost), but i didn't have time to complete myself. Also the menu stuff and the config switches were out of my current knowledge of the framework. So Kudos! and Thankyou Sorg!

To add to, further perfect, or fix what you have already done, i'm not going to sync to your work again yet, to fix it myself, because you are the one with knowledge of your filter modules, and it's coeff calculations.

But what is needed is...

As i understand it, your filter has 3 sepparate poles in one go, in one module, where each pole feeds to the next pole not leaving the floating point domain?

And the filter crunches data at audio_clk gated by filter_ce... So the filter input and filter math constantly runs at that frequency even if you do not the output from it with audio_clk gated by sample_ce? right?!

So.. back to the filter chain, outputting 48khz from the first filter and into the next.... BAD!

Run all poles simultainously in one filter module!! ;)

Then, how to switch on and off each filter then? .... I'we got you covered! ;) Do it the way Commodore did it! Do not cut away a filter from the chain...... by "bypass it"... What they did was to move the poles far far away from the audio band when bypassing! Not cutting the filter out of the chain.

Therefore...

Setup ONE of your filter modules!

Use pole 1 for the notch filter. Make it's coeff variable, selectable (coeff 1: A500 at 4429Hz, and coeff 2: A1200 at... say... 30KHz or so... i will get the correct value later from the schematics)

Then, the same filter module, using pole 2 and 3 for the led filter! ;) Since the led filter is the same for every Amiga model, only one coeff set is needed. But that set should also be switchable! ;)

Let coeff 2 and 3 be defined at 3200Hz or so. In reality one pole is at 3KHz or so, and the other is at 3.4KHz... I have the capacitor and resistor values here...

So, make an array for the led coeffs set as well. One with the 3KHz poles, and one with "bypassed" poles, at say... 30KHz or so..... something like a fake infinity.... maybe 1MHz cutoff??!....

This is the way it was done in real hardware! ;)

And, when you do it like this in HDL, then the whole filter chain never leaves the float domain to be truncated by int, and never gets sampled at only 48KHz by the second filter... The whole filter chain runs untruncated in float and at AUD_CLK | filter_ce, from pole to pole, max audio quality, no truncation!

And that filter module needs to run at minimum 3.5MHz... i'd say run it at 7MHz just to be sure, IF Amiga CCK is variable. Must do so, due to the PWM pulsetrain!

So... one filter module at say 7MHz filter_ce...

Pole 1.. two coeff sets selectable (A500 (4429Hz) vs A1200 (Will get back to you on the frq, for now, just set 1MHz)).

Pole 2.. two coeff sets selectable (LED_ON (3KHz) vs (cutoff near infinity or high enough, say 1MHz?!)) Pole 3.. two coeff sets selectable (LED_ON (3.4KHz) vs (cutoff near infinity or high enough, say 1MHz?!))

And then the sample_ce.. at 48KHz or 96KHz, which you didn't make selectable yet, not hard to fix..

You know the filters better than me, i could fix the code, but you'd do it faster than me. Besides i don't know how to calculate your coeffs and gain....

But this is what is needed to do now, what's left.

So that's that. Thankyou Sorg!

This way, the one and only filter module instance is always connected... no matter if you select A500 or A1200 notch filter, no matter if you select LED ON or OFF. However... the only situation to bypass all of those three poles, i.e, the to bypass the whole filter module instance, is when the FILTER_MASTER_SW from the menu is selected to OFF. If so, then feed aud_l and aud_r to the mixer, instead of the filter_out left and right...

See? ;)

ghogan42 commented 3 years ago

When I said 3.5mhz it is because the filter will break and produce static if you try to make a 3rd order filter (combine 1st and 2nd order) at 7mhz and the cutoff frequencies are too low. So the filters might not be able to be combined at 7mhz. That's all.

This is a sample size of one, but here is the filter from an a500. It looks a little lower than 4420. Similar to the LED filter which measures close to 3040hz butterworth which is lower than 3000 + 3400. Here is the non-LED filter spectrum so far. Really we would need more samples to get an idea of how much this varies. I don't know how close to this curve we could get. Ignore the spikes.

unknown

olekkarlsen commented 3 years ago

Sorg, is the filter supposed to run at both pos and neg edge of the 7mhz signal? then it would run at 14mhz eh? but only have coeffs for 7mhz? i think it still sound a little muffled, and sounds like it's bandpassed.... hmm....

i did this.... but havent tested yet.... have to go to bed... .ce(clk7_en); // | clk7n_en),

Gohgan... will look at your post tomorrow... nighty night guys..

olekkarlsen commented 3 years ago

Meassure the MiSTer then Gohgan?... compare.... you should also build my version with real float filters too see... it sounds better.... no offence Sorg! ;) i do use float afterall, and Sorg uses fixed point math... and my fixed point version sounds flat too... in comparison to the float one....

so i you can, Gohgan... go fetch my core and build it too.... it's on my repo... night....

sorgelig commented 3 years ago

ce must be double for stereo. So it must be 14Mhz (7MHz per channel). cck is always the same.

olekkarlsen commented 3 years ago

Gohgan, i understand now... IIR filters have recursive cumulative math.... precission needs increases with each calculation run if the coeffs and gains are not balanced perfectly, or something like that.... my understanding is that this kind of filter is "kind of alive".... it pulls and pushes all the time, and the pulls has to "cancel out" the pushes for the filter to be in balance... kind of like a vektor for amp increase and amp decrease... the fine grain of that vector... i.e.. the smaller the number, the more float resolution is needed... and if that value gets null saturated.. or... if there is too little precisson, too few digits after 0.xxxxxxxxxxxx, then the filter destabilizes, and it is cumulative....... something like that.... This is my vague understanding of iir filters so far... how they really work.... you Gohgan, seemed to know this already, and wanted to warn us of this.... and it makes sense now. BUT.... if youre thinking that 7mhz would destabilize a filter for low rate impulses... our fixed point math filter.... Sorg's.... then, high math precission would fix that, if i understand correctly... therefor a float filter might be able to do it better without breaking the IIR stability, due to high resolution on the mantissa part.... even a float64 would do it, when a float32 could not?.... this is my understanding now...

But..... absurdity..... dfloat?....... when float32 is bad enough for the fpga to handle..... i use float27 for my filter, i have had it up to 28MHz and it sounded good and right..... but.... when i tried float32 on the fpga... i got a blank screen, many compile tries.... but those filters were not time constrained though...

so.. i have one version running the float27 just fine.. and it sounds majorly better than my float27 running at 3.5MHz... and it is stable too... the iir..... UNTIL i add the led filter to it.... pass float from stage to stage, untruncated... then the whole filter chain breaks down, and no sound.... due to what you are saying Gohgan.... because my fixed point filters worked perfectly when they were chained both filters, notch and led... at high N64Qsomething...and at high frequency... but the fixed point sounded flat and lifeless....

anyway... i guess we should decrease our filter clock to 3.5MHz then, Sorg? Fix it to CCK... not because we wouldn't benefit from 7MHz (higher "interpolation" quality) of the reconstructed wave, BUT because the IIR maths would be more stable as Gohgan points out.... even though my listening tests says otherwise, 3.5MHz filter clock sound bad in comparison to higher clock rates, but Gohgan's got a good point.... if we had done FIR, we could let loose with the frequencies, but with IIR we have to be careful..... He's got a point. Atleast when we use fixed point math with a conservative floating point resolution.....

Hey.. Sorg.... you say ce must be double for stereo?...... we are not interleaving are we?.... no.... makes no sense.... am i glitching now, or are you? for each audio_clk we sample a stereo pair in parallel do we not? so therefore we sample one sampleframe (stereo pair) at each clock... therefor, no interleaving..... no?

remember, we didn't double the ce gate for 3.5MHz either..... and i guess you had those coeffs at 3.5MHz too....

Why... is the filter interleaving or serial processing stereo internally?, it couldn't... well it could, but... anyway, the filter samples it's input at each ce.... as a pair.. so... no clock doubling there.... i think maybe you got this mixed up a little? maybe too tired, overworked? :) idk...

I'd say... ce should be exactly like the coeff rate.... and double rate would not be correct for stereo... in comparison to single rate for mono.... we are not indeed interleaving... but sampling the stereo pair as a package... in parallell, per clock...

so.....

maybe you just oversaw that... while tired?...

Cheers to both of you!

i'd say.... we try to run the filters at CCK, and no ce doubling... feed it 3.5MHz coeffs for a 3,5MHz ce... and keep the input output between filter stages at full ce rate.. also the last stage's output at ce=1....

as of the cutof values..... let the notch be at 4429hz, it's calculated from resistors and caps from the commodore schematics, no matter whats meassured or not....

the led pole cutoffs, is debatable.... either use a median mean number like you did... (3400+3200) /2, or set the two poles sepparately at 3.4 and 3.2 or 3.0....... or.... use Gohgan's led cutoff value...... it wouldn't matter much anyway... everybody hates the led filter! except maybe A1000 purists..... :D BUT the notch filter IS important! and we got that correct! Except that fixed point doesnt sound as good a floatingpoint math....

So.... yeah :)

olekkarlsen commented 3 years ago

But, we can't go lower than 3.5MHz because we need to catch all pulses of the pwm pulse train! That is of most importance!

olekkarlsen commented 3 years ago

Sorg.. i think you might confuse and mixed up the fact that the output I2S is interleaved.... but core sample parsing is not.... not yet anyway..... so therefore... no clock doubling at that level in the process... right?!

olekkarlsen commented 3 years ago

It's human to err... no matter how genious... we all do. i do too... specially when tired or are working on automatic, overworked.... in pragmatic mode..

Sorg, you are a machine! ;) Man how much you reach over! How much you get done.... I guess you are in your 20's? fresh young mind? am in my 40's here.... anway... it's easy to overlook something, mix up things now and then... anyway... it's human.

My understanding is that filter ce should not be double..... it is the i2s stage later that is interleaving.... not at core level, yet.... ;) right?....

sorgelig commented 3 years ago

On each ce pulse a single channel is processed. 2 channels need double rate ce. This is why filter doesn't consume double resources for stereo. I wrote comment in IIR_filter about double ce requirement for stereo.

even though my listening tests says otherwise, 3.5MHz filter clock sound bad in comparison to higher clock rates

if you simply half the filter ce without changing coefficients then cutoff is halved too. Of course you will hear the difference.

olekkarlsen commented 3 years ago

Wasn't that i meant. But i feared that this was the effect of the doubled ce.. only in reverse, or something.. that's why i reacted... i thought your ce doubling with coeffs for an undoubled ce was wrong... now i see that you are only multi cycling two clocks for one operation.. right? :) mystery solved.

olekkarlsen commented 3 years ago

Just looked over the code tree again... and just wanted to say thankyou again for your work effort, AND patience Alexey! Ofcourse in my own local builds i will be using my own float versions of the filters, and until i can time constrain and perfect my code even further, the rest of the community now atleast has authentic pwm volumed Paula sound in it's truest form, with both lowpass filters to go, only hitch is that those fixed point filters are LOFI in comparison to the floating point ones which indeed sounds HIFI, beleive me... but... This is a huge facelift for the community anyway! Better with muffled filters than no filters. Yeah, there is that difference, but i'm not being a critic, it's a great overhaul Sorg! A great overhaul. Good work! :)

Until you will be satisfied with my float time constraining and implementation.... later... but for now... a much needed update! what was done. Sorg!

Great work! Cheers!!!

olekkarlsen commented 3 years ago

I noticed you modified the led dimmer to not interfere with the filter switching too... been all over the code.... ;) looking! ;) only one thing now... please... the output stage on the framework seems to be hardcoded at 48khz output, not listening to the settings in the master ini file.... seems that way... am i correct? should be fixed....

olekkarlsen commented 3 years ago

Also... a suggestion... and i would do it myself if i knew hdmi stuff....

keep the 16 bit output for each core... but in the framework, mix using 20 bits, and output 20 bits over hdmi, so that no level halving in the mix would be needed.

Either that, or give the cores 18 bits headroom so they can mix 4 * 16 bit streams themselves.... multiple chips they'we got that needs mixing... each at 16 bit.. would allow the cores to mix 4 chips without volume lovering... so 18bit core output... AND then that last to bits of the 20, in the framework, so that the framework has headroom for mixing alsa and whatnot.... 4 stream added... so 29bit output... hdmi..

compatibility issues? what gear doesnt do 20bit these days? It could be an option... lovered volumeed end mix at 16bits, or volume lossless 20bit output mix...

sorgelig commented 3 years ago

framework uses either 48KHz or 96KHz according to ini settings.

sorgelig commented 3 years ago

in your floating point filter there is unnecessary way to process stereo in single module. You filter can be a mono filter used per channel, so code will be simplified.

olekkarlsen commented 3 years ago

Yep.. can be fixed... Hey Alexey.... i'we solved the pipeline IIR filter algorithm... I had trouble with the recursives of the filter.... stage 4 needed a value from stage 6 in the pipe... man did i twist me head to understand what was going on, simulating it all, stepping through clocks in my head.... took me a few days... but now i know what to do... atleast in theory.... havent tested it yet... but the solution is to run stage 1 to 4 as a pipeline, stall it, and run stage 5 and 6 as multicycle, and as that finishes, wake the pipe at stage 4 again to use the now ready value from stage 5 and then 6's multicycle part! Also.. the multicycle at stage 5 and 6 can reuse the one of the multipliers and adders from the pipeline earlier, as its now stalled and not using the alu elements...

The sound will have a delay of 8 clock cycles, but thats nothing... i would chuck out samples at what clock rate you feed the module, divided by two, because of that last multicycle part, which has to be multicycle....

I wonder if that version when coded and finished would behave any better with timings? If i can constrain it right...

I will atleast try.. as iwe now tried both combo jungle, and pure 8 clocks multicycle, and now a pipeline hybrid with stall and the two last stages as multicycle.... gets the best of two worlds... has to be that way to.. due to the time recursive maths of the IIR algo in it self.

Well... night night..... going to get some sleep now...

Later Sorg. :)

sorgelig commented 3 years ago

You always can see slacks in compilation reports. If they are negative then you can see worst paths in TimeQuest utility which you can start in Quartus.

olekkarlsen commented 3 years ago

I know that.. but the paths are sometimes hard to understand... too much low level signal info and stuff... also hard to understand the via or thru path of a multicycle constraint... to understand what needs to be constrained, and how to do it the right way with timequest. some times a constraint has no effect because i do it the wrong way in timequest probably. Many people are striving with this.. i know.

sorgelig commented 3 years ago

if you are using CE in all registers assignment in the module then you can set muticycle path equal to CE in sdc for all registers in that module to relax the timings.