MeteorStudioASU / lcc

LCC (Localization Cue Correction) is a solution for spatialized audio through stereo speakers. LCC is a lightweight implementation of crosstalk cancellation so your left ear hears the left channel of audio, and your right ear hears the right channel of audio while ensuring unwanted audio signal coloration is mitigated.
MIT License

Questions about decaygain #8

Closed vessenes closed 4 years ago

vessenes commented 4 years ago

I'm just trying to parse the audio code here: https://github.com/MeteorStudioASU/lcc/blob/6af1987c8bafd6331db2d585a484c654ec07397c/lcc_rtaudio.cpp#L43 -- please excuse any funny use of language - I'm not conversant with audio theory, but happy to learn and be corrected :)

The linked paper you refer to specifies that each channel needs to take into account the output from the other channel and negate it, with delay, so that waveforms don't reach the 'wrong' ear. And it notes that you need to then negate the negation and so on, with a decibel reduction each time until presumably the negation is inaudible.

The line I link to shows this decaygain factor, but it doesn't seem to reapply the decay back and forth between sides, and I'm wondering what it is I don't understand. lcc() takes by default a 4k buffer, I presume of output, with its samples split between left and right ear, so 2048 samples of sound per side. I presume these are the next 2048 output amplitudes for each speaker.

So, a couple of questions:

  1. Shouldn't the decaygain be applied more slowly than this? -2.5 dB per sample means we will fall off to inaudible very quickly, and at a rate that varies with the sampling rate of the audio stream.

  2. Where is the delay here? The paper discusses the length of time sound takes to get from one ear to the other; I'm not sure I understand how that's taken into account. Delaymod seems to only work between samples; in my naive imagination, the delay should be calculated by some function that figures out how many microseconds there are per buffer sample and then looks that far ahead / behind in the buffer. I presume I'm just not understanding something very well.

  3. What happens around buffer boundaries? Does the effect have problems because it restarts every 2048 samples, or am I misunderstanding how buffered audio is managed?

Thanks for the help! Trying to get my head around this.

roblkw-asu commented 4 years ago

Thanks for your interest. We're planning to release more thorough/organized documentation with a video tutorial, sometime in the next two weeks. Stay tuned!

mattlane66 commented 4 years ago

Hi,

Thank you for your interest. I may be able to help you. Yes, crosstalk cancellation needs to be recursive, or ping-pong-like, as you seem to appreciate. LCC is fully recursive, even if this is hard to see in the weeds of the app.

To answer your questions: the loss across the head from a speaker on one side is about -2.5 dB at higher frequencies, so that is what the cancellation level needs to be at the opposite ear. Since the ear can hear sound at a -90 dB level, it will take 36 cycles or so (90 / 2.5) to die away. The time this takes has nothing to do with the digital sampling rate. If there are more samples, they are closer together at the DAC and the ear, but the delay and attenuation are analog quantities as far as the ear is concerned. The DAC integrates the samples, at whatever rate, into the recursive decaying analog signal at the ears, which does not depend on the sampling rate.
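As a rough check of the arithmetic above, here is a minimal C++ sketch. The function names `db_to_linear` and `passes_until_floor` are made up for illustration and are not part of the lcc code; the -2.5 dB and -90 dB figures are the ones quoted in this reply.

```cpp
#include <cassert>
#include <cmath>

// Convert a gain in decibels to a linear amplitude factor.
// Hypothetical helper, not taken from lcc_rtaudio.cpp.
double db_to_linear(double db) {
    return std::pow(10.0, db / 20.0);
}

// Number of cancellation passes needed before the residual drops to
// `floor_db`, when every pass attenuates by `step_db`.
// Both arguments are negative dB values (e.g. -2.5 and -90).
int passes_until_floor(double step_db, double floor_db) {
    assert(step_db < 0.0 && floor_db < 0.0);
    return static_cast<int>(std::ceil(floor_db / step_db));
}
```

With these helpers, `passes_until_floor(-2.5, -90.0)` gives 36, and `db_to_linear(-2.5)` is roughly 0.75, so each pass multiplies the amplitude by about three quarters. Neither number involves the sampling rate.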

The delay involved is the time difference between the ears for a given speaker in front of you. To delay a signal, which is a chain of samples, you put the samples into a buffer and read them out again after this fixed delay, which depends only on the speaker angle to the ears and the head size, not on the sampling rate. If there are more samples, you need a larger buffer to store them during the delay period. Again, the delay is a psychoacoustic value that varies with each listener.
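To make the "buffer and read out again" idea concrete, here is a small C++ sketch. It assumes the delay is rounded to a whole number of samples; `delay_in_samples` and `DelayLine` are hypothetical names for illustration and are not how lcc itself is written.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Convert a physical delay (microseconds) into a whole number of samples.
// The delay in time is fixed; only the sample count changes with the rate.
std::size_t delay_in_samples(double delay_us, double sample_rate_hz) {
    assert(delay_us >= 0.0 && sample_rate_hz > 0.0);
    return static_cast<std::size_t>(std::lround(delay_us * 1e-6 * sample_rate_hz));
}

// Fixed-length delay line backed by a ring buffer (illustrative only).
class DelayLine {
public:
    explicit DelayLine(std::size_t delay) : buf_(delay + 1, 0.0f), pos_(0) {}

    // Store one sample and return the sample stored `delay` calls earlier
    // (zero until the line has filled).
    float process(float in) {
        buf_[pos_] = in;
        pos_ = (pos_ + 1) % buf_.size();
        return buf_[pos_];
    }

private:
    std::vector<float> buf_;
    std::size_t pos_;
};
```

For example, a 90 microsecond delay rounds to 4 samples at 44.1 kHz and 9 samples at 96 kHz: the same time span, just stored in a larger buffer at the higher rate.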

I don’t know about buffer boundaries, but LCC only needs to store enough samples to accomplish the delay. For most humans and speaker angles this delay is roughly 90 microseconds, so storage is not much of a problem. The buffer is emptied as often as it is filled, so again buffer size is not an issue.
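On the buffer-boundary question, one common way to avoid a restart at every block is to keep the delayed samples in state that outlives each call. The C++ sketch below is a generic recursive canceller written under that assumption, where each output is the input minus the delayed, attenuated output of the opposite channel. The class name `CrosstalkCanceller`, the interleaved L/R layout, and the whole-sample delay are assumptions for illustration; this is not the code from lcc_rtaudio.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical recursive crosstalk canceller (illustrative only).
class CrosstalkCanceller {
public:
    CrosstalkCanceller(std::size_t delay_samples, float gain)
        : histL_(delay_samples, 0.0f), histR_(delay_samples, 0.0f),
          pos_(0), gain_(gain) {
        assert(delay_samples > 0);
        assert(gain >= 0.0f && gain < 1.0f);  // gain below 1 so the loop decays
    }

    // Process an interleaved stereo block (L, R, L, R, ...) in place.
    // The history buffers persist between calls, so block size does not
    // affect the result.
    void process(float* interleaved, std::size_t frames) {
        for (std::size_t i = 0; i < frames; ++i) {
            // Outputs from `delay_samples` frames ago, possibly from a
            // previous block.
            const float delayedL = histL_[pos_];
            const float delayedR = histR_[pos_];
            // Subtract the attenuated, delayed opposite output. Because
            // outputs (not inputs) are fed back, each cancellation is
            // itself cancelled later, with the gain applied again.
            const float outL = interleaved[2 * i] - gain_ * delayedR;
            const float outR = interleaved[2 * i + 1] - gain_ * delayedL;
            histL_[pos_] = outL;
            histR_[pos_] = outR;
            pos_ = (pos_ + 1) % histL_.size();
            interleaved[2 * i] = outL;
            interleaved[2 * i + 1] = outR;
        }
    }

private:
    std::vector<float> histL_, histR_;
    std::size_t pos_;
    float gain_;
};
```

Feeding a single impulse into the left channel with a delay of 2 frames and a gain of 0.5 gives 1 on the left, then -0.5 on the right two frames later, then +0.25 on the left two frames after that, and so on. Processing the same audio as one block of 6 frames or two blocks of 3 yields identical output, because the state carries across the boundary.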

If you wish, you can go to www.ambiophonics.org and read several AES papers on this subject. If you have any more questions, please send them along.

Regards,

Ralph Glasgal





vessenes commented 4 years ago

Thanks, this is super helpful and interesting. I appreciate the links as well.

vessenes commented 4 years ago

Please feel free to tell me "later" :)

In this paper about the RACE processor, you can see in the pipeline that there is a short delay after the inversion. To restate my original question more precisely: where is this delay in the lcc code? I don't understand how it is embedded in the C++ codebase.

[Screenshot of the RACE pipeline diagram from the paper]