WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

Provide a different way for the AnalyserNode to return the frequency data #2501

Closed jw-12138 closed 1 year ago

jw-12138 commented 1 year ago

Describe the feature

AnalyserNode is cool enough, but I noticed a small problem here.

While using AnalyserNode.getByteFrequencyData() I noticed that the returned data is spaced linearly in frequency, meaning a lot of it (over 50%) sits in the high frequencies that are very difficult for the human ear to hear. So if I visualize the data, the graph always looks like a downhill slope, and the listening and visual experiences don't match.
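
For reference, here is a minimal sketch (not the demo code; it assumes a 44.1 kHz AudioContext and the default fftSize of 2048) of how much of the returned array ends up above 10 kHz:

// minimal sketch: how the linear bins are laid out
const audioCtx = new AudioContext()              // assuming 44.1 kHz
const analyser = audioCtx.createAnalyser()       // fftSize defaults to 2048

// bins run linearly from 0 Hz to the Nyquist frequency (sampleRate / 2)
const binWidth = audioCtx.sampleRate / analyser.fftSize   // ≈ 21.5 Hz per bin
const totalBins = analyser.frequencyBinCount              // 1024
const binsAbove10k = totalBins - Math.ceil(10000 / binWidth)
console.log(binsAbove10k / totalBins)                     // ≈ 0.55, over half the array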

[image: frequency graph sloping downhill]

I've looked at some professional equalizers, and I found that their frequency distribution is always exponential.

[image: Ableton Live EQ with logarithmic frequency axis]

For example, in Ableton Live here, the frequency axis goes 10 Hz - 100 Hz - 1000 Hz - 10000 Hz, and the 10 kHz - 20 kHz part only gets around 10% of the whole graph.

So I'm wondering: would it be possible for the AnalyserNode to achieve the same behavior?



Is there a prototype?

Yes, I managed to make a graph with the AnalyserNode itself. Here is the link: https://jw1.dev/frequency-test/test-2.html

Since I don't know the exact frequency range from the AnalyserNode (maybe state it in the docs?), I took a guess: 10 Hz - 30000 Hz.

And I split that into 4 bands and rearranged their distribution:

[image: the four rearranged frequency bands]

(unit in Hz)

Now, to me, the experience is much better.
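
Roughly, the rearrangement boils down to something like this sketch (the band edges and the 10 Hz - 30 kHz range are just the guesses above; the actual demo code differs):

// rough sketch of the rebinning idea, not the actual demo code
// assumed band edges in Hz; the real demo may use different values
const bands = [[10, 100], [100, 1000], [1000, 10000], [10000, 30000]]

function rebin(freqData, sampleRate, fftSize) {
  const binWidth = sampleRate / fftSize
  // average the linear bins that fall inside each band; each band then
  // gets an equal share of the graph when drawn
  return bands.map(([lo, hi]) => {
    const start = Math.max(0, Math.floor(lo / binWidth))
    const end = Math.min(freqData.length, Math.ceil(hi / binWidth))
    let sum = 0
    for (let i = start; i < end; i++) sum += freqData[i]
    return end > start ? sum / (end - start) : 0
  })
}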



Describe the feature in more detail

You might ask, since you've already done this in the browser, why are you still here?

Well, the demo above isn't perfect:

  • the graph gets "chunky" during a sine sweep, because the low-frequency data has to be extracted and reformed on the front end
  • fftSize is too big; to create a 16-band frequency visualizer, the fftSize must be 2048 (default) or above. If the AnalyserNode did this itself, the fftSize could drop to 32 (the lowest), and latency on the front end would be significantly decreased

The first issue could actually be solved by creating a function to calculate the affecting range, but in that case, the work on the front end would be super heavy.

So I wish you guys could give it a look! Thank you!

braebo commented 1 year ago

I would love this. Thanks for the demo! Very helpful.

meshula commented 1 year ago

For LabSound, a C++ derivative of WebAudio, I added several accessors along those lines, starting here

https://github.com/LabSound/LabSound/blob/c0ea5b771833a5e5654288dfab66e4dfd6580a91/include/LabSound/core/AnalyserNode.h#L57

Mentioning it in support of the idea, since I found myself needing that functionality frequently.

padenot commented 1 year ago

The equations to go from bin index to frequency (and the opposite) are (for example): https://searchfox.org/mozilla-central/source/dom/media/webrtc/tests/mochitests/head.js#219-244.
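
In code, that boils down to the standard FFT bin math (a sketch, not the linked file verbatim):

// bins are spaced sampleRate / fftSize apart
function binToFrequency(binIndex, sampleRate, fftSize) {
  return binIndex * sampleRate / fftSize
}

function frequencyToBin(frequency, sampleRate, fftSize) {
  return Math.round(frequency * fftSize / sampleRate)
}

// e.g. with sampleRate = 48000 and fftSize = 2048, bin 43 sits at ~1008 Hz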

It's fairly straightforward to draw the analysis array with an x-axis that has a log scale, and to decimate or average some bins if need be, but the Web Audio API can't always assume a log scale.
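
For example, a sketch of a log-scaled x position for each bin (assuming a canvas of a given width and a 20 Hz lower bound; both are illustration-only choices):

// x position for a bin on a log-frequency axis, from minHz up to Nyquist
function logX(binIndex, width, minHz, sampleRate, fftSize) {
  const freq = Math.max(minHz, binIndex * sampleRate / fftSize)
  const nyquist = sampleRate / 2
  return width * Math.log(freq / minHz) / Math.log(nyquist / minHz)
}
// bins that map to the same x can be averaged (decimated) before drawing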

I'm not sure I understand those two points:

I'm not sure what this means:

  • fftSize is too big; to create a 16-band frequency visualizer, the fftSize must be 2048 (default) or above. If the AnalyserNode did this itself, the fftSize could drop to 32 (the lowest), and latency on the front end would be significantly decreased

That's the way a Fourier transform works, unfortunately. The analysis has to happen somewhere, and if you want high-resolution information, you need lots of bins.
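
To put numbers on it (assuming a 44.1 kHz context):

// frequency resolution per bin = sampleRate / fftSize
const hzPerBin = (sampleRate, fftSize) => sampleRate / fftSize
hzPerBin(44100, 2048) // ≈ 21.5 Hz per bin, 1024 usable bins
hzPerBin(44100, 32)   // ≈ 1378 Hz per bin, only 16 bins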

jw-12138 commented 1 year ago

@padenot Hey there! Thanks for replying!

Let me explain the first problem you had. The "sweep sound" is actually a sine wave continuously and linearly increasing its frequency from 30 Hz to 20000 Hz. Ideally, the graph should look like a peak smoothly moving from left to right, but right now it's just so "chunky" (I had a few ideas about how to smooth it, but let's just focus on the problem for now). The reason is that I need to extract the data (especially the low-frequency data) and reform it into the shape I want, and that leads to the second problem: if I need to extract and reform the data, I have to have enough data to do the processing, and thus a big fftSize.

Let's imagine this: when the fftSize equals 2048, the AnalyserNode can return 1024 usable data points as an array. If the frequency range of the analyzer were 20 - 25000 Hz and the data were distributed linearly, array[0] would represent the data from 20 Hz - 44 Hz, which is good for the data reforming. Now let's decrease the fftSize to the smallest value, which is 32. If we do the math again, we find that array[0] now represents the data from 20 Hz - 1.5 kHz, and there is no way for front-end developers to extract the data within that range. Of course we can always use a bigger fftSize, but the bigger the fftSize, the more latency shows up visually, and the more data needs to be processed on the front end.

The Web Audio API runs at a low level, which could be way faster and more efficient than JavaScript, so it would be great if the AnalyserNode could provide logarithmic or exponential scale data directly.

mjwilson-google commented 1 year ago

I think there might be a misunderstanding about what the FFT (and thus AnalyserNode) is doing.

The FFT bins contain samples in the frequency domain, and are always evenly spaced between 0 and half the time-domain sampling rate. These are both properties of the mathematics of the discrete Fourier transform, and can't be changed by the spec.

The first point means that in your example, array[0] doesn't represent all the spectral energy from 20Hz to 44Hz or 20Hz to 1.5kHz, but the energy sampled at one specific frequency. The specific frequency in Hz depends on the bin index, sample rate, and FFT size.

The second point means that the FFT frequency range is dependent on the time-domain sampling rate. There is no way to set it independently, and we don't get to decide which frequencies correspond to which bins.

Finally, the data presented by the AnalyserNode is all of the data that is available. The only way to increase the resolution of the data is to increase the FFT size, which will increase latency.

If you want a smooth visualization across various specific frequencies, one method is interpolating over the FFT bins adjacent to the frequencies you are interested in. Remember that the bins are samples at specific frequencies; they don't contain information for a range of frequencies. So you can interpolate between them to get an estimate of the frequencies in-between. This could also be done with a low FFT size, although the estimate would probably be less accurate depending on how you do the interpolation.
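
For instance, here is a minimal sketch using plain linear interpolation between the two surrounding bins (illustration only; any interpolation method could be substituted):

// estimate the magnitude at an arbitrary frequency by linearly interpolating
// between the two FFT bins that surround it
function magnitudeAt(freq, freqData, sampleRate, fftSize) {
  const binWidth = sampleRate / fftSize
  const position = freq / binWidth                         // fractional bin index
  const lower = Math.min(Math.floor(position), freqData.length - 1)
  const upper = Math.min(lower + 1, freqData.length - 1)
  const t = Math.min(Math.max(position - lower, 0), 1)     // 0..1 between the bins
  return freqData[lower] * (1 - t) + freqData[upper] * t
}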

There are many possible interpolation methods and frequency scales that could be used. Some might look nicer, some might be more useful for analysis. The potential spec question is if we want to require additional specific interpolation methods and frequency scales for every Web Audio implementation.

My personal feeling right now is that, since all of the information necessary to do this is available to the JavaScript layer, there isn't a large benefit to putting this in the spec. I think a JavaScript module could be made roughly as efficient, and could be much more flexible.

jw-12138 commented 1 year ago

@mjwilson-google Thanks for that! Now the whole thing is way clearer to me!

hoch commented 1 year ago

For this type of question, I recommend the community-driven WebAudio Slack channel: https://web-audio.slack.com/

jw-12138 commented 8 months ago

So, after a long time, I thought I should give it another try, and I think I did it: I now know how to convert the linear data into logarithmic data. Here is the function if anyone needs it:

// resample a linearly spaced frequency array onto a logarithmic index scale
function logScale(data) {
  let temp = []
  let length = data.length
  let maxLog = Math.log(length)
  let step = maxLog / length

  for (let i = 0; i < length; i++) {
    // exponentially growing source index: low bins get stretched out,
    // high bins get compressed together
    let dataIndex = Math.floor(Math.exp(step * i))
    temp.push(data[dataIndex])
  }

  return temp
}

Oh, and the interpolation function for a more visually appealing result:

function easeInOutSine(x) {
  return -(Math.cos(Math.PI * x) - 1) / 2
}

function interpolate(data, easing = easeInOutSine) {
  // after logScale the low-end data is step-like (repeated values), so we only
  // need to process that part: roughly the first 3/4 of the data
  let quarterPoint = Math.floor(data.length / 4)
  let firstHalf = data.slice(0, quarterPoint * 3)
  let secondHalf = data.slice(quarterPoint * 3)

  let output = []
  let group = [firstHalf[0]]

  for (let i = 1; i < firstHalf.length; i++) {
    if (firstHalf[i] !== group[0]) {
      // if all elements in the group equal 0, add them to the output array
      if (group[0] === 0) {
        output.push(...group)
      } else {
        // calculate the step according to the count of same-number elements
        let step = 1 / group.length
        let difference = firstHalf[i] - group[0]

        // populate the output array
        for (let j = 0; j < group.length; j++) {
          // Apply the easing function to the interpolated value
          let value = group[0] + difference * easing(step * j)
          output.push(value)
        }
      }

      group = [firstHalf[i]] // Reset the group
    } else {
      group.push(firstHalf[i])
    }
  }

  // process the final group (no following value to ease toward, so hold it flat)
  for (let j = 0; j < group.length; j++) {
    let value = group[0]
    output.push(value)
  }

  // combine the processed first half and the original second half
  return [...output, ...secondHalf]
}
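
Putting the two together, the per-frame usage would look roughly like this (a sketch; `analyser` is an AnalyserNode set up elsewhere and the rendering is omitted):

// rough per-frame usage sketch
const bins = new Uint8Array(analyser.frequencyBinCount)

function draw() {
  analyser.getByteFrequencyData(bins)
  const bars = interpolate(logScale(Array.from(bins)))
  // ...render `bars` as the bar heights...
  requestAnimationFrame(draw)
}
draw()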

Here is the result:

https://github.com/WebAudio/web-audio-api/assets/29943110/7bbed681-2748-495e-a749-5137c4a465bd

The page is here: https://frequency-test-ten.vercel.app/test-7.html

vincerubinetti commented 5 months ago

I'm going to drop what I did here, though it's probably not as nice as the demos above. I ended up not using it, so this could serve as a reference if I need it in the future.

// graph that passes through [0, 0] and [1, end], with power curve
function power(value, end, power) {
  return end * (power ** value - 1) / (power - 1);
}

// number of bands to split spectrum into
const bandCount = 1000;

// get band frequencies from 0 to nyquist, evenly spaced by power instead of linearly
const frequencies = Array(bandCount + 1)
  .fill(0)
  .map((_, index) => index / bandCount)
  .map((value) => power(value, audioBuffer.sampleRate / 2, Math.E));

function getSpectrum() {
  // load spectrum data into buffer
  const buffer = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteFrequencyData(buffer);

  // array of arrays to contain amplitude values in each band
  // (Array.from so each band gets its own array, not a shared reference)
  const bands = Array.from({ length: bandCount }, () => []);

  for (let index = 0; index < buffer.length; index++) {
    // linear frequency of this bin from getByteFrequencyData
    const freq = (audioBuffer.sampleRate / 2) * (index / (buffer.length - 1));
    // find band that this frequency belongs in (clamped to the last band)
    const band = frequencies.findIndex((f) => f >= freq);
    // add amplitude value to band
    if (band !== -1) bands[Math.min(band, bandCount - 1)].push(buffer[index]);
  }

  // average amplitude values in each band, normalize to 0-1
  return bands.map(average).map((value) => value / 255);
  // there can end up being bands with nothing in them if your fftSize / bandCount
  // ratio is too small. either make that ratio big enough or interpolate empty
  // bands from adjacent bands.
}

// average array of numbers, return 0 if array length 0
function average(array) { ... }