bbbradsmith / nsfplay

Nintendo NES sound file NSF music player
https://bbbradsmith.github.io/nsfplay/

Audio aliasing #15

Closed LGA1150 closed 4 years ago

LGA1150 commented 4 years ago

Attachment: Sawtooth sweep aliasing.zip

This is a linear sawtooth sweep from the N163's lowest frequency to its highest. Starting at 0:20, I can hear audio aliasing artifacts.

Is there a way to eliminate the aliasing, for example with a low-pass filter (LPF) before sampling?

bbbradsmith commented 4 years ago

With the limited bit-depth of the waveform sample I doubt you could do much with pre-filtering, but it might help a little.

In 8 channel mode the N163 has an effective samplerate of about 14.9 kHz (119318 Hz / 8). You can use fewer channels to get higher samplerates, and less aliasing as a result.

https://wiki.nesdev.com/w/index.php/Namco_163_audio#Channel_Update
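As a rough sketch of that arithmetic: assuming an NTSC CPU clock of 1789773 Hz and one N163 channel update every 15 CPU cycles (which gives the 119318 Hz figure quoted later in this thread), the per-channel rate can be estimated as below. The constant and function names are illustrative and are not taken from NSFPlay.

```python
# Assumptions: NTSC CPU clock, and one N163 channel update per 15 CPU cycles.
NTSC_CPU_HZ = 1789773
CYCLES_PER_UPDATE = 15

def n163_channel_rate(channels):
    """Approximate per-channel update rate in Hz for 1 to 8 enabled channels."""
    if not 1 <= channels <= 8:
        raise ValueError("the N163 supports 1 to 8 channels")
    return NTSC_CPU_HZ / CYCLES_PER_UPDATE / channels

for n in (1, 2, 4, 8):
    print("%d channel(s): about %d Hz" % (n, round(n163_channel_rate(n))))
```

Halving the channel count doubles the per-channel rate, which pushes the point where harmonics start to fold back further up the spectrum.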

LGA1150 commented 4 years ago

The NSF I uploaded has only one N163 channel, so the cause is not the serial mixing. The aliasing occurs because some harmonics go above the Nyquist frequency and are not filtered before sampling. A real Famicom might not have this issue because its audio mixing output is pure analog.

I do not own a Famicom. Could someone please test it on real hardware?

bbbradsmith commented 4 years ago

Apologies, the NSF writes to $4800 to set 8 channel mode at the start, but it later writes it again as 1 and I missed that second write.

Looking at the output, it does look like there is some possible reflection at the output frequency. NSFPlay's downsampling method is due for a rewrite in version 3, so if the issue is there it's on the list and will be resolved eventually. (This is not a good way to test the resampler, though, I have other methods I can use for that which are more direct and less compromised.)

However, there are many factors in play at once with the N163. The quantization of the sample is a source of various harmonic noises too. Your sample being only 16 steps long carries its own reflections even when played back at relatively low frequencies. I think you're still going to have much of the objectionable sound you're trying to point out even if NSFPlay's downsampler were ideal.

As for the hardware test, "could someone" is a question for a more public forum; right here you are mostly talking only to me. This is not a place of frequent traffic for anyone else.

I can't record arbitrary expansion audio NSFs on hardware, as I don't have a TNS cart for this. If you can build a hotswap ROM, I could record it for you, but with no guaranteed timeline:

https://github.com/bbbradsmith/nes-audio-tests

bbbradsmith commented 4 years ago

Though, one thing that can demonstrate which parts of this are due to NSFPlay's samplerate/downsampling and which are not is adjusting your output samplerate (it goes up to 96kHz) and also adjusting your output quality (which increases oversampling).

With both of these at max there is a slight reduction in reflection from the output samplerate, but I don't think these reflections are anywhere near as prominent as the other distortion sounds already caused by the N163's internal methods, which I believe are accurate.

bbbradsmith commented 4 years ago

An excerpt of the spectrogram to illustrate:

[Image: X_pattern]

The lowest white bar is the fundamental of the sawtooth at 3kHz. The pattern of Xes is really the most prominent distortion here, and I do not believe it has anything to do with NSFPlay's downsampling. This is inherent from the internal operation of the N163, and its quantization.

So, in answer to your initial question: generically, I think you can mitigate this a little bit by using a longer sample. However, in this particular case of the saw, quantization would make any playback rate increase ineffective.

bbbradsmith commented 4 years ago

For reference, I created a 16 step saw, and put it into the OpenMPT tracker as a sample. With no interpolation, but otherwise a very high precision sampler, you get the exact same kind of distortion:

[Image: reference2]

So, I think the distortion you're hearing is just an inherent property of 16-step quantization of a saw wave. This isn't a property of N163 specifically, or of NSFPlay.

Here's the WAV sample: saw16.zip

LGA1150 commented 4 years ago

Even if it's the quantization error, it shouldn't go below the base frequency. Here is another example: Square aliasing.zip

I replaced the sawtooth with a 50% duty cycle square wave. A square wave shouldn't have any quantization error, but I hear similar artifacts.

bbbradsmith commented 4 years ago

Well, yes, after a bit more investigation, you're right: this particular distortion isn't really about quantization. I was thinking more about the way quantization prevented any effective increase of the saw sample size past 16, but lack of sample length wasn't really the problem either, which the square wave made clear, and I needed to re-evaluate my theory.

To take NSFPlay out of the equation for a moment, I will generate an idealized version of the N163 sampler at its native frequency of 119318Hz which produces more or less the same function as your NSF:

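import wave

# Render a square-wave sweep at the N163's own update rate,
# so that no resampling step is involved in the result.
def squaresweep(channels,samplen,start,inc):
    samplerate = 119318 // channels  # per-channel update rate in Hz
    wavelength = samplen << 16       # phase period: sample length with 16 fractional bits
    halfwave = wavelength // 2       # 50% duty cycle threshold
    w = wave.open("squaresweep_%d_%d_%d_%d.wav" % (channels,samplen,start,inc),"wb")
    w.setnchannels(1)                # mono
    w.setsampwidth(1)                # 8-bit unsigned samples
    w.setframerate(samplerate)
    d = bytearray()
    accum = 0                        # phase accumulator
    pitch = start                    # phase increment per output sample
    for f in range(60*60):           # 3600 frames of 1/60 second each
        print("%d" % f)              # progress indicator
        for s in range(samplerate//60):
            accum += pitch
            while accum >= wavelength:
                accum -= wavelength
            # no interpolation: output is either the low or the high level
            d.append( 0 if accum < halfwave else 255 )
        pitch += inc                 # linear pitch sweep, one step per frame
    w.writeframes(d)
    w.close()

# 1 channel, 16-step sample, starting pitch 0x12C, rising by 0x10 per frame
squaresweep(1,16,0x12C,0x10)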

This generates a WAV file with no loss due to downsampling. When I did this and looked at the full range of the spectrum, the answer was clear:

[Image: squaresweep_fullspectrum]

Here I can see the harmonics of the square wave reflecting off the ceiling and causing the X patterns. My conclusion: this aliasing is caused by the N163's internal frequency of 119318 Hz (and its own lack of interpolation).
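The reflection can be estimated numerically. The following is a minimal sketch, assuming ideal unfiltered sampling at 119318 Hz and the same phase-accumulator arithmetic as the squaresweep script; `pitch_to_hz`, `fold`, and `aliased_harmonics` are hypothetical helper names, not NSFPlay functions. It ignores harmonic amplitudes and only shows where each odd harmonic of the square wave would land.

```python
N163_RATE = 119318  # Hz, single-channel update rate quoted in this thread

def pitch_to_hz(pitch, samplen=16, samplerate=N163_RATE):
    """Fundamental produced by a phase accumulator that wraps at samplen << 16."""
    return samplerate * pitch / (samplen << 16)

def fold(freq, samplerate=N163_RATE):
    """Frequency at which a component appears after unfiltered sampling."""
    f = freq % samplerate
    return samplerate - f if f > samplerate / 2 else f

def aliased_harmonics(fundamental, count=40, samplerate=N163_RATE):
    """Return (harmonic, true Hz, observed Hz) for odd harmonics above Nyquist."""
    out = []
    for k in range(1, 2 * count, 2):  # a square wave has odd harmonics only
        true_hz = k * fundamental
        if true_hz > samplerate / 2:
            out.append((k, true_hz, fold(true_hz, samplerate)))
    return out

# Pitch value halfway through the sweep (frame 1800 of 3600).
f0 = pitch_to_hz(0x12C + 0x10 * 1800)
print("fundamental: %.0f Hz" % f0)
for k, true_hz, seen_hz in aliased_harmonics(f0)[:5]:
    print("harmonic %d: %.0f Hz appears at %.0f Hz" % (k, true_hz, seen_hz))
```

As the fundamental rises, each folded harmonic moves in the opposite direction to the unfolded ones, which is consistent with the crossing lines seen in the spectrogram.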

So, thank you for your insistence, because it has cleared up the theoretical reason for this effect for me. However, it indicates no failure of NSFPlay that I can fix, because this aliasing is the authentic output of the N163.

It also unfortunately means I can't recommend any steps to mitigate it, other than trying to pre-filter your waveform sample, which as stated before will be pretty tough due to the quantization. (A modern tracker/sampler would address this with an appropriate filter in the sample generator instead.)
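For illustration, one way to pre-filter a waveform sample is to drop its upper harmonics with a small DFT and then requantize. This is only a sketch under stated assumptions: 4-bit sample values (0 to 15), and a hypothetical helper `bandlimit_wave` that is not part of NSFPlay or any tracker. Requantization reintroduces some harmonic content, so, as noted above, the benefit may be small.

```python
import cmath

def bandlimit_wave(samples, keep_harmonics, levels=16):
    """Zero all harmonics above keep_harmonics, then requantize to 0..levels-1."""
    n = len(samples)
    # Forward DFT of one cycle of the waveform.
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Bin k and bin n-k both belong to harmonic min(k, n-k); bin 0 is the DC offset.
    for k in range(n):
        if min(k, n - k) > keep_harmonics:
            spectrum[k] = 0
    # Inverse DFT; the result is real because the spectrum stays symmetric.
    filtered = [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]
    # Round and clamp back to the assumed 4-bit range.
    return [min(levels - 1, max(0, round(v))) for v in filtered]

saw16 = list(range(16))
print(bandlimit_wave(saw16, 3))
```

Keeping all 8 harmonics of a 16-step sample returns the waveform unchanged; lower values of `keep_harmonics` trade brightness for fewer harmonics that can fold back.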