bbbradsmith / nsfplay

Nintendo NES sound file NSF music player
https://bbbradsmith.github.io/nsfplay/

How "Accurate" Is NSF To WAV Conversion? WAV Bit-Depth? #12

Closed 0xvividmirage closed 5 years ago

0xvividmirage commented 5 years ago

Hello, would you mind helping me understand how "accurate" the NSF to WAV conversion is when a WAV file is saved from the NSF? Is the WAV an accurate representation of what the NSF is "playing"? Is it a lossless conversion?

I read somewhere that NSF files are equal to a 1-bit WAV file. Is that true? What is the sample rate of an NSF? Does it matter?

I'm asking because I am analyzing WAV files saved from NSFPlay and am seeing some stuff I didn't expect on my oscilloscope.

koitsu commented 5 years ago
  1. WAV is a lossless format, so the answer to this question is yes.
  2. That's a weird analogy, IMO. Where did you read this?
  3. Please provide details, re: "some stuff I didn't expect in my oscilloscope".
bbbradsmith commented 5 years ago

NSF is a program that runs on an NES to produce sound. NSFPlay simulates the NES part of this.

"Lossless" does not have a directly applicable meaning here, as nothing is compressed or discarded. "1-bit WAV" is not really applicable at all (maybe you are thinking of PC speaker or ZX beeper?). You will have to clarify for me what you're interested in knowing.

There is an internal digital logic component to NES sound generation, and for that each component has a bit depth, but when they become an analog sound signal these internal bits are no longer part of the sound. It goes through analog DAC, amplifier/mixer and filter circuitry on the way out of the machine, so the final result is just an analog sound signal.

16-bit depth is way more than enough to accurately represent this analog output.

If you're expecting the waveforms produced to look a little more "chunky" and "flat" and less "curvy", you could try opening the options and turning the LPF/HPF strengths to zero (lowpass/highpass filter). Maybe that would show you what you want to see? Without these filters it is much less accurate to the NES sound, but if you're trying to get an idea of what the internal digital waveforms might look like, it might be closer to what you were hoping to see.

0xvividmirage commented 5 years ago

Some context might be in order. For the past 3 or so months I have been spending my spare time trying to use a Korg MS-20 synthesizer to recreate the Strider Hiryu intro song. I was inspired by BarxMusic's material. I am new to synthesis, so it was a bit of a rabbit hole for me. I ended up needing to do some research on how NES music works, which led me to NSFPlay and its ability to isolate sound channels so I could see what was going on under the hood.

I noticed that the pulse waves looked different than what I expected. Which was: image

but instead I saw that the pulse waves looked like the blue lines in the following image: image

I briefly read somewhere that the latter waves are "AC coupled", but I don't know what that really means... I also noticed that square waves from the MS-20 look the same way on my analog oscilloscope, which is not what I expected to see. I initially thought the NES was doing something "special" to the pulses... But now I am not sure what I am seeing.

I read about the 1-bit WAV thing on some random NES dev forum, but just thinking about it for a moment, it's obviously wrong, because that would mean the NES had just an on and an off state for sound. I did a bit more research just now and saw that an NSF file is actually 6502 assembly with a special header?

Please forgive my ignorance on various lines of questioning, but that's why I'm here. I have revised my questions.

1) What is the minimum bit-depth and sample rate of an NSF file when it is saved out as a WAV? 8-bit? I want to save out these NSFs as lossless WAVs as small as possible (but uncompressed) and at the correct sample rate for the NES.

2) It seems to me that NSFPlay is synthesizing the sound from the NSF files. So these files do not sound 100% the same as playing from actual NES hardware -- am I correct? I see the options for controlling the filters, and it makes me think NSFPlay is guessing how a game's song should sound. Are these filters and synthesis settings dictated by the NSF format?

3) Do you know of a way to play a NES cartridge on an NES? Imagine popping in a NES cart just to play its music with a control software.

bbbradsmith commented 5 years ago

Yes, NSF is a 6502 program with a special header. There are some other details, but that's the general idea. This is a better reference than the one you linked:

http://wiki.nesdev.com/w/index.php/NSF
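To make the "6502 program with a special header" idea concrete, here is a minimal sketch of reading that header in Python. The field layout follows my reading of the NESdev wiki page linked above (a 128-byte little-endian header starting with `NESM` and `0x1A`); the function name and dictionary keys are made up for illustration, and this is not how NSFPlay itself parses files.

```python
import struct

# 128-byte NSF header, little-endian, as laid out on the NESdev wiki NSF page.
# Fields: magic, version, song count, first song, load/init/play addresses,
# three 32-byte strings, NTSC speed, bankswitch bytes, PAL speed,
# region flags, expansion-chip flags, 4 reserved bytes.
NSF_HEADER = struct.Struct("<5sBBBHHH32s32s32sH8sHBB4s")


def _text(raw: bytes) -> str:
    """Decode a NUL-padded header string."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


def parse_nsf_header(data: bytes) -> dict:
    """Return the header fields of an NSF file (hypothetical helper)."""
    if len(data) < NSF_HEADER.size:
        raise ValueError("file too short to contain an NSF header")
    (magic, version, total_songs, starting_song,
     load_addr, init_addr, play_addr,
     name, artist, copyright_holder,
     ntsc_speed_us, bankswitch, pal_speed_us,
     region_flags, expansion_flags, _reserved) = NSF_HEADER.unpack_from(data)
    if magic != b"NESM\x1a":
        raise ValueError("missing NESM magic, not an NSF file")
    return {
        "version": version,
        "total_songs": total_songs,
        "starting_song": starting_song,
        "load_addr": load_addr,   # where the 6502 code and data are loaded
        "init_addr": init_addr,   # routine called once per song
        "play_addr": play_addr,   # routine called once per frame
        "name": _text(name),
        "artist": _text(artist),
        "copyright": _text(copyright_holder),
        "ntsc_speed_us": ntsc_speed_us,
        "pal_speed_us": pal_speed_us,
        "bankswitch": list(bankswitch),
        "region_flags": region_flags,
        "expansion_flags": expansion_flags,
    }
```

Usage would look like `parse_nsf_header(open("song.nsf", "rb").read())`, where `song.nsf` is a placeholder file name. Everything after the first 128 bytes is the 6502 program and its data.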

The shape of the waveform you're describing is due to a high pass filter. That is also what "AC coupling" means, but if you just look up what a high pass filter is you'll probably find easier to understand information about it. (AC coupling is probably unhelpful jargon at this point.)

NSFPlay has a setting for "HPF" which you can set to 0 to disable it if you want to see the waveform without that effect.
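If it helps to see the effect in numbers, here is a rough sketch of a simple first-order high pass filter applied to an ideal square wave. The 50 Hz cutoff, the 100 Hz tone and the function names are arbitrary choices for illustration; this is a textbook one-pole filter, not NSFPlay's filter code and not a measured NES value.

```python
import math


def square_wave(freq_hz, sample_rate, n_samples, low=0.0, high=1.0):
    """Ideal 50% duty square wave that sits between `low` and `high`."""
    period = sample_rate / freq_hz
    return [high if (i % period) < period / 2 else low
            for i in range(n_samples)]


def high_pass(samples, cutoff_hz, sample_rate):
    """Textbook one-pole RC high pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


wave = square_wave(100, 48000, 4800)      # flat tops at 1.0, flat bottoms at 0.0
filtered = high_pass(wave, 50, 48000)     # tops and bottoms now slope toward 0
```

Plotting `filtered` next to `wave` should show the two things you noticed: the waveform is centred on zero instead of sitting above it, and each flat segment decays toward zero until the next edge. An analog synth output or oscilloscope input that is AC coupled does the same thing.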

  1. The NES "samplerate" is 1,789,772 Hz, but this is not an audio samplerate. That is simply the frequency of its CPU. There is no reason to output a waveform at this rate, and I don't know of any audio player or editor that can handle it anyway.

The actual sound output of the NES does not utilize the high frequency range. All audio amplifiers will have some kind of filter to eliminate frequencies outside the useful range, and the NES is no different in this respect. The ultimate output from the NES is analogue and intended for human hearing.

48000 Hz is the default samplerate for NSFPlay's sound output. There is a higher samplerate of 96000 Hz, but it is provided only for hardware compatibility and not for the benefit of human hearing. Loss of fidelity won't occur unless you choose 22050 Hz or lower; everything above that represents all audible frequencies capably.

The CPU clock rate is accounted for in the synthesis, and does make a difference there, but it is completely unrelated to the needs of the output audio. The output audio should be in a normal audio samplerate (e.g. 48000 Hz).
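The arithmetic behind this can be sketched as follows. The CPU clock figure is the one quoted above; the helper names are invented here, and the roughly 20 kHz limit of human hearing mentioned in the comments is the usual textbook figure rather than anything specific to NSFPlay.

```python
CPU_CLOCK_HZ = 1_789_772  # NES CPU clock quoted above, not an audio samplerate


def nyquist_hz(sample_rate):
    """Highest frequency a given samplerate can represent."""
    return sample_rate / 2


def cpu_cycles_per_sample(sample_rate, cpu_clock_hz=CPU_CLOCK_HZ):
    """How many CPU cycles of synthesis elapse per output sample."""
    return cpu_clock_hz / sample_rate


def wav_bytes_per_second(sample_rate, bit_depth=16, channels=1):
    """Size of uncompressed PCM audio data per second."""
    return sample_rate * (bit_depth // 8) * channels


for rate in (22050, 44100, 48000, 96000):
    print(rate, nyquist_hz(rate), round(cpu_cycles_per_sample(rate), 2),
          wav_bytes_per_second(rate))
```

At 48000 Hz the Nyquist limit is 24000 Hz, comfortably above the roughly 20 kHz usually given as the top of human hearing, and each output sample covers about 37 CPU cycles of synthesis. At 22050 Hz the limit drops to 11025 Hz, which is where audible content starts to be lost.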

  2. The output is as close to an NES as is possible to simulate, or at least that's the goal. There are options that allow you to deviate from this goal if you wish to. (The default settings do not deviate from this goal.)

Essentially, working on this emulator consists of making a recording of an NES, comparing it to the WAV output, and trying to get them to match as closely as possible. (You should record an NES and look at the waveforms. You will see that high pass filter effect, among other things.)

Some of the options are there to accommodate variations in the hardware. There are different models of NES, and different models of Famicom, clones, historical emulation differences, and all sorts of other things that become options.

Many of the options are described in nsfplay.txt, and others might require some research. If you want accuracy, use the defaults.

  3. I assume you're asking about playing an NSF on an NES? If the question is how to get an NSF file onto an NES cartridge, look up the RetroUSB PowerPak.
bbbradsmith commented 5 years ago

If you'd like some place to learn about this stuff, I would highly recommend asking at the NESDev forums instead of in my github issues. There are many people there who will be willing to answer questions about NES emulation, sound or otherwise:

https://forums.nesdev.com/

bbbradsmith commented 5 years ago

You also asked about bit depth, but I did already answer it above. 16-bit is more than enough to represent anything the NES can do. The noise floor for an NES varies from machine to machine, but its dynamic range is definitely not big enough to exceed 16 bits.
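As a rough illustration of what "dynamic range" means for a given bit depth, the usual rule of thumb is about 6 dB per bit. The sketch below just computes the ratio between full scale and one quantization step; it ignores dither and noise shaping, and the function name is made up for this example.

```python
import math


def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in decibels."""
    return 20 * math.log10(2 ** bits)


for bits in (8, 16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```

This gives roughly 48 dB for 8-bit and roughly 96 dB for 16-bit. An analog console output with an audible noise floor does not come close to needing the 16-bit figure.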

The "bit-depth" section of this video is probably the best explanation I've seen of this concept: https://xiph.org/video/vid2.shtml

The NES has various output sounds, each of which has some number of bits of control (4-bit square, 4-bit triangle, 7-bit sample playback), but as with the CPU clock rate, this has almost nothing to do with the output bit depth you should use to represent the sound that comes out of the machine. These bit ranges of control are accounted for in the synthesis, but they don't correspond directly to bits of a digital output signal.
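One way to see why the control bits don't map onto output bits: the channels are combined by a nonlinear analog mixer. The sketch below uses the approximation formula documented on the NESdev wiki's APU mixer page, as far as I recall it; it is only an illustration of the idea, the function name is invented, and it should not be taken as NSFPlay's actual mixing code.

```python
def nes_mix(pulse1, pulse2, triangle, noise, dmc):
    """Approximate NES mixer output in the range 0.0 to about 1.0.

    Inputs are the raw channel levels: 0-15 for the pulse, triangle and
    noise channels, 0-127 for the DMC (sample playback) channel.
    Formula as given on the NESdev wiki (an approximation of the hardware).
    """
    pulse_sum = pulse1 + pulse2
    pulse_out = 0.0 if pulse_sum == 0 else 95.88 / (8128.0 / pulse_sum + 100.0)

    tnd_sum = triangle / 8227.0 + noise / 12241.0 + dmc / 22638.0
    tnd_out = 0.0 if tnd_sum == 0 else 159.79 / (1.0 / tnd_sum + 100.0)

    return pulse_out + tnd_out


one_pulse = nes_mix(15, 0, 0, 0, 0)
two_pulses = nes_mix(15, 15, 0, 0, 0)
print(one_pulse, two_pulses)  # the second is less than double the first
```

Because two pulse channels at full volume come out at less than twice the level of one, the output takes values that no simple 4-bit or 8-bit grid describes, which is why a generous output depth like 16-bit is the practical choice.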

0xvividmirage commented 5 years ago

Thank you so much for the wealth of info you've provided. I did not mean for my line of questioning to turn into a lesson on this topic, but the quality of your answers has made it one. I'm much more informed now and will hit up NESDev for any further questions. Kind regards!