ValleyBell / libvgm

A more modular rewrite of most components from VGMPlay. Will include sub-libraries for audio output, sound emulation, and VGM playback.

Expose discrete voices in API #80

Open mmontag opened 2 years ago

mmontag commented 2 years ago

I have one big feature request for libvgm, and that is improved support for voices/channels in the API (I use the term "voice" to disambiguate from left/right stereo channels).

Device names/voice names

It would be nice for the API to expose (1) the active device/chip names, like YM2608, SegaPCM, etc., and (2) voice names, like FM 1, FM 2, PSG 1, etc. It helps to have friendly names for muting. Game Music Emu has basic support for this, but it is not implemented for all the FM chips.

Voice buffers (for oscilloscopes, etc.)

In addition to the stereo buffer, I would love to be able to fill discrete voice audio buffers for external mixing or visualization. The host would be responsible for any post-processing like autocorrelation, etc.

ValleyBell commented 2 years ago

Device names are exposed using the SndEmu_GetDevName call (emu/SoundEmu.h).

For voice names I could add an additional function to the DEV_DEF::rwFuncs list, I guess.

Do you have any suggestions for how the API call for separate voice buffers should work/look? In the most straightforward way, you would either call the "stereo" renderer or the "separate voices" renderer, I assume? (Having a stereo renderer that simultaneously outputs per-voice data seems difficult to me.)

mmontag commented 2 years ago

Yeah, choosing one or the other is not ideal now that I think of it. That would force the host to do the stereo mix, which would require panning information and doesn't seem practical.

Maybe you always provide stereo output in the first 2 channels, and the optional per-voice output in subsequent channels. I agree it wouldn't really work to call two separate methods.

Maybe something like:

  1. host loads song file and asks for voice info/num voices in use
  2. host allocates buffer for (2 channel stereo + num voices) * sample length
  3. host calls render, providing buffer, sample length, and an additional argument specifying N discrete voices to fill

Buffer layout: [----L----|----R----|--voice1--|--voice2--|...|--voiceN--]

I was trying to figure out how Modizer added multi voice output to vgmplay: https://github.com/yoyofr/modizer/commit/cf117601dd9909b1635b99650df6537f0450b92c#diff-7390462695712dbe8460edcaab01bd6597b9d920f20a15211b0172acc8ceb4df

superctr commented 2 years ago

I disagree with this method, as it would require modifications to the emulation cores that would be inconsistently applied and, at the very least, bypass the emulated chip's mixer. That could cause accuracy issues with sound chips that perform certain effects during mixing (YM2612 and QSound off the top of my head, possibly others).

superctr commented 2 years ago

A few more reasons that I can think of:

As a worst case, you can think of the YMF278B (OPL4), which has up to 45 voices and 6 output channels.
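As rough arithmetic for that worst case, per-voice buffers multiply memory roughly by the voice count. The sample rate and 32-bit sample size below are illustrative assumptions, not libvgm internals:

```cpp
#include <cstdint>

// Back-of-envelope memory cost of per-voice output buffers, using the
// OPL4 worst case mentioned above: 45 voices plus the stereo mix.
constexpr unsigned long long bufferBytes(unsigned lanes, unsigned sampleRate,
                                         unsigned bytesPerSample) {
    return (unsigned long long)lanes * sampleRate * bytesPerSample;
}

// One second at 44100 Hz, int32 samples:
constexpr auto stereoOnly = bufferBytes(2, 44100, 4);       // 352,800 bytes
constexpr auto withVoices = bufferBytes(2 + 45, 44100, 4);  // 8,290,800 bytes
```

That is over 20x the memory of the stereo-only case, before counting the extra CPU work of keeping each voice separate through the update loop.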

Personally, I'd rather keep this off the library and have the application deal with creating outputs for each channel, either by multiple chip instances or multiplexing (i.e. save state on the "main" instance and replay on a "sub" instance for each voice you want to capture).

mmontag commented 2 years ago

Hmm...you raise some good points.

In principle, I think some enhancement to the cores is defensible if the purpose of the library is music playback, based on user demand. My hope was that libvgm could be a common foundation for players like Modizer or Rymcast; currently such players do their own hacks to obtain a voice visualization. But yes, it depends on the goals of the library. :)

superctr commented 2 years ago

I think some enhancement to the cores is defensible if the purpose of the library is music playback

The changes you're describing aren't related to playback, though; what you want is music visualization. I don't think adding bloat deep into the cores will enhance playback on any level.

To further clarify what I think: the individual voice output code would need to be added to each core and adapted to the mixing/channel update routines (which are written in many different ways), and, as mentioned, it would increase memory and CPU usage. It would be a burden to maintain; all existing cores would have to be adapted, and new cores and ports from other emulators would as well.

Also consider the likelihood that it will be misused (i.e., not for visualization).

If this function is absolutely necessary in your application, I think it would be best to keep it in a separate branch or fork.

ValleyBell commented 2 years ago

While I'm not completely against adding functions for visualization or separate channel output, I won't put any effort into it anytime soon.

I'm also not convinced about modifying the update/render function to provide additional parameters.

Right now I think that functions providing the volume/frequency of the channels/voices would be more useful and feasible than additional per-voice output.
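A per-voice status query of the kind described here might take a shape like the following. This is purely a hypothetical sketch; none of these names (`VoiceState`, `FakeDevice`, `getVoiceState`) exist in libvgm, and a real implementation would read the values out of the emulated chip's registers during each update:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical shape of a per-voice status query, as a lighter-weight
// alternative to per-voice audio buffers.
struct VoiceState {
    const char* name;   // e.g. "FM 1", "PSG 2"
    double frequency;   // current pitch in Hz (0 if key-off)
    double volume;      // linear 0.0 .. 1.0
    bool keyOn;
};

// Stand-in device with fixed values; a real core would derive these
// from its register state after each render/update call.
class FakeDevice {
public:
    std::vector<VoiceState> states = {
        {"FM 1", 440.0, 0.8, true},
        {"FM 2",   0.0, 0.0, false},
        {"PSG 1", 220.0, 0.5, true},
    };
    uint32_t voiceCount() const { return (uint32_t)states.size(); }
    VoiceState getVoiceState(uint32_t v) const { return states[v]; }
};
```

A host could poll this once per rendered block to drive an oscilloscope-style display without the audio path ever being touched.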

mmontag commented 2 years ago

Thanks for the discussion here. I think we're in agreement that the best option would be to maintain a fork with voice output support.

meyertime commented 2 years ago

My question is related to this, so I thought I would ask here instead of starting a new issue.

I would like to access each voice separately too, but for a different purpose. Rather than visualizing, I'd like to render them to separate files. At this point, even a manual process would do. For example, is there any way to configure some voices to be muted? Particularly with the vgm2wav tool.

I found this and got it to work: https://github.com/weinerjm/vgm2wav However, it lumps all the SN76489 voices together, and it appears to be something someone threw together quickly, whereas this project looks much more mature. I also played around with VGMPlay, which appears to be the predecessor to this project; it has options in the .ini file to mute channels, but I can't get it to play anything on Linux, and even if I could, I'm trying to save to a file rather than play it. The included vgm2wav tool works on Linux, but it doesn't read the .ini file and has hardly any command line options.

ValleyBell commented 2 years ago

If you want to have a more "programmable" solution, you can look how player.cpp does it here: https://github.com/ValleyBell/libvgm/blob/57713471eef1db49e84f39d4ee3ac83662f01316/player.cpp#L703

If you just want something that works out of the box, compile vgmplay-libvgm, which uses libvgm internally and includes it as a submodule, so the versions are guaranteed to be compatible.

meyertime commented 2 years ago

I was not able to get vgmplay-libvgm to build, but I was able to modify vgm2wav to take a --voice parameter and mute all but the indicated voice using the SetDeviceMuting example you linked to. It's enough to script out what I need. Thanks!
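The "mute all but one voice" logic behind such a `--voice` flag boils down to a channel bitmask. libvgm's `SetDeviceMuting` takes a mute-options struct containing such a mask (see the player.cpp link above); the helper below is only a standalone sketch of building the mask, with a hypothetical name:

```cpp
#include <cstdint>

// Standalone sketch of the "solo one voice" bitmask: set every
// channel-mute bit except the one voice to keep audible. soloVoiceMask
// is a hypothetical helper, not a libvgm function; the resulting mask
// would be handed to the player's device-muting call.
static uint32_t soloVoiceMask(unsigned numChannels, unsigned keepVoice) {
    uint32_t allMuted = (numChannels >= 32) ? 0xFFFFFFFFu
                                            : ((1u << numChannels) - 1u);
    return allMuted & ~(1u << keepVoice);  // clear the bit of the solo voice
}
```

Running the tool once per voice with a different mask then yields one WAV file per voice, which is exactly the scripted workflow described above.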