ValveSoftware / Proton

Compatibility tool for Steam Play based on Wine and additional components

Microphone interface detected, but microphone not captured, by games when using ALSA #6600

Open aclist opened 1 year ago

aclist commented 1 year ago

Background: using an .asoundrc file on an ALSA-only system to specify a USB microphone interface. Microphone input works correctly system-wide in Linux, Steam voice calls within the Linux client, audio recording software, OBS, etc.

The ALSA config file or output of alsactl can be attached if necessary. The config file adds the microphone, duplex, mixer control, loopback support, and the dsnoop plugin, which allows multiple programs to capture audio simultaneously.

System/kernel: 6.2.2-arch1-1

Proton version: Experimental, 7.0-6, all previous versions

Steps taken:

  1. Allow a wine prefix to be created for the appid in question.
  2. Use winetricks in the prefix with the verb sound=alsa to update the registry (a command sketch follows this list).
  3. Open winecfg, navigate to the Audio tab, and open the dropdown for Voice input device.
  4. Verify that the microphone interface is present (in this case, "Scarlett Solo USB") and select it from the dropdown.
  5. Launch the game with Proton and attempt to use voice input, either via in-game microphone tests or communication with other players.
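
For reference, steps 1-4 boil down to roughly the following. This is a sketch rather than an exact recipe: the Steam library path and appid are placeholders, and depending on the setup winetricks may need to be pointed at Proton's bundled wine.

export WINEPREFIX="$HOME/.local/share/Steam/steamapps/compatdata/<appid>/pfx"
winetricks sound=alsa   # switch the prefix's Wine audio driver registry setting to ALSA
winecfg                 # Audio tab -> select "Scarlett Solo USB" as the voice input device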

Microphone input is never passed through to the game, even in games reported elsewhere as having microphone support in Proton.

A sample of games tested, alpha order:

I was unable to get microphone input in any of these games (with one caveat for Day of Infamy below), although the microphone interface itself was detected in all of them.

As a control, I tested adding the Windows version of Audacity as a Non-Steam Game and forcing Proton (Experimental, 7.0-6, other versions) as a compatibility tool using the same steps above (set input device in winecfg, etc.), then launched it, selected the microphone interface within Audacity, recorded a sound sample, and played it back. This DID capture audio despite the application running inside Proton.

My initial surmise was that games interpose additional layers of audio middleware, whereas Audacity captures the raw microphone input. However, two other interesting data points follow:

  1. Day of Infamy uses the Source engine and falls back to the default Source menus under the hood for some options. Opening the microphone configuration through Day of Infamy's reskinned settings frontend triggers a vanilla Source dialog for microphone test/input selection, the same one seen in games like Half-Life and Counter-Strike 1.6. Within this menu, the interface was also detected and selectable, and the microphone test DID capture sound. However, this did not carry over once in-game.
  2. Treason also uses the Source engine but does not reskin the menus. Setting voice options opens the same dialog as in item 1; the microphone interface was detected, but the microphone test did not capture any audio.

Even allowing for different audio capture middleware (some games use, e.g., Vivox), the sample size of games tested is quite large, yet I was unable to get it working in any of them, except for the Audacity control test. I would like to know what the missing step is here, or what can be done to extend support, since for all intents and purposes both the interface and the audio do seem to be reaching the inside of the Wine prefix.

My last guess here would be that this is adjacent to https://github.com/ValveSoftware/steam-runtime/issues/344, insofar as Proton runs in a container and the dmix/dsnoop plugins may not cross the container boundary correctly; if you advise testing with a stripped-down .asoundrc, I can do that as well.

However, I maintain that, per the tests above, the interface and audio are already reaching most, if not all, games. And needless to say, audio playback itself works fine in every game despite this configuration.

kisak-valve commented 1 year ago

Hello @aclist, as a basic test, do any of these games behave as you expected with Proton 5.0?

aclist commented 1 year ago

Thanks for your reply.

Using Proton 5.0 will probably shrink the test surface considerably, since some games simply don't run on it. However, perhaps we can get a baseline by finding a game that definitively runs on Proton 5.0 through current and that definitively works with a microphone on other systems/PulseAudio. Day of Infamy seems like a good candidate.

A summary of the same games with 5.0:

As another data point, it seems the initial "control" test was flawed. I tried Audacity as a Non-Steam Game run through Proton again, this time without changing winecfg or explicitly setting the microphone interface at all, and it detected all audio I/O on the system despite being the Windows version. Once the right I/O is selected inside Audacity, you can record and play back audio. It seems Audacity can already probe all ALSA devices exposed by Wine; the main difference is that it lets the user select each I/O inside the app.

So this brings us to two scenarios:

  1. The application probes all devices exposed by Wine and lets you granularly select the I/O you need. Day of Infamy actually seems to fall into this category, as the Source audio settings dialog scans all of the cards and their addresses and shows them in a list for you to select. In theory, this is a strong candidate for getting voice input into the game; I just need to test it again on different Proton versions in-game with someone to confirm that the audio is going through. In my last test on bleeding-edge Proton, though, I was not audible in-game.

  2. The application expects the "default input device" (device 0) to be the microphone and exposes no granular config menu. Most games seem to fall into this category, which is why it's important (as far as I can tell) to explicitly specify the input device in winecfg. Out of the box, this defaults to "system default," which is generally the default audio playback device, not input device. In a multi-card system or system with audio through HDMI, loopback, etc., "device 0" is highly unlikely to be both the playback and input device.

Very few games offer granular control of audio devices in-game; they expect Windows to have handled that already. My microphone interface corresponds to card 3, so it will never be the default device without significant rejigging of the card indexing, which makes being able to change this gracefully on the Wine side clearly preferable.
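
(For context on the card numbering I'm referring to, the indices can be checked on the host with no Proton involved; commands for illustration only:)

arecord -l              # lists capture hardware by card/device index; the Scarlett is card 3 here
cat /proc/asound/cards  # the same numbering as assigned by ALSA at boot/hotplug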

I will report back once I test Day of Infamy live in-game.

smcv commented 1 year ago

My last guess here would be that this is adjacent to https://github.com/ValveSoftware/steam-runtime/issues/344

It could be closely related. Audio capture and audio output are two sides of the same coin.

As with audio output, the most likely thing to work is the setup that is tested most often, because it's the setup that most Linux desktop systems have: a PulseAudio-compatible sound server (PulseAudio itself or pipewire-pulse) on the host system, with the sound server's AF_UNIX socket shared with the container, and programs inside the container connecting to that socket. The Steam Linux Runtime does not intentionally break "plain ALSA", but can't guarantee that it will work either, because whether it will work depends on a lot of implementation details.
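
(Concretely, on a typical desktop that socket lives under XDG_RUNTIME_DIR, and that path is what gets shared into the container; a quick host-side sanity check, assuming the usual default locations:)

ls -l "${XDG_RUNTIME_DIR}/pulse/native"   # AF_UNIX socket served by PulseAudio or pipewire-pulse
pactl info                                # should report the server reachable through that socket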

If the Windows version of Audacity can successfully output and capture audio, then that probably implies that enough is getting through to Proton that this can work, which would make this more of a Proton issue than a container runtime issue: Proton is providing the audio devices to applications, just not in exactly the form that is most convenient for naive audio middleware implementations with no ability to select non-default devices.

However, Proton/Wine is also much more heavily tested with a PulseAudio-compatible sound server than with "plain ALSA", so it is more likely to present the default PulseAudio capture source to games' audio middleware in the most convenient way than it is to be able to do the equivalent thing with plain ALSA.

The ALSA config file or output of alsactl can be attached if necessary

Yes, please do provide the configuration. "Plain ALSA" can mean almost anything: the ALSA configuration language is extensive, and quite possibly Turing-complete.

the dmix/dsnoop plugins

So far, nobody has been able to describe to the Steam Linux Runtime maintainers exactly how these plugins work behind the scenes (what IPC channels they use between the cooperating processes that share the audio input and/or output device), and we haven't been able to find documentation, which means we can't really guess why they don't work reliably across the container boundary. If you are sufficiently knowledgeable about dmix/dsnoop that you have reasons to prefer them over PulseAudio, perhaps you can help by describing how these plugins actually work?

To give you an idea of the level of detail that we'd need, the equivalent information for PulseAudio is:

aclist commented 1 year ago

@smcv

Thanks for your message and willingness to entertain this as a legacy support issue. I know that ALSA documentation can seem byzantine at the best of times.

First, some clerical issues:

Day of Infamy

I was unable to get the microphone audible in-game when using Proton 5.0. I understand that versions 5.0 and prior do not incorporate pressure-vessel, but the absence of pressure-vessel does not seem to be a sufficient condition for getting the mic to work here.

Deep Rock Galactic

I am striking this from the list of candidates after testing it further and reading ProtonDB reports: this game apparently never had microphone support (it seems to use some middleware), so the microphone will always be disabled in the menu.

OK, onto your message:

Proton is providing the audio devices to applications, just not in exactly the form that is most convenient for naive audio middleware implementations with no ability to select non-default devices.

Yes, I agree with that description.

However, Proton/Wine is also much more heavily tested with a PulseAudio-compatible sound server than with "plain ALSA", so it is more likely to present the default PulseAudio capture source to games' audio middleware in the most convenient way than it is to be able to do the equivalent thing with plain ALSA.

Yes, I understand this. To give you some background, I put together my .asoundrc at a time when PulseAudio still felt lacking and PipeWire was still very bleeding edge. Adding loopback support (e.g., for streaming desktop audio alongside a microphone), mixing of streams, a microphone, software mixers, etc., in PulseAudio was extremely difficult without something like JACK in between. The advantages of ALSA were:

Out of curiosity, and apropos of this thread, I tried installing pipewire and pipewire-pulse on the system, since your point about using a predictable IPC socket is entirely reasonable. Predictably, it works well, and the microphone worked OOTB in every game.
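
(For anyone landing on this issue with the same setup, the switch amounted to installing the packages and enabling the user units; unit names follow upstream packaging and may differ per distro:)

systemctl --user enable --now pipewire.socket pipewire-pulse.socket   # a session manager (wireplumber) must also be running
pactl info | grep -i '^server'                                        # should now report PulseAudio (on PipeWire)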

perhaps you can help by describing how these plugins actually work?

dmix (which mixes multiple streams in software before passing them on to the hardware device) and its counterpart dsnoop (which lets multiple processes capture from the same source) use semaphore arrays for IPC.

When an application is using the plugin, you can see active semaphores by invoking ipcs -s:

key        semid      owner      perms      nsems
0x00000801 655363     me         666        1

The key entry is a hexadecimal representation of a unique IPC key, a discrete integer that is defined ahead-of-time by the user in their config file, e.g.:

pcm.dsnooper {
    type dsnoop
    ipc_key 2049
    ipc_perm 0666
    slave {
        pcm "snd_card"
        period_time 0
        period_size 4096
        channels 2
    }
    bindings {
        0 0
        1 1
    }
}

The semid is the semaphore array corresponding to that key.

In the config snippet above, the dsnooper device (an arbitrary name) using the dsnoop plugin is assigned the ipc_key 2049, an arbitrary integer. The only requirement is that different slave devices get unique IPC keys to prevent collisions; the same key may be used if multiple dmix devices access the same hardware. This is followed by the ipc_perm key, which sets an access mask similar to a file mask; 0666 is used so that other users/processes can interact with this device.

Usually, a process opening a literal ALSA hardware device binds exclusively to that device. dmix/dsnoop create an alias that is itself bound exclusively to the hardware device, but which exposes an entry point that multiple processes can use at the same time, in lieu of the hardware. The processes talk to the dmix device and (presumably) the semaphore array is updated with their process IDs, with the final audio mix being sent to the hardware.

ipcs -i <SEMID> -s will return the contents of the semaphore array, the number of elements (nsems), and their PIDs. I assume semaphore arrays are used over single semaphores here because of the need to fold multiple sources into the mix at once.

A more straightforward example of defining an IPC key and a slave device:

pcm.microphone {
    ipc_key 1027
    type dsnoop
    slave.pcm {
        type hw
        card "USB"
        device 0
    }
}
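
As a sanity check outside of Wine/Proton entirely, the alias can be exercised directly (device name from the definition above; format, rate, and duration are illustrative):

arecord -D microphone -f S16_LE -r 48000 -d 5 /tmp/mic-test.wav   # capture 5 seconds through the dsnoop alias
aplay /tmp/mic-test.wav                                           # play it back on the default output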

Because the IPC keys are defined by the user and allocated on an ad hoc basis, they can be any unique integer and are not bound to a fixed, well-known address per se. Outside of parsing the config file or iterating through system semaphores, I don't know of a way to ask ALSA which IPC keys are reserved, since the allocations occur at runtime. Provided the .asoundrc is well-formed and the permissions are set correctly, these virtual devices should be exposed to other processes.

Not sure if this answers your question.

smcv commented 1 year ago

Thanks, that's useful. ipcs(1) is about System V IPC, which is governed by ipc_namespaces(7). pressure-vessel doesn't unshare the IPC namespace (because it doesn't intend to be a security boundary), so for that part at least, it shouldn't interfere.

The per-game container doesn't necessarily have the same version of libasound.so.2 as the host system (that's most of the point of pressure-vessel, it's meant to separate the game's library stack from the host system's), so if the IPC protocol involving those semaphores isn't compatible across all versions of libasound.so.2, then it won't work reliably.

The processes talk to the dmix device and (presumably) the semaphore array is updated with their process IDs, with the final audio mix being sent to the hardware

Do you know how they do this, or what the "dmix device" is, behind the scenes? Is it a shared memory region, or an AF_UNIX socket, or a dynamically-created device node in /dev, or something else?

If only one process at a time is allowed to have the actual device node in /dev/snd/ open, then logically dmix must work by having one process (perhaps whichever one won the race to be the first to output audio, or perhaps something more complicated) open the device node for writing, and all other cooperating processes sending their audio data (somehow) to that one process, so that it can do the mixing and the actual writes to the hardware. Similarly, dsnoop must work by having one process open the device node for reading, and all other cooperating processes ask that process to send them a copy of the audio data from the microphone.

In PulseAudio or Pipewire, the design is asymmetric: the one process that holds the device node open is special (it's the sound server), and all user-facing applications (like games) send playback streams to the sound server or receive recording streams from the sound server. In dmix and dsnoop, the design seems to be symmetrical (none of the processes involved is special), which presumably must mean that the cooperating processes that are using dmix/dsnoop have to elect one of those processes to be responsible for taking on the sound-server-like role, in addition to doing whatever it was already doing?

smcv commented 1 year ago

Out of curiosity and apropos of this thread, I tried installing pipewire and pipewire-pulse onto the system, since your point about using a predictable IPC socket is totally reasonable. Predictably, it works well and microphone was working OOTB in every game.

OK, so that's a straightforward workaround for this for most people: if you have a PulseAudio-compatible sound server as recommended (either PulseAudio or pipewire-pulse), then Proton's audio works well. And the rest of this issue is only relevant to people who aren't running a PulseAudio-compatible sound server, for whatever reason.

Day of Infamy: I was unable to get microphone audible in-game when using Proton 5.0 [and ALSA dsnoop]

In that case there might be a Proton issue with its handling of ALSA dsnoop, but it's unlikely to be a Steam Linux Runtime issue, because Proton 5.0 didn't use SLR. So Proton developers might still be interested in this part of the issue, but from my point of view as a Steam Linux Runtime maintainer, I can ignore this game.

aclist commented 1 year ago

Do you know how they do this, or what the "dmix device" is, behind the scenes? Is it a shared memory region, or an AF_UNIX socket, or a dynamically-created device node in /dev, or something else?

If only one process at a time is allowed to have the actual device node in /dev/snd/ open, then logically dmix must work by having one process (perhaps whichever one won the race to be the first to output audio, or perhaps something more complicated) open the device node for writing, and all other cooperating processes sending their audio data (somehow) to that one process, so that it can do the mixing and the actual writes to the hardware. Similarly, dsnoop must work by having one process open the device node for reading, and all other cooperating processes ask that process to send them a copy of the audio data from the microphone.

dmix uses a combination of semaphores and shared memory. I don't think a race condition is an issue here, because dmix doesn't reach the sound driver or /dev/snd or /dev/dsp proper: ALSA implements its own client-server model using something called aserver for the former (/dev/snd) and aoss for the latter (emulating the OSS driver). I think I may have confused this point by talking about "hardware", but the "hardware devices" dmix communicates with are the PCM device abstractions enumerated by ALSA. When an ALSA-compatible application is launched with dmix, it uses the IPC keys to perform a lookup and attach the shared memory segment, if one exists; if not, it creates one. The audio mix goes to the "hardware" (PCM slave) set in the config file, and ALSA is then responsible for binding to /dev.
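
(While a dmix/dsnoop client is running, both halves show up in the System V listings; purely as an illustration of what to look for:)

ipcs -s   # the semaphore set keyed off ipc_key, used to serialize access to the shared state
ipcs -m   # the shared memory segment(s) holding the shared buffer/state that clients attach to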

That's about the extent of my knowledge of it, although the links below are good resources, albeit disorganized, and the alsa-devel mailing list is pretty active, so I'm sure someone there could give a more authoritative explanation.

https://www.alsa-project.org/alsa-doc/alsa-lib/index.html
https://alsa.opensrc.org/DmixPlugin

The per-game container doesn't necessarily have the same version of libasound.so.2 as the host system (that's most of the point of pressure-vessel, it's meant to separate the game's library stack from the host system's), so if the IPC protocol involving those semaphores isn't compatible across all versions of libasound.so.2, then it won't work reliably.

This has me wondering whether the end user could drop in a copy of their system shared libraries as a brute-force override. This is obviously not a sane solution from the standpoint of Proton development, but it could be an "at your own risk" workaround for a user sufficiently comfortable with making that change.

smcv commented 1 year ago

This has me wondering if the end-user could add a copy of their system shared libraries as a brute force override

tl;dr: no, not really.

This is a much larger "why can't you just" than you would think. We already do this for graphics drivers and their (relatively minimal) dependencies, and getting those working mostly reliably for most people has taken 5 years. For each library that we give this treatment, we have to be able to determine at runtime, with basically 100% reliability, whether the version on the host system or the version in the container is newer: because if we get it wrong, games are going to crash with cryptic errors about missing symbols.

libasound.so.2 isn't a well-behaved library with a minor API/ABI version in its name (like, say, SDL or GTK): both the host and the container version say they are version 2.0.0, and probably always will forever. This means we would have to guess which one is newer by counting symbols.
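
(i.e. something along the lines of comparing the exported symbol counts of the two copies; the container path here is purely a placeholder:)

nm -D --defined-only /usr/lib/libasound.so.2 | wc -l                 # host copy
nm -D --defined-only /path/to/container/lib/libasound.so.2 | wc -l   # container copy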

libasound.so.2 also has a relatively complicated plugin architecture with an elaborate (and possibly Turing-complete) configuration format, so we would need to do the same for all of the plugins on the host system, and their dependencies, recursively. Some of those dependencies are exactly the sort of large application-level libraries where the whole point of the container runtime is to insulate games from the host system's exact choice of versions: for example libasound_module_pcm_a52.so pulls in all of FFmpeg, and libasound_module_pcm_jack.so pulls in libdb-5.3.so. Some of these libraries can have very strange action-at-a-distance that you certainly wouldn't associate with audio.

When the necessary infrastructure to get audio streams across the container boundary in a convenient way already exists, and is already in use by most Steam users (it's PulseAudio or Pipewire), the more robust choice is obvious.

smcv commented 1 year ago

ALSA implements its own server-client model using something called aserver

This illustrates another large part of the problem with issues talking about "ALSA": people mean many different things when they say that.

To kernel developers, /dev/snd is the ALSA kernel interface. PulseAudio, Pipewire, and basically any other practical audio system on Linux rely on that.

To game and application developers, the user-space client library libasound.so.2 is ALSA; but because of its plugin architecture, what it ends up doing behind the scenes can be almost anything. Most commonly, that'll be a plugin that sets up audio streams to/from PulseAudio or Pipewire (which are container-friendly, because they have a protocol with cross-compatible feature negotiation); less commonly, it's a plugin that talks to JACK (which can't be container-friendly in the same way, because its IPC protocol is specifically not cross-compatible between versions), or directly to the hardware, or dmix/dsnoop, or whatever.
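
(One way to see which of these a given system will actually hand to a libasound client is to list the PCMs the client library exposes; on a typical desktop the default entry resolves to the PulseAudio/Pipewire plugin:)

aplay -L     # playback PCMs as seen through libasound.so.2 and its configuration
arecord -L   # capture PCMs, including any dmix/dsnoop aliases defined in .asoundrc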

The PulseAudio and Pipewire sound servers also use libasound.so.2, but they turn off 95% of it (basically the entire plugin architecture), and just use it to talk to /dev/snd. (Obviously if they allowed it to use its plugin architecture in the usual way, they'd just connect to the sound server, which isn't going to work when they are the sound server.)

And then when an end user says "ALSA", often what they mean is a more complicated configuration in .asoundrc, which again could do almost anything, but in practice often dmix or dsnoop.

For the container runtime (which needs to deal with precise technical details), all of these are different, which makes "ALSA" a tricky topic to deal with even if the technical details were easy, because step 1 is to understand which "ALSA" is the one this issue is talking about.

aclist commented 1 year ago

Thanks for the explanation. At this point, I'm not hell-bent on using ALSA, since pipewire proved to be a fully functional drop-in replacement. I had actually been troubleshooting this issue for two years, so I'm happy that everything just works now that I've embraced modernity. It's reasonable to assume that anyone playing games or using any recent flavor of Wine/Proton has a relatively modern system (relative to ALSA's vintage), and such a system will almost certainly have at least PulseAudio.

I realized that I never posted the config file in its entirety, so I'm attaching it here for posterity. Obviously, it is very fancy. If dsnoop is the culprit, it would be at least academically interesting to see whether a barebones config (single playback card, single capture device, no mixing) is the minimum viable configuration for microphone passthrough, since Proton seems to support basic ALSA functionality in all other respects; a sketch of what I mean follows below. I have seen anecdotal reports that a very basic asound.conf does work with a microphone. I'll test this on another system when I have time.

asoundrc.txt
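
For reference, the kind of barebones config I have in mind would be roughly the following; an untested sketch with illustrative card names/indices:

# Minimal ~/.asoundrc: one playback card, one capture device, no dmix/dsnoop
pcm.!default {
    type asym
    playback.pcm "plughw:0,0"
    capture.pcm "plughw:USB,0"
}

ctl.!default {
    type hw
    card 0
}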

I'm not sure if a system where you can only play back or record a single audio stream at a time is a realistic use case (most people probably expect to be able to play a video and also receive sound notifications in the background), but maybe it would help somebody in isolated scenarios (a kiosk device set up to only play games?), or set a baseline for troubleshooting future tickets.