alsa-project / alsa-lib

The Advanced Linux Sound Architecture (ALSA) - library
GNU Lesser General Public License v2.1
360 stars 177 forks source link

Strange trigger_htstamp after snd_pcm_drop #387

Closed z-s-e closed 2 days ago

z-s-e commented 7 months ago

I have conceptually the following situation (actual code can be found here):

// sound is currently playing
snd_pcm_status(pcm, pcm_status);
snd_pcm_status_get_htstamp(pcm_status, &ts1);
snd_pcm_drop(pcm);
snd_pcm_status(pcm, pcm_status);
snd_pcm_status_get_trigger_htstamp(status, &ts2);

On my hardware I get a ts2 timestamp that is earlier than ts1. From my understanding that should be impossible, as ts2 should contain the timestamp of when the drop happened, which should obviously be after ts1. Is that a bug or do I misunderstand the meaning of those timestamps?

perexg commented 7 months ago

It looks like a bug. Could you show the plugin chain (using snd_pcm_dump or so) ? Or are you using direct hw:X device?

z-s-e commented 7 months ago

I'm using front:CARD=Generic_1,DEV=0

snd_pcm_dump gives:

Soft volume PCM
Control: PCM Playback Volume
min_dB: -51
max_dB: 0
resolution: 256
Its setup is:
  stream       : PLAYBACK
  access       : RW_INTERLEAVED
  format       : S32_LE
  subformat    : STD
  channels     : 2
  rate         : 48000
  exact rate 
perexg commented 7 months ago

There should be more information (lines after Slave:).

z-s-e commented 7 months ago

ah yeah, sorry.

Soft volume PCM
Control: PCM Playback Volume
min_dB: -51
max_dB: 0
resolution: 256
Its setup is:
  stream       : PLAYBACK
  access       : RW_INTERLEAVED
  format       : S32_LE
  subformat    : STD
  channels     : 2
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 32
  buffer_size  : 96000
  period_size  : 48000
  period_time  : 1000000
  tstamp_mode  : ENABLE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 48000
  period_event : 0
  start_threshold  : 9223372036854775807
  stop_threshold   : 96000
  silence_threshold: 0
  silence_size : 0
  boundary     : 6755399441055744000
Slave: Hardware PCM card 1 'HD-Audio Generic' device 0 subdevice 0
Its setup is:
  stream       : PLAYBACK
  access       : MMAP_INTERLEAVED
  format       : S32_LE
  subformat    : STD
  channels     : 2
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 32
  buffer_size  : 96000
  period_size  : 48000
  period_time  : 1000000
  tstamp_mode  : ENABLE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 48000
  period_event : 0
  start_threshold  : 9223372036854775807
  stop_threshold   : 96000
  silence_threshold: 0
  silence_size : 0
  boundary     : 6755399441055744000
  appl_ptr     : 96000
  hw_ptr       : 47178
perexg commented 7 months ago

So it's basically equivalent to hw:X device so it appears like a kernel issue, but the kernel changes the trigger tstamp only if the state changes. It may be possible that the stream was already stopped for a reason (underrun or so). Could you check the stream state for the first snd_pcm_status call?

z-s-e commented 7 months ago

The code is called directly from a switch statement that checks for the state here, so this code gets only called when the last update is in SND_PCM_STATE_RUNNING but I also just re-checked it by verifying the value of snd_pcm_status_get_state right before the drop.

perexg commented 7 months ago

Is trigger timestamp identical before and after drop?

z-s-e commented 7 months ago

Good call, indeed they are the same (I assume you meant after the drop+update_status).

z-s-e commented 7 months ago

Well anyway, thanks for clarifying this is a kernel bug.

I can create a bug over at bugzilla.kernel.org if you want, but otherwise I guess I close this then as it is not a alsa-lib bug.

perexg commented 6 months ago

I'd like to keep this open until resolved.

z-s-e commented 6 months ago

Some more info: This happens on my integrated audio (ALC287 Analog) as well as when outputting to hdmi. I've also tested a usb interface, there it does not happen.

z-s-e commented 2 months ago

@perexg I do not have a kernel dev setup yet to test/verify, but looking at the code I might have an idea of what the problem is: It seems the hda_controller sets the trigger_tstamp_latched boolean at stream start, but it never gets reset. From my understanding the following may fix this:

diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index 4057f9f10aee..7ddc4dc9829d 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -1196,6 +1196,7 @@ static void snd_pcm_trigger_tstamp(struct snd_pcm_substream *substream)
        if (runtime->trigger_master == substream) {
                if (!runtime->trigger_tstamp_latched)
                        snd_pcm_gettime(runtime, &runtime->trigger_tstamp);
+               runtime->trigger_tstamp_latched = false;
        } else {
                snd_pcm_trigger_tstamp(runtime->trigger_master);
                runtime->trigger_tstamp = runtime->trigger_master->runtime->trigger_tstamp;

Edit: Or alternatively, if the intention of that flag is to never get reset and make the driver be always responsible for the timestamps, this:

diff --git a/sound/pci/hda/hda_controller.c b/sound/pci/hda/hda_controller.c
index 5d86e5a9c814..2e7e3f73af3d 100644
--- a/sound/pci/hda/hda_controller.c
+++ b/sound/pci/hda/hda_controller.c
@@ -277,6 +277,8 @@ static int azx_pcm_trigger(struct snd_pcm_substream *substream, int cmd)
        snd_hdac_stream_sync_trigger(hstr, false, sbits, sync_reg);
        if (start)
                snd_hdac_stream_timecounter_init(hstr, sbits);
+       else
+               snd_pcm_gettime(hstr->substream->runtime, &hstr->substream->runtime->trigger_tstamp);
        spin_unlock(&bus->reg_lock);
        return 0;
 }
z-s-e commented 1 month ago

I can now confirm that my idea was correct. Attached is a reproducer, it can be verified under qemu with an ich9-intel-hda device.

Actually at first my fix didn't seem to work, but it turned out that the default alsa device contained dmix here, which does not call the SNDRV_PCM_IOCTL_DROP of the underlying device when dropping, so with dmix the timestamps are still broken even with my patch, just as a heads-up.

I had one more idea for an alternate fix; just remove this trigger_tstamp_latched stuff. The hda_controller is the only driver using that, and while I understand the timestamp under the spinlock there can potentially be more accurate, I am not so sure it is worth it. In qemu I measured around 300 nanoseconds difference between the timestamp taken in the driver vs the timestamp in snd_pcm_trigger_tstamp, so unless one gets unlucky and there is some preemption in-between it should be negligible (EDIT 2: after reading the code some more, there cannot be any preemption as the stream lock is held, so that bus->reg_lock spinlock seems entirely unrelated to the timestamp).

Anyway, I'd like to get started contributing to the kernel, so this seems like a good opportunity. Could you tell me which fix you prefer, so I can prepare a patch for the ML?

z-s-e commented 1 month ago

I'll just go ahead with proposing the removal. That appears to me to be the most robust solution, and personally I am always in favor of removing unneeded complexity.

perexg commented 1 month ago

Latched timestamps were added in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2b79d7a6bf341ebeffb85fc2b3577d0cfa810c49 . Just fix the trigger callback in hda_controller.c.

z-s-e commented 1 month ago

Yes, I've seen that commit already. Just adding that to the hda_controller might not fix it for other uses of snd_hdac_stream_timecounter_init (which is the place that raises that flag, and grepping shows it is also called by sound/soc/intel/skylake/skl-pcm.c), so that seems like a more risky fix that potentially needs to be kept in mind for all hda drivers, that is why I rather would remove it, especially since it does not seem to provide any real value.

perexg commented 1 month ago

Please, sent a proposed commit with a good description to linux-sound@vger.linux.org mailing list with Cc to all relevant maintainers of affected drivers (including sound/soc/intel tree). Here is not a right place to discuss this removal.

z-s-e commented 1 month ago

Yep, thanks.

perexg commented 1 month ago

BTW: Another way may be to add a new start argument to snd_hdac_stream_timecounter_init and do just timestamp when this argument is false. This will catch all hda_controller code uses.

z-s-e commented 1 month ago

Right. Personally I still prefer the removal, but if the consensus on the ML will be to not remove this after all, I'll change it to the agreed solution then.

perexg commented 2 days ago

Resolved in https://github.com/tiwai/sound/commit/df5215618fbe425875336d3a2d31bd599ae8c401 .