Proposal: speed advisories

theDanielJLewis commented 3 years ago

I'm a "podspeeder." I listen to most podcasts at 2.5× with the silence-truncating feature enabled. I adjust this on a per-podcast basis when necessary. For example, I listen to comedy without silence-truncation and at slower speeds.

But I run into a problem where a podcast I can normally consume just fine at my preferred speeds has sections that I can't understand unless they're slowed down to normal. Such spots are usually from accents, poor audio quality, music I need to hear (not bumpers), or other complicated audio (like crosstalk or not-background-enough music).

Thus, I wonder if it might help to allow podcasters to add a speed advisory to their podcasts and chapters. Then, the playback apps would have an optional toggle to honor these speed advisories. When such an app reaches that chapter (or other indicator), it temporarily slows down the playback.

I think relative terms would be best, like "slower." We could advise that such sections be played at 50% of the difference between 1.0× and the current playback speed. Thus, a "slower" on my 2.5× would result in a temporary drop to 1.75×.

While there could be a case for "slowest" to drop down to 1.0× or maybe 25% of the difference, I don't think there would be much (if any) need for a "faster" option, even though I've heard some unbelievably slow-talkers in some podcast voicemails.
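A minimal sketch of that arithmetic, assuming "slower" keeps 50% of the gap above 1.0× and "slowest" drops all the way to 1.0× (the 25% variant would just be a different fraction). The function name and the advisory table are hypothetical, not part of any spec:

```python
def advised_speed(current: float, advisory: str) -> float:
    """Return the temporary playback speed for a relative advisory."""
    # Fraction of the gap between 1.0x and the current speed that is kept.
    # "slower" keeps 50% as proposed; "slowest" is assumed to mean 1.0x.
    kept_fraction = {"slower": 0.5, "slowest": 0.0}
    if advisory not in kept_fraction or current <= 1.0:
        # Unknown advisory, or already at or below normal speed: no change.
        return current
    return 1.0 + (current - 1.0) * kept_fraction[advisory]


print(advised_speed(2.5, "slower"))   # 1.75
print(advised_speed(2.5, "slowest"))  # 1.0
```

Because the result is relative, a listener at 1.5× would drop to 1.25× for the same "slower" advisory.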

I know Adam is adamantly against playback speed changes, but it's here to stay and a lot of listeners find it easier and often more enjoyable to consume content faster. So I think it might be worth considering making this experience a little better for listeners.

I think this would be most logical to include inside chapters, since we can already have hidden chapters.

One important challenge is potential abuse by podcasters trying to force people to listen to the whole show at slower speeds, or to slow down ads. But this is why I think it should be an advisory, and apps should let listeners honor or ignore such advisories either app-wide or per podcast.

I think "speedAdvisory": "slower" would be the ideal JSON option.
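To illustrate how that could look inside a chapters file, here is a rough sketch. The `startTime`, `title`, and `toc` fields follow the existing JSON chapters format as I understand it (`"toc": false` marks a hidden chapter); `speedAdvisory` is the hypothetical new key from this proposal, and `advisory_at` is an illustrative helper, not an existing API:

```python
import json
from typing import Optional

# Example chapters document; "speedAdvisory" is the proposed, not-yet-existing key.
CHAPTERS_JSON = """
{
  "version": "1.2.0",
  "chapters": [
    {"startTime": 0, "title": "Intro"},
    {"startTime": 95, "title": "Listener voicemail", "toc": false, "speedAdvisory": "slower"},
    {"startTime": 140, "title": "Main topic"}
  ]
}
"""


def advisory_at(chapters: list, position: float) -> Optional[str]:
    """Return the speedAdvisory of the chapter covering `position` (seconds), if any."""
    current = None
    for chapter in sorted(chapters, key=lambda c: c["startTime"]):
        if chapter["startTime"] <= position:
            current = chapter  # latest chapter that has already started
        else:
            break
    return current.get("speedAdvisory") if current else None


chapters = json.loads(CHAPTERS_JSON)["chapters"]
print(advisory_at(chapters, 100))  # slower
print(advisory_at(chapters, 150))  # None
```

An app that ignores advisories would simply never call the lookup, so existing players would be unaffected by the extra key.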

daveajones commented 3 years ago

I like this a lot. A chapter label for “ad” is probably appropriate as well. Even if it never gets used it seems good to have for a complete spec.

Most apps will have pre-set speed options. I wonder if “slower” could just mean “temporarily drop down one notch unless it’s already at 1.0”. I can see this being useful for intro and outro music where the music sounds screwy at fast speed.
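A small sketch of the "one notch" reading, assuming a hypothetical list of app presets. It picks the next preset below the current speed and never goes under 1.0×:

```python
# Hypothetical preset speeds; real apps will have their own list.
PRESET_SPEEDS = [1.0, 1.25, 1.5, 2.0, 2.5, 3.0]


def one_notch_slower(current: float, presets=PRESET_SPEEDS) -> float:
    """Drop to the next preset below `current`, but never below 1.0x."""
    candidates = [s for s in sorted(presets) if 1.0 <= s < current]
    # No lower preset at or above 1.0x: stay where we are.
    return candidates[-1] if candidates else current


print(one_notch_slower(2.5))  # 2.0
print(one_notch_slower(1.0))  # 1.0
```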

bjoreman commented 3 years ago

I, too, really like the sound of this. I like it being relative too, but perhaps there's a case for explicitly recommending default speed too? I.e. for a music segment it feels like I want default speed or my regular speed, but something in between won't give me much of anything.

Then again, perhaps it's better to just leave it up to the app to define what exactly slower means.

AbelLykens commented 3 years ago

Main reason I would like this is playing music-parts at original 1.0x speed. Not even sure if that needs an advisory [?], my podcast player would just need to know the chapter/part is type "music". In my podcast player I would say: play at 1.0x speed for chapters tagged "music". Not yet sure how to handle accents. Maybe parts would be tagged "clip" and I would set 1.5x speed for all clips?

theDanielJLewis commented 3 years ago

I like that idea, @AbelLykens! Instead of a speed advisory, it could be a chapter type and apps could have preset speed changes for chapter types, or maybe some kind of "speed for" tag.

For example, my player could default to 2.5× for everything, but play all "music" chapters at 1.0×, all "heavy accent" chapters at 1.75×, and such.
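On the app side this could be as simple as a lookup table. A sketch, where the type names and the idea of a per-chapter type value are assumptions taken from this thread rather than anything in the current spec:

```python
from typing import Optional

DEFAULT_SPEED = 2.5

# Listener-defined speeds per chapter type (hypothetical type names).
SPEED_FOR_TYPE = {
    "music": 1.0,
    "heavy accent": 1.75,
}


def speed_for_chapter(chapter_type: Optional[str]) -> float:
    """Use the listener's speed for this chapter type, else their default."""
    return SPEED_FOR_TYPE.get(chapter_type, DEFAULT_SPEED)


print(speed_for_chapter("music"))      # 1.0
print(speed_for_chapter("interview"))  # 2.5
```

The podcaster only labels the content here; every speed value comes from the listener's own settings.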

We need to be careful with anything that could allow an app to automatically skip or speed through ads. Podcasters would hate that and I think it would draw too much negative controversy and thus inspire developers to not even bother supporting a tag or feature that could be used to hurt podcasters.

keunes commented 3 years ago

From a client end-user perspective I would not like the creator to determine the speeds I listen to my podcasts at. What is 'complicated audio' or 'heavy accent' is highly subjective.

For me the only valid use-case is the identification of music-fragments (including intro/outro). I would therefore (as proposed above) go for a content type identification (e.g. 'music', 'intro', 'outro', 'clip', etc) rather than any indication of suggested speeds (e.g. 'slower') or subjective classification (e.g. 'heavy accent').

theDanielJLewis commented 3 years ago

"From a client end-user perspective I would not like the creator to determine the speeds I listen to my podcasts at. What is 'complicated audio' or 'heavy accent' is highly subjective."

The point is for this to be an advisory and the consumer could tell their app to ignore speed advisories.

keunes commented 3 years ago

"The point is for this to be an advisory and the consumer could tell their app to ignore speed advisories."

Agreed. The question is whether the podcaster advises on

which speeds the user should follow (tags indicating the speeds), or
which segments benefit from a different speed (tags indicating the segment types), or
a combination of both.

I think having at least the segment types will be easier for the client/end-user to handle. Because it is so subjective, I might want to follow the advisory regarding the music segment, but ignore the advisory on the heavy accent segment. If the tag only provides the speeds (without the segment types) that'll be impossible to achieve.

TL;DR I like @AbelLykens's proposal as it gives users more fine-grained control.

A combination of both would also be interesting.
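A sketch of what the combination might look like in a client, assuming each segment carries both a type and a relative advisory, and the listener opts in per type. All names are hypothetical, and the "slower" rule reuses the 50%-of-the-difference idea from earlier in the thread:

```python
from typing import Optional

# Segment types whose advisories this listener has chosen to follow.
HONORED_TYPES = {"music"}


def effective_speed(current: float, segment_type: Optional[str],
                    advisory: Optional[str], honored=HONORED_TYPES) -> float:
    """Apply an advisory only when the listener honors that segment type."""
    if advisory is None or segment_type not in honored:
        return current  # advisory absent or ignored for this type
    if advisory == "slower" and current > 1.0:
        return 1.0 + (current - 1.0) / 2  # halfway back toward 1.0x
    return current


print(effective_speed(2.5, "music", "slower"))         # 1.75
print(effective_speed(2.5, "heavy accent", "slower"))  # 2.5
```

With speed-only tags, the second call could not be told apart from the first, which is the limitation described above.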

ebcarty commented 1 year ago

Would this be the appropriate issue to support setting the playback speed?

ryan-lp commented 1 year ago

Circling back to this proposal now that transcripts are becoming more widespread, I would suggest that we can actually infer speed changes from transcript timestamps. The rate of speech in a chapter or a segment can easily be measured in words per minute (wpm) by dividing the number of words spoken by the duration of the span in question.

Pros:

Doesn't require a new tag or attribute, so it helps to keep the tag bloating under control
Works with JSON, SRT and VTT transcripts (basically, any transcript format that has start/end timestamps)
Would provide a more objective measure of the actual rate of speech than if we let the podcast author subjectively assign a speed themselves, since what feels slow to one person may feel fast to another person
Apps could translate the wpm into the proposed "slower" / "normal" / "faster" types of categories if they wanted to, but could do one better by automatically adjusting the audio precisely to your desired wpm. E.g. If I can cope with listening at 300wpm, but this segment is 150wpm, it can automatically 2x the speed. If another segment is 200wpm, it can 1.5x the speed. This could also be a point of app differentiation in the market.

Cons:

Depends on having transcripts (although the proposed approach depends on having chapters anyway, and maybe it's good to have another incentive to actually provide transcripts for all the new possibilities they open)
Depends on the app implementing it (although so does the proposed approach)
Still requires manual control if you could normally cope with 300wpm except for a particular science podcast with lots of condensed jargon (although I think manual control should be an option in every alternative)

Note to implementers: wpm is often calculated in practice by counting the number of characters, and dividing that by the average number of characters in a word (in English, that's 5 characters per word on average). This is a bit more robust against sentences that by chance happen to have lots of long words in them, and it makes the calculation more straightforward for some other languages that don't use spaces for word separation.
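A rough sketch of the calculation, assuming transcript segments have already been parsed into `(start_seconds, end_seconds, text)` tuples (parsing JSON, SRT, or VTT is left out). The function names are illustrative, and counting every character including spaces is an assumption:

```python
def words_per_minute(segments, chars_per_word: float = 5.0) -> float:
    """Estimate speech rate from (start_seconds, end_seconds, text) segments."""
    total_chars = sum(len(text) for _, _, text in segments)
    total_seconds = sum(end - start for start, end, _ in segments)
    if total_seconds <= 0:
        return 0.0  # no usable timing information
    # characters -> estimated words, then scale from seconds to minutes
    return total_chars / chars_per_word * 60.0 / total_seconds


def speed_for_target(measured_wpm: float, target_wpm: float) -> float:
    """Playback multiplier that brings measured speech to the listener's target."""
    if measured_wpm <= 0:
        return 1.0  # nothing measured (e.g. music): leave the speed alone
    return target_wpm / measured_wpm


print(speed_for_target(150, 300))  # 2.0
print(speed_for_target(200, 300))  # 1.5
```

An app would probably clamp the result to its supported speed range before applying it.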

jamescridland commented 1 year ago

"wpm is often calculated in practice by counting the number of characters, and dividing that by the average number of characters in a word"

If it helps - for English, three words a second is a pretty good approximation. Most radio ad copywriters will write 90 words for a 30-second ad. Source: I was a radio ad copywriter for over eight years.

I'm really liking this as an idea, btw. It, of course, depends on transcripts.

theDanielJLewis commented 9 months ago

"The rate of speech in a chapter or a segment can easily be measured in words per minute (wpm) by dividing the number of words spoken by the duration of the span in question."

That's an interesting idea, but I'm not sure it really serves the purpose. My goal is not to advise a speed based on how fast someone is talking, but based on the content.

Consider, for example (but I'm not recommending this), a non-regular participant with a thick "accent" (to the ear of the listener) might be very difficult to understand at 2×, but the normal host can be understood fine at 2×. So the transcript of the two speakers might have the same number of words per minute, but the understandability is significantly different.

More practically, though, consider the words per minute of music, which is usually much lower than speaking. Transcript-based speed advisories might then recommend speeding up the music, when I'm suggesting the podcaster should be able to advise slowing down the playback during sections where the music is the focus.

I understand that even the whole concept of podspeeding angers some people. However, people do it anyway (and some people can actually comprehend and retain the content better when they listen faster). So my desire in this proposal is a way to ensure their preferred experience doesn't break when music becomes a temporary focus during a portion of an episode.

Podcastindex-org / podcast-namespace

Proposal: speed advisories #144