Podcastindex-org / podcast-namespace

A wholistic rss namespace for podcasting
Creative Commons Zero v1.0 Universal

Attribution for podcast directories and webapps - audio query string proposal #86

Closed jamescridland closed 3 years ago

jamescridland commented 3 years ago

This isn't, per se, anything for an RSS namespace: but I'd be keen to at least float this idea and see what happens. If there's a better place for this - I feel there must be - then I'd be really grateful for a pointer. It is, however, something that needs agreement and standardising somewhere; and has a particular benefit for The Podcast Index.

The problem

If I listen to a podcast on Podfriend on my Windows laptop, then this is the useragent that will present itself to my podcast host:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36

Here's the useragent from a listen to a podcast within a Google Search result on that same laptop:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36

...here I am listening to a podcast using Podchaser's inbuilt player:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36

...here I am listening to a podcast on a clever embedded Twitter card:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36

...and here is a podcast on the Apple Podcasts website:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36

The observant among you will have spotted that these are all exactly the same. A podcast host is unable to differentiate between any of these plays, and a podcast host is left without any idea where their marketing has worked and which podcast service is driving the traffic to them.

The result? Podtrac reports that the second-most popular podcast player is... Google Chrome. Which it is. But it isn't.

Another problem

If you hit 'play' in Pocket Casts on an iPhone but you haven't downloaded the audio first, the app will try to 'progressively download' the audio - or 'stream' it in commonly-used language. It does this using the AppleCoreMedia library, and to a podcast host, it looks like this:

AppleCoreMedia/1.0.0.18A393 (iPhone; U; CPU OS 14_0_1 like Mac OS X; en_us)

Hit 'play' in Apple Podcasts on an iPhone without downloading the file first? It looks like

AppleCoreMedia/1.0.0.18A393 (iPhone; U; CPU OS 14_0_1 like Mac OS X; en_us)

Hit 'play' in Airr, a new iPhone podcast app? It looks like

AppleCoreMedia/1.0.0.18A393 (iPhone; U; CPU OS 14_0_1 like Mac OS X; en_us)

And hit 'play' in Castro, and it looks like

AppleCoreMedia/1.0.0.18A393 (iPhone; U; CPU OS 14_0_1 like Mac OS X; en_us)

That's because this is the useragent of the library, and it can't be changed. Similar things happen on the Apple Watch.

The result? Podcast hosts either attribute all of this to Apple Podcasts (because it says the word 'Apple' in it), or they don't report it at all, or they break it out to 'AppleCoreMedia' which doesn't help anyone.

Solutions

One solution is to use the referer HTTP header. But that doesn't always help with web-based plays (referers are often set to be blank for privacy; or report an email client or a social media service); and it doesn't help at all with the AppleCoreMedia problem.

Another solution is to give every podcast directory a different RSS feed which is bespoke to that platform. Some podcast hosts do this: but it isn't great to have lots of URLs floating around which have the same content with a different address (and is against best practice). It's also hard if you supply https://example.com/rss/for-apple into Apple Podcasts, and then Overcast goes and grabs that for its own use.

A better solution is to use the RSS useragents repo - to spot the RSS calls being made, and then to add a query string to the audio. So, if your RSS feed is called by a useragent you recognise like Overcast/1.0, you can then set the audio to be https://example.com/audio/mygreatpodcast.mp3?parsedby=overcast and you can then be reasonably sure that any use of this audio is from Overcast - even in the web player it produces.
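A minimal sketch of that host-side tagging, assuming a tiny hand-written prefix table in place of the real RSS useragents list. The table entries, the `tag_enclosure` name, and the slugs are illustrative assumptions, not taken from the repo:

```python
from urllib.parse import urlsplit, urlunsplit, urlencode

# Hypothetical prefix table; a real host would load the RSS useragents list.
KNOWN_FETCHERS = {
    "Overcast/": "overcast",
    "Podcastindex.org/": "podcastindex",
}

def tag_enclosure(audio_url: str, feed_user_agent: str) -> str:
    """Add parsedby=<slug> to the audio URL when the feed fetcher is recognised."""
    for prefix, slug in KNOWN_FETCHERS.items():
        if feed_user_agent.startswith(prefix):
            parts = urlsplit(audio_url)
            extra = urlencode({"parsedby": slug})
            # Keep any existing query string untouched and append to it.
            query = f"{parts.query}&{extra}" if parts.query else extra
            return urlunsplit(parts._replace(query=query))
    # Unknown fetcher: leave the enclosure URL as it is.
    return audio_url

print(tag_enclosure("https://example.com/audio/mygreatpodcast.mp3", "Overcast/1.0"))
```

The host would call this once per enclosure while rendering the feed for the requesting useragent.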

Here is an example of the different useragents being used on a day's worth of data. The RSS useragent approach works really well.

BUT Podcast Index throws a spanner in the works here. The Podcast Index RSS useragent starts with Podcastindex.org/, which means that every piece of audio fetched through Podcast Index's API is tagged with parsedby=podcastindex. If twenty apps use Podcast Index's APIs, then those apps are invisible to podcasters, and we don't know which is performing best.

Proposal

Podfriend is a web app. It's unable to modify the browser useragent, or to add any bespoke headers.

If it wishes to signal to a podcaster that "this play is coming from Podfriend", it could send a unique indication in its request for the audio.

My proposal is that this specific query string name is uniformly called _from, and the value should be an unambiguous name for the service or app - normally a domain or pseudo-domain (which we may, optionally, wish to list in the OPAWG rss useragents list). The use of the underscore in _from is unlikely to clash with anything.

So, as an example: https://example.com/audio/mygreatpodcast.mp3?_from=podfriend.com

Where a query string is already present, the parameter should be appended with an &, like: https://example.com/audio/mygreatpodcast.mp3?version=45&_from=podfriend.com
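On the app side, this could look something like the sketch below. The `add_from` helper name is an assumption for illustration; only the `_from` parameter and the example URLs come from the proposal itself:

```python
from urllib.parse import urlsplit, urlunsplit, urlencode

def add_from(audio_url: str, app: str) -> str:
    """Append _from=<app> to an audio URL, using ? or & as appropriate."""
    parts = urlsplit(audio_url)
    extra = urlencode({"_from": app})
    # Any existing query string is kept as-is; the new parameter goes last.
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))

print(add_from("https://example.com/audio/mygreatpodcast.mp3", "podfriend.com"))
print(add_from("https://example.com/audio/mygreatpodcast.mp3?version=45", "podfriend.com"))
```

Appending rather than rebuilding the whole query string leaves any existing parameters exactly as the host wrote them.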

Adding this audio query string in a uniform fashion would enable any host or analytics service to spot this query string for analytics; and help developers get the credit they deserve. It would work whether the podcast is downloaded or streamed, and be unlikely to be stripped or removed. (Only iVoox, in my tests, seems to remove all query strings from audio URLs).
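For a host or analytics service, spotting the parameter might be as simple as the following sketch. The `attribute_play` name, the fallback to the raw useragent, and the choice to take the last `_from` value when several are present are all assumptions, not part of the proposal:

```python
from urllib.parse import urlsplit, parse_qs

def attribute_play(request_url: str, user_agent: str) -> str:
    """Return the _from value if the request carries one, else the useragent."""
    params = parse_qs(urlsplit(request_url).query)
    values = params.get("_from")
    if values:
        # Assumption: if _from appears more than once, the last one wins.
        return values[-1]
    # No usable _from: fall back to whatever the useragent says.
    return user_agent

chrome = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"
print(attribute_play("https://example.com/audio/mygreatpodcast.mp3?_from=podfriend.com", chrome))
```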

Most importantly, it would be the end of "the second most popular podcast platform is Google Chrome" - which is nonsense.

Thoughts?

MartinMouritzen commented 3 years ago

I support "_from=" 100%. It's a simple and elegant solution that's easy to implement on both sides.

tomrossi7 commented 3 years ago

I like the idea of tackling problems like this, but I'm not sure where they are best addressed. This is definitely outside the scope of what Podcastindex is currently undertaking.

jamescridland commented 3 years ago

@tomrossi7 I entirely agree, it's outside scope (though affects Podcast Index more so than others).

Any clue as to where I might put this? If only I knew someone who wrote a daily newsletter or something.

MartinMouritzen commented 3 years ago

@jamescridland I do think it could be something handled by OPAWG. At least right now that's my best bet.

Then we need to make sure to link a lot to the Slack from podcastindex.social to get more eyes on it?

daveajones commented 3 years ago

@jamescridland I posted about this on Mastodon with an idea that we could maybe help with.

benjaminbellamy commented 3 years ago

I vote for ?_from= 👍

benjaminbellamy commented 3 years ago

But what will happen if the audio file URL already has a ?_from= in the RSS feed? (Castopod already adds it based on opawg/podcast-rss-useragents)

MartinMouritzen commented 3 years ago

I think I would personally go for something like ?_appName=, which I feel is less ambiguous. If I encounter "?_from" I'm thinking it could be all sorts of things, but "appName" I fully understand instantly. But if everyone thinks ?_from is better, then I am still perfectly fine with that.

Also, the reason why I don't think "ua" is all that good is that we do not want to recreate the UserAgent string - because that one will be sent with the request anyway, with platform information etc. We simply want to pass on a bit of extra information.

benjaminbellamy commented 3 years ago

I named it ?s= in Castopod (s as in service) but ?_from= or ?_appName=, any name will work for me, as long as it doesn't take too long to pick one. 😉

jamescridland commented 3 years ago

@benjaminbellamy Good question. I think the point of trying to standardise this is to highlight that apps and players can overwrite any existing _from value.

For Podfriend, they will get an RSS feed from me with ?_from=podcastindex, because the feed has been parsed by Podcast Index. I would like to see _from=podfriend though, since Podfriend is handling the final play.

One thought is that we could ask for these to be added - _from=podfriend,podcastindex but I'm not sure how useful that is in reality.
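Both options, overwriting and the comma-separated chain, can be sketched in a few lines. The `set_from` helper and its `chain` flag are hypothetical names used only for illustration:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def set_from(audio_url: str, app: str, chain: bool = False) -> str:
    """Overwrite any existing _from; with chain=True, keep the old values after ours."""
    parts = urlsplit(audio_url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    previous = [v for k, v in pairs if k == "_from"]
    kept = [(k, v) for k, v in pairs if k != "_from"]
    value = ",".join([app] + previous) if chain else app
    kept.append(("_from", value))
    # safe="," keeps the separator readable instead of percent-encoding it.
    return urlunsplit(parts._replace(query=urlencode(kept, safe=",")))

url = "https://example.com/audio/mygreatpodcast.mp3?_from=podcastindex"
print(set_from(url, "podfriend"))
print(set_from(url, "podfriend", chain=True))
```

Note that this version re-encodes the other parameters while rebuilding the query string, which may matter for URLs that are signed or otherwise sensitive to exact encoding.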

Given the removal of slugs elsewhere in Podcast Index, we should be using domains or pseudo domains here, I suppose.

jamescridland commented 3 years ago

(This will be in Podnews on Monday. It would be good to drive a bit of traffic here, and let's see what the reaction is).

daveajones commented 3 years ago

Good call @jamescridland.

jamescridland commented 3 years ago

I've moved this to https://github.com/opawg/podcast-rss-useragents/issues/12

Please come and continue the discussion over there! :)