Podcastindex-org / podcast-namespace

A wholistic rss namespace for podcasting
Creative Commons Zero v1.0 Universal

Proposal: <podcast:frequency> tag to suggest feed refresh frequency #154

Open keunes opened 3 years ago

keunes commented 3 years ago

In AntennaPod all active podcasts (feeds) are updated when feeds are refreshed. However, some podcasts only release on a (bi)weekly schedule. Making feed update frequency specific to each feed makes feed updates in the app faster and reduces the server load. Such was also requested by an AntennaPod user.

Differentiated refresh frequencies could be set by the user in the app, but this complicates the interface and might lead to a bad user experience (if the user forgets this setting and complains that an episode comes in late). Apps could also let their update frequencies depend on server settings; this was implemented in AntennaPod but reverted because it led to issues. Also, relying on server settings might not work when content creators use shared hosting services.

Therefore I would like to propose the optional <podcast:frequency> channel tag, allowing podcasters to indicate how often they (expect to) release new episodes. Apps/clients can use this to determine when and how often they should refresh the feed in question.

If the release day is Thursday, apps could decide to check on Thursday and Friday. That way the app is sure to find a particular episode if it is released after the app's refresh, and avoid that the episode only reaches the user a week later. It should be possible to indicate multiple weekdays, as some podcasts release on multiple specified days (e.g. for the No Agenda it's every Thursday and Sunday).

I would say release time (e.g. noon or 21:00) should not be covered by this tag, as it'll be complicated/leads to confusion with time zones.

Proposed attributes:

Maybe there are any other 'frequency' standards, that would allow indicating e.g. 'every second Thursday of the month'? E.g. from OSM. But maybe these advanced schemes should get a specific attribute, so it doesn't interfere with simpler annotations.

Examples:

<podcast:frequency>daily</podcast:frequency> (new episodes expected every day)

<podcast:frequency weekday="Monday">biweekly</podcast:frequency> (new episodes expected every other week on Monday)

<podcast:frequency weekday="Tuesday,Thursday">weekly</podcast:frequency> (new episodes expected every week on Tuesday and Thursday)
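To make the examples a bit more concrete, here is a minimal sketch of how a client might read such a tag. It assumes the element name and the `weekday` attribute exactly as written in the examples (this is only a proposal, not a published element) and the `https://podcastindex.org/namespace/1.0` namespace URI. The function name `check_days` is made up for illustration, and the "also check the day after" rule follows the Thursday/Friday suggestion above.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the podcast namespace. The <podcast:frequency> element
# itself is only the proposal discussed in this thread (assumption).
NS = {"podcast": "https://podcastindex.org/namespace/1.0"}
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def check_days(channel_xml: str) -> tuple[str, list[str]]:
    """Return (frequency, weekdays on which a client might refresh)."""
    channel = ET.fromstring(channel_xml)
    tag = channel.find("podcast:frequency", NS)
    if tag is None:
        return ("unknown", WEEKDAYS)  # no hint: keep checking every day
    frequency = (tag.text or "").strip()
    release_days = [d.strip() for d in tag.get("weekday", "").split(",")
                    if d.strip()]
    if not release_days:
        return (frequency, WEEKDAYS)
    days = set()
    for day in release_days:
        i = WEEKDAYS.index(day)          # raises ValueError on unknown names
        days.add(WEEKDAYS[i])
        days.add(WEEKDAYS[(i + 1) % 7])  # also check the day after
    return (frequency, [d for d in WEEKDAYS if d in days])


feed = """<channel xmlns:podcast="https://podcastindex.org/namespace/1.0">
  <podcast:frequency weekday="Tuesday,Thursday">weekly</podcast:frequency>
</channel>"""
print(check_days(feed))
# ('weekly', ['Tuesday', 'Wednesday', 'Thursday', 'Friday'])
```

A feed without the tag falls back to checking every day, so the hint can only reduce polling, never stop it.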

I don't know how this should best be shaped on a technical level, so if the experts have a better idea than my 'example' that would be very welcome :) And maybe 'frequency' should be updated to 'releases' because the release day is not really covered by the 'frequency' title.

daveajones commented 3 years ago

The rawvoice namespace has something like this I believe. It would be nice to have this, but do it on a level that is more specific. The “daily”, “weekly” thing is good. But, I like the last ones you mentioned. Something equivalent to “week-days between 6:00am and 7:00am” would be killer.

Maybe an integer mapping where weekly time slots are mapped to a whole number and then used as a grid for selection.

brianoflondon commented 3 years ago

It's hard to avoid time zone problems because even if you specify day of release, even No Agenda drops for me pretty close to midnight and sometimes over the line on Friday or Monday mornings.

Would the well known format of Unix Cron schedules be something to use for this? It has pretty much all the right options and it's as old as the Universe or near enough. It's a bit geeky but perhaps there is a lot of code to already figure this.

3 Hour refresh window starting at 11:00 GMT every weekday:

<podcast:frequency format="cron" tz="GMT" window="3">0 11 * * 1-5</podcast:frequency>
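As a rough illustration of how a client could evaluate such a hint, here is a minimal sketch using only the Python standard library. It assumes the `window` attribute means "hours after each cron trigger" and that the time passed in is already in the zone named by `tz`. The tiny matcher only understands `*`, single numbers, ranges and comma lists; it is a simplification, not a full cron implementation (no step values, no names, and day-of-month and day-of-week are simply ANDed). All function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def field_matches(field: str, value: int) -> bool:
    """Match one cron field supporting '*', 'n', 'a-b' and comma lists."""
    for part in field.split(","):
        if part == "*":
            return True
        if "-" in part:
            low, high = map(int, part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, moment: datetime) -> bool:
    """True if `moment` (to the minute) is a trigger time of `expr`."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (moment.weekday() + 1) % 7  # cron counts Sunday as 0
    return (field_matches(minute, moment.minute)
            and field_matches(hour, moment.hour)
            and field_matches(dom, moment.day)
            and field_matches(month, moment.month)
            and field_matches(dow, cron_dow))


def in_refresh_window(expr: str, window_hours: int, now: datetime) -> bool:
    """True if a cron trigger fired within the last `window_hours` hours."""
    probe = now.replace(second=0, microsecond=0)
    for _ in range(window_hours * 60 + 1):  # walk back minute by minute
        if cron_matches(expr, probe):
            return True
        probe -= timedelta(minutes=1)
    return False


# Wednesday 2021-01-06, 12:30 GMT: inside the 11:00-14:00 weekday window.
now = datetime(2021, 1, 6, 12, 30, tzinfo=timezone.utc)
print(in_refresh_window("0 11 * * 1-5", 3, now))  # True
```

Walking back minute by minute is crude but keeps the sketch short; a real client would more likely use an existing cron library.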

daveajones commented 3 years ago

Wow. I never thought of CRON.

brianoflondon commented 3 years ago

Coincidentally I've just spent a day getting my own python script to run from a cron job on my iMac to record and tweet out the level of the Sea of Galilee every day :-)

keunes commented 3 years ago

It's hard to avoid time zone problems because even if you specify day of release, even No Agenda drops for me pretty close to midnight and sometimes over the line on Friday or Monday mornings.

Sure. There's a responsibility for the clients also. One of the reasons why I wrote the following:

If the release day is Thursday, apps could decide to check on Thursday and Friday. That way the app is sure to find a particular episode if it is released after the app's refresh, and avoid that the episode only reaches the user a week later.

Would the well known format of Unix Cron schedules be something to use for this?

Hah, interesting idea :) As long as it's implemented in a way that's convenient for podcasters. So it would be great if it would accept both methods.

cio-blubrry commented 3 years ago

Sorry I posted a similar subject in a different thread. I did not see there was a frequency discussion already. https://github.com/Podcastindex-org/podcast-namespace/issues/157 My use is different however, based on what is displayed to the user, not for programmatically pulling feeds.

I think it is important to state the goal for this version of frequency tag. It appears in this case the goal is for the app to know when to pull the podcast feed? I just want to make that clear. There are a lot of tags proposed without the goal in mind which may be why some are not moving forward as quickly as they could be.

First thoughts...

Perhaps both goals can be combined somehow?

<podcast:frequency format="cron" value="0 11 * * 1-5" tz="GMT" window="3">Readable description to display to users</podcast:frequency>

Also I see the tag in one reply. That seems to be more specific to the goal of this thread.

As an architect of a system that pulls 1.5 million feeds today in a 4 hour window, the ideal solution is not a refresh or frequency for feed pulling. It can give you an idea when to pull a feed, but with caching and possible delays due to other factors that happen naturally in life you may inadvertently miss a release that is 30 seconds later than when you checked based on the RSS tag. I would suggest exploring the use of WebSUB as a solution to this problem rather than an RSS tag which has wider adoption than this namespace currently.

daveajones commented 3 years ago

I would suggest exploring the use of WebSUB as a solution to this problem rather than an RSS tag which has wider adoption than this namespace currently.

I agree with this. We use WebSub extensively on the Podcast Index also, and it's the proper solution. We do still run into caching issues sometimes with websub pings, but it's mostly fine.

If we aim for simple displayability with this tag, but still keep it easily parseable (like comma separated days) then it could still serve as "hints" for those apps that want to target their refresh cycles if they can't support WebSub.

keunes commented 3 years ago

I think it is important to state the goal for this version of frequency tag. It appears in this case the goal is for the app to know when to pull the podcast feed?

Indeed:

Making feed update frequency specific to each feed makes feed updates in the app faster and reduces the server load. Such was also requested by an AntennaPod user.

Perhaps both goals can be combined somehow? <podcast:frequency format="cron" value="0 11 * * 1-5" tz="GMT" window="3">Readable description to display to users</podcast:frequency>

I would suggest exploring the use of WebSUB as a solution to this problem rather than an RSS tag which has wider adoption than this namespace currently.

I don't know about the possibility of using WebSUB in AntennaPod - @ByteHamster would be best suited to judge that. But the multi-purpose concept of a tag seems feasible & logical. AntennaPod (& other podcatchers) might also benefit from the human-readable form, e.g. to display it to users when they're subscribing to a podcast.

theDanielJLewis commented 3 years ago

I think both WebSub and a frequency tag have merit and separate use cases.

The frequency tag could be used to indicate or find podcasts of a certain frequency so the app doesn't have to calculate it.

For example, Dan Carlin's Hardcore History might have an "infrequent" label, which makes the 6-hour episodes seem tolerable. But if a podcast has daily 6-hour episodes (with the "daily" frequency), then it can help me decide for or against it.

keunes commented 3 years ago

I just learnt that, as AntennaPod is a client-only podcatcher, we probably cannot use WebSUB. On Wikipedia I read that "the subscriber needs to run a web accessible server so that hubs can directly notify it when any of its subscribed topics have updated" - which wouldn't be possible for AntennaPod. So a machine-readable tag would still be valuable.

jamescridland commented 3 years ago

Yes, that's fair enough. You could run a notification service to tell a client to refresh a feed, but that seems quite hard work.

cio-blubrry commented 3 years ago

I like the AntennaPod app! Today with mobile app design it is ideal to have resources working in the background in the cloud (server infrastructure) and leverage push notifications rather than have scheduled events that require the application to work in the background - This is for battery life reasons. I know this is easier said than done, it requires server infrastructure and the resources to maintain it.

We have discussed opening up our MyCast subscription service API to the public. We use it for push notifications to our own apps, but it may still be a challenge for other apps to use it. I will ask how easy it is for another app to also receive the push notifications we create, but I am almost certain they are application specific. I do know we use Firebase, which then routes either or both "Notification message" and "data message" types of notifications to both Android and iOS devices, and we have special code to deal with push endpoints that bounce (similar to bouncing email addresses). The "data" message is meant for background downloads, whereas the "notification" is what appears in the app's notification area. Our API allows apps to receive screen notifications as well as background updating (data notifications).

keunes commented 3 years ago

Thanks for chiming in @cio-blubrry and thinking along. You made me think: While some public API takes out the 'hard work' argument of using a notification service to use WebSUB, I reckon it introduces a new issue: privacy.

AntennaPod fully respects users' privacy and does not send any data to anyone (except http requests to download feeds and API requests for search). If there's a central service that informs a user (app) of new episodes, this service must know 1) my device and 2) which podcasts I'm following.

Then we have two paths to reconcile this with AntennaPod's privacy standards: A) we must securely self-host such service (kinda ruled out already), or B) it must be introduced as an optional service that is off by default and that users can switch on. Because the benefits are so limited (battery) I don't think it would warrant the introduction of a new setting (as another of our goals is keeping it simple).

cio-blubrry commented 3 years ago

@keunes privacy alone is good enough reason to leave it functioning as-is. Also regarding privacy, not only do you identify the device and what they are listening to, but you also identify the user with their email address to provide them an account. Ultimate privacy, as that is the goal, would be greatly compromised. Also food for thought, any app that has push notifications has the exact information about you. Most of us do not think about it but nearly every app on our phones has information that we most likely are not aware of or think about.

With that said, I still do not see an advantage of a frequency tag, as it may cause the issue previously described: you pull the feed when you're told to and there's no new data, but a few hours later it is there, and because you're adhering to their once-a-week-at-3am frequency you are never going to pull the feed again for another week. That is just as problematic.

Just tossing this into the mix: let's pick a standard frequency and then the behavior with a frequency. A daily standard...

Podcaster just needs to define what day of the week and what time they want you to pull...

<podcast:feedPulling pullOn="M:3:30,TH:17:15,SA:9:01" />
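For what it's worth, a `pullOn` value like this is easy to turn into a "next pull" time. The sketch below is only a guess at the semantics: only `M`, `TH` and `SA` appear in the example, so the remaining day codes are invented, and since the tag names no timezone the slots are interpreted in whatever zone `now` is given in. Function names are hypothetical.

```python
from datetime import datetime, timedelta

# Only M, TH and SA appear in the example; the other codes are guesses.
DAY_CODES = {"M": 0, "TU": 1, "W": 2, "TH": 3, "F": 4, "SA": 5, "SU": 6}


def parse_pull_on(value: str) -> list[tuple[int, int, int]]:
    """Parse 'M:3:30,TH:17:15' into sorted (weekday, hour, minute) tuples."""
    slots = []
    for item in value.split(","):
        code, hour, minute = item.strip().split(":")
        slots.append((DAY_CODES[code.upper()], int(hour), int(minute)))
    return sorted(slots)


def next_pull(value: str, now: datetime) -> datetime:
    """Return the first slot strictly after `now` (same timezone as `now`)."""
    candidates = []
    for weekday, hour, minute in parse_pull_on(value):
        days_ahead = (weekday - now.weekday()) % 7
        slot = (now + timedelta(days=days_ahead)).replace(
            hour=hour, minute=minute, second=0, microsecond=0)
        if slot <= now:  # today's slot already passed: use next week's
            slot += timedelta(days=7)
        candidates.append(slot)
    return min(candidates)


# Wednesday 2021-01-06 at noon: the next slot is Thursday 17:15.
print(next_pull("M:3:30,TH:17:15,SA:9:01", datetime(2021, 1, 6, 12, 0)))
# 2021-01-07 17:15:00
```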

You let the podcaster who sets up the update frequency decide how often they want you to pull the feed (this is similar to how Google search crawl rate can be controlled to your discretion). Another good place for cron to be used. With that thinking, the tag may be something like...

<podcast:crawlRate cron="* * * * *" dailyLimit="4" />

This has a different benefit: it helps throttle feed pulling as well.

jamescridland commented 3 years ago

Regarding privacy - there's no right or wrong answer here.

If your app polls RSS feeds directly then that's bad since you are giving podcast hosts regular and in many cases identifiable pings from a device, allowing a podcast host to, in some cases, work out if you're at home or at work.

If your app is given notifications from a central server, that requires the central server to know all shows you subscribe to. However, that is slightly better for the user's privacy, given that the app is not contacting the podcast hosts repeatedly, and the only communication with hosts happens with audio downloads.

Additionally, many podcast hosts use individual domain names for their RSS feeds. This enables anyone who can see your internet traffic to also see - on a regular basis - which shows you're subscribed to, since domain names are never encrypted. A hub doing this work will avoid this privacy leak, since the communication between the hub and the podcast app can be entirely encrypted.

Podcast apps that use a central hub are going to use less data (for users and hosts), less battery, and with WebSub, get new shows almost immediately.

Given that RSS means anyone can set up a podcast host with no privacy oversight, it's worthwhile considering the risks here for privacy.

brianoflondon commented 3 years ago

If your app polls RSS feeds directly then that's bad since you are giving podcast hosts regular and in many cases identifiable pings from a device, allowing a podcast host to, in some cases, work out if you're at home or at work.

It just occurs to me that the abstraction of running RSS feeds through Hive and fetching them from Hive's API means absolutely NOBODY can aggregate the data of who is polling RSS Feeds.

Right now running a test API server would be a central point that can do this, but I fully expect, if there is a demand, that the calls I'm writing would be built into Hive's API system and then those calls would be distributed around 15+ api servers run by different private entities.

Even for a show hosted on one of the big hosts, if the feed is mirrored and the address switched to the mirrored feed, IP data will not be collectable.

stuartjmoore commented 3 years ago

Similar proposal and thoughts: https://github.com/Podcastindex-org/podcast-namespace/issues/157#issuecomment-764698466

jamescridland commented 3 years ago

A few thoughts:

  1. As a podcaster and my own podcast host, I do not want anyone mirroring my RSS files without my consent. I'll block any attempts to do that.

Whether you agree with this or not, the fundamental thing here is to respect creators' wishes, and to ensure that where a creator expresses a view, to allow them to communicate that view in a programmatic way to enable podcast apps to do the right thing by the creator.

(On the other hand - I would very much welcome my audio to be cached somewhere, assuming that I can get some form of data).

  2. Given that WebSub fixes the problem of "is there a new show yet?" and is already an agreed standard, I'm not really very keen to encourage an alternate mechanism. However, clearly, WebSub only works with a central hub in some way. Perhaps a better plan would be to think how a podcast can indicate "there's a new show!" in a way that a podcast app can subscribe to in a privacy-aware way. Just advertising for rough times when you might have a new show doesn't seem to fix the problem in the right way - and there's nothing to stop an enterprising podcaster basically saying "check every two minutes just in case", which is worse than useless.

I'm sorry to be pouring cold water on this - but to me, WebSub is the "right" solution. It conserves bandwidth by only parsing the RSS file when a podcaster explicitly tells someone to.

agates commented 3 years ago

I wonder if the questions "approximately when should the feed be automatically refreshed next?" and "what kind of release schedule does this podcast have?" should be solved separately? The former can use relative time durations to figure out task scheduling while the latter is ultimately a calendar problem, which is fundamentally more difficult (especially since there is more than one calendar system in the world).

That said, it's not as nerdy as cron, but ISO 8601 can do repeating intervals and automatically has timezone support. Plus it doesn't suffer from multiple different possible cron implementations (only oddities like Microsoft putting colons in their date strings where they don't belong). If absolutely necessary, multiple intervals could be supplied to represent multiple abstract patterns.
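As an illustration, an ISO 8601 repeating interval such as `R/2021-01-07T11:00:00Z/P1W` (repeat indefinitely, starting Thursday 7 January 2021 at 11:00 UTC, every week) can be evaluated with a few lines of standard-library Python. This is a minimal sketch under assumptions: it only handles durations made of weeks, days, hours and minutes (months and years have no fixed length), it ignores any repeat count after the `R`, and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Weeks, days, hours and minutes only; months and years are not supported.
DURATION = re.compile(r"^P(?:(\d+)W)?(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?$")


def parse_duration(text: str) -> timedelta:
    """Parse a fixed-length ISO 8601 duration such as 'P1W' or 'PT12H'."""
    m = DURATION.match(text)
    if not m or not any(m.groups()):
        raise ValueError(f"unsupported duration: {text}")
    weeks, days, hours, minutes = (int(g or 0) for g in m.groups())
    return timedelta(weeks=weeks, days=days, hours=hours, minutes=minutes)


def next_occurrence(interval: str, now: datetime) -> datetime:
    """Return the first occurrence of 'R/<start>/<duration>' at or after now."""
    repeat, start_text, duration_text = interval.split("/")
    if not repeat.startswith("R"):
        raise ValueError("expected a repeating interval")
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 onwards.
    start = datetime.fromisoformat(start_text.replace("Z", "+00:00"))
    period = parse_duration(duration_text)
    if now <= start:
        return start
    periods_elapsed = -(-(now - start) // period)  # ceiling division
    return start + periods_elapsed * period


now = datetime(2021, 1, 20, 0, 0, tzinfo=timezone.utc)
print(next_occurrence("R/2021-01-07T11:00:00Z/P1W", now))
# 2021-01-21 11:00:00+00:00
```

A show that releases on two days per week would need two such intervals, which matches the "multiple intervals" idea above.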

Also, iCalendar already has a recurrence standard (RFC 5545) to follow.

jmikedupont2 commented 3 years ago

I would like to be able to calculate the probability of a podcast publishing at a given day, hour, minute etc from its data. For this purpose I would like a column oriented data dump like parquet so I can pull out the timestamp of the publishing from the feed for each podcast, then we can do some analysis on the time series.
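The kind of analysis meant here can be sketched without any data dump, directly from a list of `pubDate` strings. This is a minimal illustration, not the script linked below: the function name `release_distribution` and the sample dates are made up, and all times are converted to UTC before counting.

```python
from collections import Counter
from datetime import timezone
from email.utils import parsedate_to_datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def release_distribution(pub_dates):
    """Return (share of releases per weekday, share per UTC hour)."""
    moments = [parsedate_to_datetime(p).astimezone(timezone.utc)
               for p in pub_dates]
    total = len(moments)
    by_day = Counter(DAY_NAMES[m.weekday()] for m in moments)
    by_hour = Counter(m.hour for m in moments)
    return ({day: n / total for day, n in by_day.items()},
            {hour: n / total for hour, n in by_hour.items()})


# Invented sample data in RFC 822 format, as found in <pubDate> elements.
sample = [
    "Thu, 07 Jan 2021 09:30:00 +0000",
    "Thu, 14 Jan 2021 09:45:00 +0000",
    "Thu, 21 Jan 2021 10:05:00 +0000",
    "Fri, 29 Jan 2021 09:10:00 +0000",
]
days, hours = release_distribution(sample)
print(days)   # {'Thursday': 0.75, 'Friday': 0.25}
print(hours)  # {9: 0.75, 10: 0.25}
```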

jmikedupont2 commented 3 years ago

https://gitlab.com/-/ide/project/jmikedupont2/podcaststats/tree/master/-/stats.py/ Here is the first version of a Python script that calculates the all-time stats as well as 30-, 60-, 90-day etc. stats. Next we will want day of week.

PofMagicfingers commented 3 years ago

Just adding my 2 cents here, but if the goal is to lower server updates, isn't that kinda already available in rss 2.0 standard?

https://validator.w3.org/feed/docs/rss2.html

There are ttl, skipHours and skipDays, which originally served that purpose. I have never seen an RSS client use them, but they exist.
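For reference, honouring these three elements on the client side takes very little code. A minimal sketch, assuming a channel that carries them as defined in RSS 2.0 (`ttl` in minutes, `skipHours` as hours 0-23 in GMT, `skipDays` as English day names); the function name `may_poll` and the sample channel are made up.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def may_poll(channel_xml, now, last_poll=None):
    """Decide whether a client may fetch the feed at `now` (aware datetime)."""
    channel = ET.fromstring(channel_xml)
    skip_hours = {int(h.text) for h in channel.findall("skipHours/hour")}
    skip_days = {d.text.strip() for d in channel.findall("skipDays/day")}
    ttl_text = channel.findtext("ttl")

    now = now.astimezone(timezone.utc)  # skipHours are expressed in GMT
    if now.hour in skip_hours or DAY_NAMES[now.weekday()] in skip_days:
        return False
    if ttl_text and last_poll is not None:
        # ttl: minutes the feed may be cached before refreshing
        return now - last_poll >= timedelta(minutes=int(ttl_text))
    return True


channel = """<channel>
  <ttl>60</ttl>
  <skipHours><hour>0</hour><hour>1</hour><hour>2</hour><hour>3</hour></skipHours>
  <skipDays><day>Saturday</day><day>Sunday</day></skipDays>
</channel>"""
noon = datetime(2021, 1, 6, 12, 0, tzinfo=timezone.utc)  # a Wednesday
print(may_poll(channel, noon, last_poll=noon - timedelta(minutes=30)))  # False
print(may_poll(channel, noon, last_poll=noon - timedelta(minutes=90)))  # True
```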

However, I'm not against a podcaster-defined frequency tag. With my own podcast, clients that compute the frequency display something like "Wednesday every other week", when in fact we publish about 2 weekly episodes, with pauses of 2 to 4 months every 3 episodes 😅 (it's difficult to find time to record 😁)

A frequency tag that allows setting it to something like "infrequent" would be less "blurry" for listeners.

jmikedupont2 commented 3 years ago

https://gitlab.com/jmikedupont2/podcaststats/-/blob/master/example/thedaily.txt Here you go: the daily mostly drops on Thursday, never on Saturday, and mostly between 9 and 10am UTC, so you can use this data to save a lot of CPU and bandwidth on polling.

keunes commented 3 years ago

I was thinking about this the past week, and thought 'why can't PI calculate this info, given that it's keeping track of episodes already - output would probably be much more reliable than podcaster-provided info'. And there you go, @jmikedupont2 wrote a script already :D I love this community.

Would be great if this data could be included in PI, and served via API.

jmikedupont2 commented 3 years ago

I'm able to get the last refresh date off of the archive.org dumps. Now all I need is for 30 dumps to be kept there, so that I can calculate how often feeds get updated. Could we date-stamp these dumps on archive.org so that we can pull X number of days?

On Sat, Mar 20, 2021, 11:43 AM Keunes @.***> wrote:

I was thinking about this the past week, and thought 'why can't PI calculate this info, given that it's keeping track of episodes already - output would probably be much more reliable than podcaster-provided info'. And there you go, @jmikedupont2 wrote a script already :D I love this community.

Would be great if this data could be included in PI, and served via API.

  • The issue of publisher trolling-misuse as @jamescridland mentioned is tackled
  • The 'default frequency' could be daily, with a 'frequency' returned by PI only if a clear cadence/schedule could be identified for a podcast (in line with @cio-blubrry's comments)
  • This way, the expected frequency could also be displayed in clients' 'Discover' or 'Add podcast' screens, as some apps do already (with WebSUB the client would still need to calculate this to display anything useful).


boyska commented 1 year ago

Hello folks! First of all, thanks to all the people running this great project. I love how the podcast ecosystem is improving.

I'm a podcast host: I run the website of a radio. We don't like commercial platforms, and we want to promote open alternatives as much as possible. The biggest technical challenge I can see right now is exactly about frequency: listeners get updates too late.

Now, my problem is: how do I get my users to receive my news timely? Some podcast applications assume very low frequency by default. Sure, the user can change it, but real users don't. So I wonder: can this proposal help me?

The way I see it, older tags like <ttl> (thanks for mentioning them, @PofMagicfingers ) already do what I need. There is even <sy:updatePeriod>. Their usage is not great, but not negligible either, they have been there since forever, so I'm sorry to see that client support is quite low.

Can the proposal be better than that?

I like the idea of recurring rules, and I love the idea of ISO8601 intervals. But is there any practical case in which they can give good advantage compared to updatePeriod + skipHours + skipDays?

I will try to make some user story to reason about it.


Story 1: quite regular weekly podcast

A podcast releases a new episode every Wednesday around 7:30 UTC. Sometimes they are a bit late, but by no more than 1 hour. So they can set updatePeriod=hourly, updateFrequency=2, skipHours=every hour except 7 and 8, skipDays=every day except Wednesday.
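To check what these settings add up to, here is a small sketch that enumerates the poll slots of one week. It reads updatePeriod=hourly with updateFrequency=2 as "twice per hour", i.e. every 30 minutes, and takes the hours and days that are NOT skipped as input. The function name and the chosen week are arbitrary.

```python
from datetime import datetime, timedelta, timezone

PERIOD_LENGTH = {"hourly": timedelta(hours=1),
                 "daily": timedelta(days=1),
                 "weekly": timedelta(weeks=1)}
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekly_poll_slots(update_period, update_frequency,
                      allowed_hours, allowed_days, week_start):
    """List the poll times in one week that survive skipHours/skipDays."""
    step = PERIOD_LENGTH[update_period] / update_frequency
    slots = []
    moment = week_start
    while moment < week_start + timedelta(weeks=1):
        if (moment.hour in allowed_hours
                and DAY_NAMES[moment.weekday()] in allowed_days):
            slots.append(moment)
        moment += step
    return slots


# Story 1: every hour except 7 and 8 is skipped, every day except Wednesday.
monday = datetime(2021, 1, 4, tzinfo=timezone.utc)
for slot in weekly_poll_slots("hourly", 2, {7, 8}, {"Wednesday"}, monday):
    print(slot.strftime("%A %H:%M"))
# Wednesday 07:00
# Wednesday 07:30
# Wednesday 08:00
# Wednesday 08:30
```

So a client honouring these tags would fetch the feed four times a week, and the 08:30 slot still catches an episode that is an hour late.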

Story 2: news website with frequent news

Think of a radio station releasing one podcast every 2 hours, except maybe at night. Again, updatePeriod=hourly + skipHours, and that's all.

Story 3: very irregular website, but news are very urgent

You might have some sort of an "alert" podcast: when something is out, you should know it in less than 2 hours.

The only thing you can do is to poll very frequently and never skip. No elaborate rule can do any better than this.

Story 4: like Story 1, but it's monthly

Using the same tags as in Story 1 will lead to some "wasted" checks. But, do we even care about something so rare?


So, my conclusion is that, if the goal is to express when a client should poll, we don't strictly need a new tag, because using the old tags is good enough. Is my analysis wrong? Am I missing some story?

If my analysis is not flawed, then I wonder: do we just want to implement the same functionality in a more elegant (and maybe general) way? Or, are we trying to express something different?

agates commented 1 year ago

Honestly, podping has made the idea of "when to update" mostly irrelevant (app adoption aside). That provides instant updates/notifications when changes are made to a feed. It solves the problem by making it a non-issue, using a push event architecture instead of a pull where the app guesses when to update.

I think the idea of a schedule with ICS + CalDAV would otherwise be the most straightforward approach for letting users know about future events, since it's already so widely in use. But I think it's a separate use case.

keunes commented 1 year ago

podping has made the idea of "when to update" mostly irrelevant (app adoption aside). That provides instant updates/notifications when changes are made to a feed

@agates that may be true for some or even most cases. But as we have established above WebSub won't be possible for client-only apps like AntennaPod, and (unless I misunderstand how it works) the same would be true for PodPing.

I'm a bit sad to see we're going back to this "it's irrelevant" argument again rather than discussing the options and questions posited by @boyska.

francosolerio commented 1 year ago

I agree wholeheartedly with @keunes: many of us app developers have serverless stacks, and some of those who rely on heavy server setups are re-considering their position. We hear a lot of talk about the need of decentralization, but at the current state of technology any feature that involves Podping or WebSub means penalizing server-less apps and pushing toward centralized systems.

brianoflondon commented 1 year ago

I agree wholeheartedly with @keunes: many of us app developers have serverless stacks, and some of those who rely on heavy server setups are re-considering their position. We hear a lot of talk about the need of decentralization, but at the current state of technology any feature that involves Podping or WebSub means penalizing server-less apps and pushing toward centralized systems.

I'm asking this because I genuinely don't understand how serverless apps work in terms of knowing when to update and Castamatic is my primary Podcasting 2.0 app these days.

On what schedule and by what means does the app on my phone update new shows?

keunes commented 1 year ago

On what schedule and by what means does the app on my phone update new shows?

If you were to use AntennaPod, the refresh interval is 'every 12 hours' by default. Users can also change this interval, or select a time in the day instead (for example to have new episodes ready for breakfast).

So in short, true decentralised apps don't "know" when to refresh - they just do something. Hence this proposal (to improve from something random to at least an educated guess).

francosolerio commented 1 year ago

On what schedule and by what means does the app on my phone update new shows

There is no fixed time for Castamatic: it updates more often the shows that are listened to the most by the user, and tries to predict the next episode publishing time, so to make more checks in that period of time.

agates commented 1 year ago

podping has made the idea of "when to update" mostly irrelevant (app adoption aside). That provides instant updates/notifications when changes are made to a feed

@agates that may be true for some or even most cases. But as we have established above WebSub won't be possible for client-only apps like AntennaPod, and (unless I misunderstand how it works) the same would be true for PodPing.

I'm a bit sad to see we're going back to this "it's irrelevant" argument again rather than discussing the options and questions posited by @boyska.

Then you misunderstand how Podping works. I have explicitly designed podping to work with "serverless" stacks -- right now any client can query a Hive API for updates. The issue right now is it's not very efficient, so we'll have to work on a standardized podping-specific endpoint that allows for mobile-friendly notifications/updates.

Does that exist yet? No, but I still believe it's the correct way to solve the issue. The way I envision it, any user would be able to host their own "Podping API" to plug into their podcast app (and of course developers can set their own or use defaults hosted by podcast index or something).

I'm on your side. AntennaPod is my daily driver and I will make it so Podping is friendly for it and others. I'm just one guy though.

EDIT: All that said, I think we still need a schedule tag which would help solve this anyway.

boyska commented 1 year ago

right now any client can query a Hive API for updates

I think I might be among the people not understanding podping: I studied it a bit, but maybe not enough, so I'm sorry if I'm just wrong. Is "querying the Hive API" any different than "polling" ? don't we still need to wonder "ok, but when should I query the Hive API?"

any user would be able to host their own "Podping API"

isn't this the opposite of "serverless" ?

redimongo commented 1 year ago

WebSub

Would be interested in knowing more about how you have integrated WebSub.

redimongo commented 1 year ago

Technically we don't need to set a time to fetch a podcast or re-poll. Here is how to work out a podcast's update schedule.

Check if there is a pattern in the items' pubDate; if there is a pattern, then schedule the feed to be updated based on that pattern.

The issue with the proposal above is that sometimes we publish a special episode in between the regular releases. What happens then? Does the podcast player still poll the RSS feed, or does it not poll/request it until the next date?
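The pubDate pattern check described above could look roughly like this. It is a minimal sketch under assumptions: a "pattern" here simply means that every gap between consecutive episodes lies within 25% of the median gap, and the function name and the `tolerance` parameter are made up. When no pattern is found it returns `None`, so the client keeps its normal polling.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime
from statistics import median


def predict_next_release(pub_dates, tolerance=0.25):
    """Return the expected next release time, or None if there is no pattern."""
    moments = sorted(parsedate_to_datetime(p) for p in pub_dates)
    if len(moments) < 3:
        return None  # too little history to call it a pattern
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(moments, moments[1:])]
    typical = median(gaps)
    if typical <= 0:
        return None
    regular = all(abs(gap - typical) <= tolerance * typical for gap in gaps)
    if not regular:
        return None
    return moments[-1] + timedelta(seconds=typical)


weekly = ["Thu, 07 Jan 2021 09:30:00 +0000",
          "Thu, 14 Jan 2021 09:30:00 +0000",
          "Thu, 21 Jan 2021 09:30:00 +0000"]
print(predict_next_release(weekly))  # 2021-01-28 09:30:00+00:00
```

To deal with special episodes, a client would probably still keep a slow fallback poll (say, once a day) next to the predicted time, so an off-schedule release is delayed but never missed.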