Podcastindex-org / docs-api

Developer documentation for the podcastindex.org api.
https://podcastindex-org.github.io/docs-api/
MIT License

Recent feeds not returning all expected feeds #95

Closed joshuahoover closed 1 year ago

joshuahoover commented 1 year ago

I have a project that calls https://api.podcastindex.org/api/1.0/recent/feeds?max=500&since= every 5 minutes to get new feeds. I then parse each feed URL and gather episode data, which I index for free-form text search later. On each run I pass in a timestamp that is 5 minutes in the past.
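For reference, the polling request described above could be built roughly like this. This is only a sketch: `recent_feeds_url` and its arguments are hypothetical names, it assumes `since` takes a Unix timestamp in seconds, and it leaves out whatever authentication headers the API expects.

```python
import time
from typing import Optional
from urllib.parse import urlencode

# Endpoint taken from the issue text above.
RECENT_FEEDS = "https://api.podcastindex.org/api/1.0/recent/feeds"


def recent_feeds_url(now: Optional[float] = None,
                     lookback_seconds: int = 300,
                     max_results: int = 500) -> str:
    """Build the recent-feeds URL for one polling run (hypothetical helper).

    Assumption: `since` is a Unix timestamp in seconds.
    """
    current = time.time() if now is None else now
    since = int(current) - lookback_seconds
    return f"{RECENT_FEEDS}?{urlencode({'max': max_results, 'since': since})}"
```

Passing `now` explicitly makes the helper easy to check without depending on the clock.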

I've let this run for several days and looked for feeds that I subscribe to, but I can never find any of them. I log the feed URLs I receive on every API call, and when I grep for feeds I know have been updated within the past 2 days, I can't find them.

Some example feed URLs that have updated in the past 2 days that my script has never logged after calling the recent feeds API endpoint every 5 minutes:

https://feeds.acast.com/public/shows/607879ddfbe1eb33d6525119
https://feeds.feedburner.com/TvJunkPodcast
https://feeds.transistor.fm/rework
https://feeds.megaphone.fm/LSHML4761942757

It's possible I have an error in my code, but I seem to be missing quite a few feeds I know have updated recently.

daveajones commented 1 year ago

A max of 500 is low. There can easily be more than 100 feeds per minute that update, so a 5-minute window can hold more than 500. I think you'd have better luck with the tracking URL: https://tracking.podcastindex.org/current. That tracks the "/recent/data" endpoint and should give you complete information without having to fiddle.

joshuahoover commented 1 year ago

@daveajones Thanks! Is there any doc on that endpoint by chance? Like, how often it updates?

joshuahoover commented 1 year ago

@daveajones Are recent feeds those that have updated episodes or are they new feeds added to the index?

The tracking endpoint you provided shows 100 or fewer feeds in the count at any given time, while the episode count is almost always the max (1000). Thus my question :smile:

daveajones commented 1 year ago

It updates every 3 minutes. It's a combination of recent feeds and recent items in a single return. The idea is that, on some interval, you get the current JSON and then walk backwards through each successive previousTrackingUrl until you hit an ID you already have. Then you know you've caught up.
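The walk-back described above might look something like the following. It is a rough sketch, not the documented client: only the `previousTrackingUrl` field and the tracking URL come from this thread. `catch_up`, `fetch_json`, `ids_in`, and `max_pages` are hypothetical names, and because the layout of the tracking JSON is not spelled out here, the caller supplies `ids_in` to pull whatever IDs a document contains.

```python
import json
from typing import Any, Callable, Dict, Hashable, Iterable, List, Set
from urllib.request import urlopen

CURRENT_URL = "https://tracking.podcastindex.org/current"


def fetch_json(url: str) -> Dict[str, Any]:
    """Download one tracking document and decode it as JSON."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def catch_up(seen_ids: Set[Hashable],
             ids_in: Callable[[Dict[str, Any]], Iterable[Hashable]],
             fetch: Callable[[str], Dict[str, Any]] = fetch_json,
             start_url: str = CURRENT_URL,
             max_pages: int = 500) -> List[Dict[str, Any]]:
    """Walk back from the current document until a known ID turns up.

    Returns the visited documents, newest first, and records their IDs
    in `seen_ids`. `ids_in` is an assumption: it must return the IDs
    found in one document, however the real JSON lays them out.
    """
    pages: List[Dict[str, Any]] = []
    url = start_url
    while url and len(pages) < max_pages:  # max_pages guards the first run
        doc = fetch(url)
        pages.append(doc)
        if set(ids_in(doc)) & seen_ids:
            break  # overlap with earlier runs: caught up
        url = doc.get("previousTrackingUrl")
    for doc in pages:
        seen_ids.update(ids_in(doc))
    return pages
```

The document that contains the overlapping ID is still returned, since it may also hold items that have not been seen yet. The caller is expected to skip duplicates when processing it.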

joshuahoover commented 1 year ago

Perfect. I'm using it instead of the API call and all seems to be working great. Much appreciated!