Podcastindex-org / database


Updates of RSS feed URLs are not handled #40

Open perelin opened 5 months ago

perelin commented 5 months ago

Hi all,

I discovered shows that have an outdated RSS feed URL. For example:

id=311230 ("ARD Radio Tatort") currently has url=http://web.ard.de/radiotatort/rss/podcast.xml (this URL produces a timeout, though).

Checking against the Apple index, it seems they changed the feed URL. Running curl --location 'https://itunes.apple.com/lookup?id=310864997' shows it's now url=https://feeds.br.de/ard-radio-tatort/feed.xml
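For anyone who wants to repeat this check in a script, here is a minimal Python sketch. It assumes the lookup response is JSON with a "results" list whose first entry carries a "feedUrl" field; the function names (fetch_lookup, apple_feed_url, feed_url_is_stale) are made up for illustration and are not part of any Podcastindex tooling.

```python
import json
import urllib.request

# Same endpoint as the curl call above.
LOOKUP_URL = "https://itunes.apple.com/lookup?id={itunes_id}"


def fetch_lookup(itunes_id, timeout=10):
    """Fetch the raw lookup JSON from Apple (this performs a network call)."""
    url = LOOKUP_URL.format(itunes_id=itunes_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def apple_feed_url(lookup_json):
    """Return the feed URL Apple lists, or None if the lookup had no results."""
    results = lookup_json.get("results", [])
    if not results:
        return None
    # Assumption: the first result is the show we asked for.
    return results[0].get("feedUrl")


def feed_url_is_stale(stored_url, lookup_json):
    """True if Apple lists a feed URL that differs from the stored one."""
    current = apple_feed_url(lookup_json)
    return current is not None and current != stored_url
```

Usage would look something like feed_url_is_stale("http://web.ard.de/radiotatort/rss/podcast.xml", fetch_lookup(310864997)). A plain string comparison is deliberately naive: it would also flag harmless differences such as a trailing slash.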

Judging by the last-update dates, this happened in February 2022.

A side effect is that the image URL also changed and is not correct anymore in the Podcastindex DB.

Did the update mechanism miss this? Or are changes like this just not recorded at all?

perelin commented 5 months ago

Just stumbled over more stale data. Most prominently, the "Joe Rogan Experience" entry (id:1615079) has an old/wrong itunes_id. In the current Podcastindex DB dump it's 1578037433, but the actual entry from Apple has 360084272.

https://itunes.apple.com/lookup?id=1578037433 https://itunes.apple.com/lookup?id=360084272

So far, from my exploration of the data, it seems that about 10% of the entries have an old/wrong iTunes ID. If you are interested, I can send you the list of the ones I have identified so far (about 170k).

daveajones commented 5 months ago

Always interested in data accuracy help. If you want to do a PR to this repo with your list, that might be the easiest way.

daveajones commented 5 months ago

> Did the update mechanism miss this? Or are changes like this just not recorded at all?

Updating feed URLs is more complex than it may appear. The main barrier is when there is an iTunes ID for which Apple still shows the old feed URL. In that case we don't update ours, since many apps use us as a fallback to the Apple lookup API, so we have to stay in agreement with that. The other issue is when redirects are not properly set, or some other element in the feed doesn't agree with the redirect. It's a mess.
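To make the rule concrete, here is a rough Python sketch of the decision as described in this comment. It is not the actual Podcastindex aggregator code: the function name should_update_feed_url and its parameters are hypothetical, and the behavior when Apple lists a third, different URL is an assumption, since the comment doesn't cover that case.

```python
from typing import Optional


def should_update_feed_url(
    stored_url: str,
    candidate_url: str,
    apple_url: Optional[str],
) -> bool:
    """Decide whether to replace stored_url with candidate_url.

    stored_url:    the feed URL currently in the index
    candidate_url: the new URL we found (e.g. the target of a redirect)
    apple_url:     the feed URL Apple lists for the show's iTunes ID,
                   or None if the show has no iTunes ID
    """
    if candidate_url == stored_url:
        # Nothing to change.
        return False
    if apple_url is not None and apple_url == stored_url:
        # Apple still shows the old feed URL, so stay in agreement with it.
        return False
    # Assumption: in every other case the update is allowed.
    return True
```

For the "ARD Radio Tatort" example, Apple already lists the new feeds.br.de URL, so this sketch would allow the update. It says nothing about the harder part, which is deciding whether the candidate URL can be trusted when redirects are misconfigured.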