Podcastindex-org / database


Is the itunesId column kept up to date? #35

Open ryan-lp opened 6 months ago

ryan-lp commented 6 months ago

One approach to abiding by app store and Play Store guidelines is to list only podcasts that appear in Apple's directory, i.e. podcasts that have an iTunes ID, and so rely on the review process Apple already applies to podcasts it accepts. The Podcast Index DB provides an itunesId column which might be useful for this purpose.

However, if a podcast that was previously listed in Apple's directory is later found to be in violation of Apple's content guidelines and its iTunes ID is revoked, the Podcast Index DB itunesId column would be invalidated for this purpose unless it were also kept up to date.

This raises the question: is the itunesId column kept up to date?

daveajones commented 6 months ago

It is kept up to date through various methods, primarily Marco syncing with us, but also various cleanup scripts. It’s not perfect because Apple’s DB isn’t perfect: things often disappear from and reappear in AP, multiple iTunes IDs point to the same feed, etc. It’s a mess.

ryan-lp commented 6 months ago

Thanks, would it be worthwhile making a page somewhere which goes through each column in the schema and explains what it is and any related facts, such as in this case, that it is kept up to date?

In this particular case, I thought I noticed some records where the podcast was no longer in Apple's index but the itunesId column was still set. Unfortunately I can't remember which record it was now. I'll see if I can find it in my command history.

ryan-lp commented 6 months ago

Here it is:

sqlite> select id, itunesId, url from podcasts where id = 311875;
311875|355514361|https://www.tct.tv/podcasts/fih.php
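For checking more than one record, the same query can be run from a script. This is only a minimal sketch: the table and column names (podcasts, id, itunesId, url) come from the query above, but the function name is made up for illustration, and how an unset itunesId is stored (NULL, empty string or 0) is an assumption, so the filter covers all three.

```python
import sqlite3


def podcasts_with_itunes_id(conn, limit=None):
    """Return (id, itunesId, url) rows whose itunesId looks set.

    Assumption: an unset itunesId is stored as NULL, '' or 0.
    The real dump may use only one of these.
    """
    sql = (
        "SELECT id, itunesId, url FROM podcasts "
        "WHERE itunesId IS NOT NULL AND itunesId != '' AND itunesId != 0 "
        "ORDER BY id"
    )
    if limit is not None:
        # Bind the limit as a parameter instead of formatting it into the SQL.
        return conn.execute(sql + " LIMIT ?", (limit,)).fetchall()
    return conn.execute(sql).fetchall()
```

Calling it with a connection from sqlite3.connect() on the downloaded database file returns a list of tuples that can then be checked one by one against Apple's directory.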

The feed URL is a PHP script that may have worked in the past, since it looks like valid PHP code for generating an RSS feed. However, the server is now misconfigured, and a GET request just returns the PHP source itself. Your crawler also seems to use an extremely forgiving parser: it still picks up all of the XML start/end tags within the PHP script, and ends up capturing the language element value as '. $lang.
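One way to catch a response like this is to run the body through a strict XML parser before trusting any extracted values. The sketch below is not how the Podcast Index crawler works; it is a hypothetical check using Python's standard library, and it only recognises RSS 2.0 documents (an Atom feed would be rejected).

```python
import xml.etree.ElementTree as ET


def is_well_formed_rss(body: str) -> bool:
    """Return True only if body parses as XML with an <rss><channel> root.

    Raw PHP source fails here: the strict parser either finds no root
    element or hits markup that is not well-formed.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    return root.tag == "rss" and root.find("channel") is not None
```

A feed that fails this check could be flagged for review rather than having values such as the language element stored from it.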

However, when I query the iTunes API with the itunesId 355514361, I get an empty result, which suggests that Apple might have removed it.
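For reference, the check can be sketched as below, assuming Apple's public lookup endpoint at https://itunes.apple.com/lookup with an id query parameter, and a JSON response containing a resultCount field. The function names are made up for illustration. An empty result is only a hint that the podcast was removed, since it may also reflect regional availability.

```python
import json
import urllib.parse
import urllib.request

LOOKUP_ENDPOINT = "https://itunes.apple.com/lookup"


def lookup_url(itunes_id: int) -> str:
    """Build the lookup URL for a single iTunes ID."""
    return LOOKUP_ENDPOINT + "?" + urllib.parse.urlencode({"id": itunes_id})


def is_listed(payload: dict) -> bool:
    """Treat a response with no results as 'not currently listed'."""
    return payload.get("resultCount", 0) > 0


def fetch_lookup(itunes_id: int, timeout: float = 10.0) -> dict:
    """Fetch and decode the lookup response (performs a network request)."""
    with urllib.request.urlopen(lookup_url(itunes_id), timeout=timeout) as resp:
        return json.load(resp)
```

With these, is_listed(fetch_lookup(355514361)) would give a yes/no answer for the record above, subject to the caveats mentioned.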

> It is kept up to date through various methods, primarily Marco syncing with us.

If the syncing is initiated from Marco's end for each record that he has, then I guess that he just doesn't have a record for 355514361 while you do, and that syncing process leaves 355514361 eternally untouched?