Podcastindex-org / web-ui

The public home page of podcastindex.org
MIT License

Cleanup of duplicate or broken feeds #307

Open Marzal opened 1 year ago

Marzal commented 1 year ago

I think these are too many for PI.social

stevencrader commented 1 year ago

Thanks. I resolved all the issues.

Marzal commented 1 year ago

Thanks! Should I close this issue or do you prefer that I use it for the next reports?

As:

stevencrader commented 1 year ago

I removed those duplicates also. I'm fine if you keep this open.

Marzal commented 1 year ago

Ok, so here is the next batch

stevencrader commented 1 year ago

I removed the dupes as requested.

I do have a question for @daveajones about how iVoox feeds are handled. Do you have any "best practices" for these feeds? iVoox seems to provide different URLs in different places. They also seem to ignore part of the URL. For example, the following feeds all return 200 and the same content.

daveajones commented 1 year ago

Thanks guys.

@stevencrader Let me look through those and respond.

cisene commented 1 year ago

I've been rewriting URLs to match the customized variant at the bottom of @stevencrader's list, https://www.ivoox.com/_fg_f1922056.xml, as that is how the feed was "discovered". iVoox also juggles different subdomains: www, mx, or the country code of any Spanish-speaking country. I've tried to normalize these before submitting to the Index.

Pattern: https://www.ivoox.com/_fg_f{identity}_filtro_1.xml

Also: once the URL has been verified and any redirects followed, the final link is the one submitted to PI.
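To make the normalization above concrete, here is a minimal Python sketch. It is not the code anyone in this thread actually runs: the function name `canonical_ivoox_feed` is made up for illustration, and it assumes that the numeric id after `_fg_f` (or `_sq_f` on show pages) identifies the show and that only `ivoox.com` and its subdomains need handling.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumption: the digits after "_fg_f" (feed URLs) or "_sq_f" (show pages)
# are the show id; everything else in the path is ignored.
_IVOOX_ID = re.compile(r"_(?:fg|sq)_f(\d+)")


def canonical_ivoox_feed(url: str) -> Optional[str]:
    """Rebuild an iVoox link into the pattern quoted above, or return None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Only ivoox.com and its subdomains (www, mx, ...) are handled here.
    if host != "ivoox.com" and not host.endswith(".ivoox.com"):
        return None
    match = _IVOOX_ID.search(parts.path)
    if match is None:
        return None
    return f"https://www.ivoox.com/_fg_f{match.group(1)}_filtro_1.xml"
```

Two entries whose URLs map to the same canonical value are candidates for a duplicate report; the result should still be fetched and checked before anything is submitted.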

ThomasUmstattd commented 11 months ago

I found a cause of a lot of duplicate feeds in the index. PowerPress has a feature called "Enhance All Feeds" that turns every website RSS feed into a podcast feed. So example.com/feed is a podcast feed, as is example.com/feed/podcast. Unless the site is also a blog, these will have the same episodes.

Example: https://podcastindex.org/podcast/2123998 and https://podcastindex.org/podcast/4430.

PowerPress runs on 40k sites, and since this is the recommended setting, this causes tens of thousands of duplicate entries.

In 99% of cases, "example.com/feed/podcast" is the primary/original feed.
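A rough Python sketch of how such pairs could be spotted in a list of feed URLs. The function name `powerpress_duplicate_pairs` is hypothetical, and the sketch only matches on URL shape: it assumes the same host and path prefix, and it does not compare episodes, so a site that is also a blog would be a false positive.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit


def powerpress_duplicate_pairs(feed_urls: List[str]) -> List[Tuple[str, str]]:
    """Return (primary, suspected_duplicate) pairs for .../feed/podcast and .../feed."""
    by_key: Dict[Tuple[str, str], str] = {}
    for url in feed_urls:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        path = parts.path.rstrip("/").lower()
        # Keep the first URL seen for each (host, path) combination.
        by_key.setdefault((host, path), url)

    pairs: List[Tuple[str, str]] = []
    for (host, path), url in by_key.items():
        if path.endswith("/feed/podcast"):
            sibling = path[: -len("/podcast")]  # ".../feed"
            if (host, sibling) in by_key:
                # ".../feed/podcast" is treated as the primary feed.
                pairs.append((url, by_key[(host, sibling)]))
    return pairs
```

Each pair is only a candidate. Comparing the episode lists of the two feeds before marking one as a duplicate would cover the blog case mentioned above.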

daveajones commented 11 months ago

Will look into this.

Marzal commented 9 months ago

Hi, new broken feed found and some dups

stevencrader commented 9 months ago

I can't replace a Feed URL directly. Instead, I added the desired Feed and marked the others as duplicates. See https://podcastindex.org/podcast/6761187

Same as above. See https://podcastindex.org/podcast/6761188

Done

daveajones commented 9 months ago

> Instead, I added the desired Feed and marked the others as duplicates.

That's what I do most of the time too.

Marzal commented 9 months ago

Hi!

Question:

If the usual fix is to delete both entries/items in the PI DB and create a new one with the better feed, would the AP actor be lost if someone is following @6518287@ap.podcastindex.org on the Fediverse?

stevencrader commented 9 months ago

That is an interesting question. I'm not sure how @daveajones is handling duplicates and the AP bridge.

stevencrader commented 8 months ago

I merged 6523766 into 6518287.

The P20 Feed was also added as https://podcastindex.org/podcast/6809087

I've messaged @daveajones about the AP issue but not sure how it will be treated. See https://podcastindex.social/@steven/111910978193046513

Marzal commented 6 months ago

Hi new batch of dups:

Not sure if the better approach is to add "source" (from official website) to PI or leave PI 252695 as the good one.

stevencrader commented 6 months ago

Thanks. I cleaned up the duplicates. I tried to fix the feed URL of 252695 but am unable to because there is a dead id (4212695) that uses that url. I want to keep 252695 because it has the iTunes association. The iTunes DB also uses the feed URL I'm trying to set.

https://www.spreaker.com/show/3681678/episodes/feed

Do you have any ideas @daveajones ?

Marzal commented 4 months ago

Greetings, fellow podcast lovers, new duplicates:

Good one (with Apple ID): https://podcastindex.org/podcast/171865 - but does it need a refresh for the artwork?

PS: The new https://episodes.fm icon is a faster way to find the one with an Apple ID; before, I was using the Podnews directory.

daveajones commented 4 months ago

Fixed!

Marzal commented 4 months ago

New finds:

stevencrader commented 4 months ago

Fixed

Marzal commented 4 months ago

The one with the iTunes ID: https://podcastindex.org/podcast/161452 (but it has no image and not all episodes are in PI; the feed is OK and the enclosures are the same as FeedPress). It is the FeedBurner feed, which seems to be a mirror of the FeedPress one. Is a refresh/reset needed?

So either 1453442, or 161452 if the Apple ID is preferred for the Overcast fallback?

stevencrader commented 4 months ago

Thanks. I kept 161452

Marzal commented 4 months ago

stevencrader commented 4 months ago

Done. Thanks!

rlarzac commented 4 months ago

Hello,

Could you please delete https://podcastindex.org/podcast/6923699 and https://podcastindex.org/podcast/6923292, as I have replaced them with https://podcastindex.org/podcast/6924587 and https://podcastindex.org/podcast/6924593?

Thank you :)

stevencrader commented 4 months ago

Done

Marzal commented 3 months ago

Hi, happy summer, everyone. New batch:

9 decibelios:

iVoox OK: https://podcastindex.org/podcast/2112302 :ok:

Eyes On Success:

The one with Apple ID: https://podcastindex.org/podcast/314749 :ok:

Audio momentos:

The one with Apple ID: https://podcastindex.org/podcast/955379 :ok:

Frecuencia Improvisada

The one with Apple ID: https://podcastindex.org/podcast/336764 :ok:

stevencrader commented 3 months ago

Thanks

@daveajones Not sure when they were added, but should your auto dupe checker have caught the difference in the feed URLs for these 2 feed IDs?

Marzal commented 2 months ago

Hi, what's the policy on dead, inaccessible podcasts? This one is dead, both its website and its enclosures:

daveajones commented 2 months ago

Do you mean, when do they get removed after 404?

Marzal commented 2 months ago

A few questions, really: are they auto-removed? If ..

And if in some cases they are not, would manually reporting them (like I did) get them removed even if one is on Apple Podcasts, or does the API compatibility for Marco (not sure about others) only apply to podcasts that actually work?

I'm asking so I know what to report, and out of a bit of curiosity too.

PS: Have you considered exposing, on the PI website or in the API, the date since which the crawlers have seen the feed return 404, so that people or apps are warned that the podcast may not work or may be removed?

Marzal commented 2 months ago

Hi, new batch to add to https://github.com/Podcastindex-org/web-ui/issues/307#issuecomment-2285921123

Adolescencia Positiva:

Sé feliz donde estés:

Universo Hijos:

Educación Respetuosa:

Inversapiens - Todos Somos Inversionistas:

stevencrader commented 2 months ago

Sé feliz donde estés

The 2 with the iVoox feed are both in Apple, so I left them.

Marzal commented 2 months ago

Mujeres con Historia:

stevencrader commented 1 month ago

Done

Marzal commented 1 month ago

Producción Propia

stevencrader commented 1 month ago

Done

Marzal commented 1 month ago

Same iVoox feed, different URL

Plagiando a Faulkner:

Rumore Chimico [ManchaPod]

Una Vaga Idea

stevencrader commented 1 month ago

Puede Ser Una Charla Más Puede Ser Una Charla Más

These both have iTunes IDs. What should we do, @daveajones?

DUP https://podcastindex.org/podcast/964595

This one also has an iTunes ID. Feed info is the same as https://podcastindex.org/podcast/1472747

Marzal commented 1 month ago

Evento 24H24L (Peertube)

stevencrader commented 1 month ago

Done

Marzal commented 3 weeks ago

Comunicando Podcast:

Marzal commented 4 days ago

Post Apocalipsis Nau

This is the feed they announce: https://www.ivoox.com/podcast-post-apocalipsis-nau_sq_f1634081_1.html, which is the same feed as the first 2 options: https://www.ivoox.com/feed_fg_f1634081_filtro_1.xml

It would be important not to delete or reset either of the first 2, because iVoox only shows the last 20 episodes, and PI would lose the older ones.
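To illustrate the concern, here is a small Python sketch, not a description of how PI actually stores episodes. It assumes episodes are keyed by a `guid` field, and the function name `merge_episodes` is made up. An existing entry keeps its older episodes when the feed only exposes its latest items, while an entry recreated from scratch only ever sees that window.

```python
from typing import Dict, List

Episode = Dict[str, str]


def merge_episodes(indexed: List[Episode], feed_window: List[Episode]) -> List[Episode]:
    """Union of already-indexed episodes and the feed's current items, keyed by GUID."""
    merged: Dict[str, Episode] = {ep["guid"]: ep for ep in indexed}
    for ep in feed_window:
        # The copy currently in the feed replaces the stored copy of the same episode.
        merged[ep["guid"]] = ep
    return list(merged.values())


# Hypothetical data: 35 episodes published, feed exposes only the latest 20.
all_published = [{"guid": f"ep-{n}", "title": f"Episode {n}"} for n in range(35)]
feed_window = all_published[-20:]

kept_entry = merge_episodes(all_published[:30], feed_window)  # existing entry
fresh_entry = merge_episodes([], feed_window)                 # deleted and re-added
```

In this toy data the kept entry ends up with all 35 episodes, while the recreated one has only the 20 still present in the feed.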

EnREDando con MOA

Especiales de EuskaDigital

EuskaDigital – Sarean Zehar

EuskaDigital - EnRedAndo