AntennaPod / AntennaPod

A podcast manager for Android
https://www.antennapod.org
GNU General Public License v3.0

gpodder full sync duplicates podcast subscriptions #2214

Open JeanFred opened 7 years ago

JeanFred commented 7 years ago

Expected behaviour: gpodder full sync syncs the local subscriptions with the ones on gpodder.net

Current behaviour: full sync results in duplicated subscriptions for some but not all podcasts (in my case, 29 out of my ~100 podcasts). Another full sync left all but one of those 29 duplicated subscriptions with three entries.


App version: 1.6.2.2 (from F-Droid)

Android version: 5.1.1

Device model: Samsung Galaxy S5 mini (SM-G800F)

mfietz commented 7 years ago

Is it possible those subscriptions have different URLs? e.g. the device subscription was for MP3, but gpodder for MP4/OGG? Or the subscription moved and gpodder still had the old url?

We use the URL to detect podcasts, so if it is not the same, we add it as a new podcast.
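
To make the mechanism concrete, here is a minimal Python sketch of a full sync that identifies podcasts by exact URL string. It is not AntennaPod's code: the function name `full_sync` and the set-of-URLs model are hypothetical simplifications, and which address sits on which side is an assumption (the device holding the new URL, the server the old one). It shows why a feed stored under one address on the device and another on gpodder.net ends up subscribed twice.

```python
def full_sync(local_urls: set[str], remote_urls: list[str]) -> set[str]:
    """Hypothetical full sync: merge the server's subscription list into the local one.

    Podcasts are identified only by their exact URL string, so a feed whose
    address moved (or differs only in scheme) looks like a new podcast.
    """
    merged = set(local_urls)
    for url in remote_urls:
        if url not in merged:
            merged.add(url)  # no exact match, so it is subscribed again
    return merged


local = {"http://feed.thisamericanlife.org/talpodcast"}    # new address (assumed on the device)
remote = ["http://feeds.thisamericanlife.org/talpodcast"]  # old address (assumed on the server)
print(sorted(full_sync(local, remote)))  # both URLs: one podcast, two subscriptions
```

Running the same merge again with an unchanged server list adds nothing further; extra copies need yet another differing URL.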

JeanFred commented 7 years ago

@mfietz Interesting. I checked four of these triplicated subscriptions, and it looks like the URL was indeed moved (e.g. http://feeds.thisamericanlife.org/talpodcast became http://feed.thisamericanlife.org/talpodcast). I only ever subscribe using gpodder, so I guess they updated their records of these podcasts since I subscribed.

The third subscription is the same as the second one, though. I'm not sure where that would come from.

Using the URL to detect podcasts makes sense to me too, so I’m not sure what AntennaPod should do better in this situation, but the result is quite unexpected. (Any advice on the best course of action? Clearing all local subscriptions and re-syncing? I’m a bit wary of losing episode state...)

dalb8 commented 7 years ago

I got this too, and the number of podcasts kept proliferating each time I synced; I was only syncing manually. I had to wipe data and disable gpodder.

jeee commented 7 years ago

Maybe an option in AntennaPod to just remove all podcasts before a new sync would be a (temporary) solution? I now have some podcasts that are listed 6 times...

holocronweaver commented 7 years ago

I am currently experimenting with only syncing a single device to gpodder at a time, in the hope that it works around this bug. This way gpodder can only be used for backups, but that is better than nothing. A few days in, there are no duplicated items; I will report any changes.

joshproehl commented 6 years ago

I created a new gpodder account, and synced podcasts from my tablet to it, no problem.

I installed AntennaPod on my phone, and set up gpodder, and then told gpodder to sync my phone with what was installed on the tablet.

Now both devices are getting duplicate podcasts. Some of them have different URLs, but many have identical URLs. When you delete one of them from AntennaPod, both are removed.

For the podcast that I tested this on, I re-subscribed via the gpodder web UI (assigned it to the tablet), and so far I only have one copy on the tablet.

However for some other podcasts I'm now up to three copies on the tablet. These do not appear to be related to #2267 as they have no special characters in the URL. (Several have a "?", but otherwise it's http://site.com/feed)

If there is anything I can try to help gather information about this issue please let me know! (I'd very much like to get this working in a stable fashion, multi-device sync is a feature I'm very interested in!)

jmichael2497 commented 6 years ago

Adding my confirmation: the 301 redirects seem to be part of the problem. I found this same quirky behavior after all the Night Vale feeds updated around 06-22.

Maybe have AntennaPod check feed URLs for at least a 301 redirect before doing the full sync request, so it is ready to match up with gpodder.

tl;dr for the fuzzily remembered details below: full sync with gpodder just seems like a problematic thing to use, as it can break multiple things, but it is sometimes better than nothing. Maybe add caution tape.

I had been getting sync errors rather often with some streaming shows as I travel around, maybe due to spotty internet connections. Being bored and curious, I think I decided to try full sync, and then the duplication and more issues started.

I ended up with odd duplicates of all those shows with redirects, with no episodes marked complete in the new entries at first. I noticed the new entries had different feeds from the original libsyn ones, so I just deleted them and tried again: same duplication.

After I did a global feed refresh, I noticed that the original entries (the ones with episode history) had updated their feeds from libsyn to the new address. After I removed the other duplicate feeds without history, a full sync no longer created duplicates.

Also, the full sync left a lot of previously streamed episodes marked as done, but with a progress-bar rounding error showing them as not quite finished. That is apparently a separate issue, which I fixed with a manual sqlite cleanup. So maybe add a warning to use full sync with caution, or examples of when it actually should be used.

nigelvdv commented 5 years ago

Is there any news on this?

So far I have been removing all my podcast subscriptions in AntennaPod every few months before cleaning out my gpodder subscriptions manually on the gpodder website. It works, but it is very cumbersome, especially since Gpodder doesn't recognize all subscriptions ("Unknown podcast from...").

Kind regards,

Nigel

outdooracorn commented 5 years ago

I'll add that it seems to duplicate shows from https://site.com/rss to http://site.com/rss. AntennaPod should probably ignore the scheme (and the query?) from the URI when evaluating whether a show is a duplicate.
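
A minimal sketch of such a comparison key, assuming duplicates should be detected by comparing a normalized form of the URL rather than the raw string. The helper name `feed_key` is hypothetical; only `urllib.parse.urlsplit` from the Python standard library is used.

```python
from urllib.parse import urlsplit


def feed_key(url: str, drop_query: bool = False) -> str:
    """Comparison key for detecting duplicate subscriptions (hypothetical helper).

    - the scheme is ignored, so http:// and https:// compare equal
    - the host is lowercased (host names are case-insensitive)
    - a trailing slash on the path is ignored
    - the query is kept unless drop_query=True
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    key = host + (parts.path.rstrip("/") or "/")
    if parts.query and not drop_query:
        key += "?" + parts.query
    return key


print(feed_key("https://site.com/rss") == feed_key("http://site.com/rss"))  # True
```

Ignoring the scheme assumes both schemes serve the same feed, which is typical but not guaranteed. Dropping the query is riskier, since two different queries can select different feeds (compare the two `?feed=rss2` URLs in the gpodder search results later in this thread), so it is opt-in here.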

alexanderadam commented 4 years ago

Thank you for reopening, @ByteHamster. :pray:

To deal with the problem, I currently have two ideas (neither of which I like):

  • When detecting a redirect, not only update the URL in the database but also delete the old URL on gpodder and add the new one. This deletes all state of that feed on Gpodder.

Personally, I'm against this variant because it's not AntennaPod's "fault" that gpodder has exactly the same bug. Also, AntennaPod shouldn't be destructive to external services.

  • When synchronizing with Gpodder (step 1), do a network request to all servers and see if they have a redirect. This means that AntennaPod would basically do a full refresh (but without actually refreshing the items) every time it syncs with Gpodder.
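
The second idea boils down to resolving every subscribed URL to its final address before comparing, which costs at least one request per feed on every sync. A minimal sketch, assuming a hypothetical `next_hop` callback that stands in for a single HTTP request made without automatically following redirects; the redirect table below is made up so the example runs offline.

```python
from typing import Callable, Optional


def resolve_redirects(url: str,
                      next_hop: Callable[[str], Optional[str]],
                      max_hops: int = 5) -> str:
    """Follow a chain of redirects to the final feed URL.

    `next_hop` stands in for one HTTP request: it returns the redirect target
    for a 3xx answer, or None when the server answers the URL directly.
    """
    start = url
    seen = {url}
    hops = 0
    while True:
        target = next_hop(url)
        if target is None:
            return url  # final address reached
        hops += 1
        if hops > max_hops:
            raise ValueError(f"more than {max_hops} redirects from {start}")
        if target in seen:
            raise ValueError(f"redirect loop at {target}")
        seen.add(target)
        url = target


REDIRECTS = {  # hypothetical 301 table instead of live requests
    "http://old.example/feed": "https://old.example/feed",
    "https://old.example/feed": "https://new.example/feed",
}
print(resolve_redirects("http://old.example/feed", REDIRECTS.get))  # https://new.example/feed
```

The hop limit and loop check matter because a misconfigured server could otherwise stall the whole sync.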

When I did those exports, all of the exported feeds had only the "new" URL, right? Is the new URL also persisted at some point in time, or is AntennaPod checking for it during the export as well? Because it should be possible to "merge" feeds if the new URL is persisted earlier, right? Or am I thinking in the wrong direction here?

ByteHamster commented 4 years ago

gpodder has exactly the same bug.

Actually, https://github.com/gpodder/mygpo/issues/45 is a different thing. It is about feeds in the general, public feed database.

Is the new URL also persisted on one point in time or is AntennaPod during the export checking for it as well?

The url is persisted when refreshing the subscriptions. This is independent from the gpodder synchronization. Different service, different time.

Because it should be possible to "merge" feeds if the new URL is persisted earlier, right?

This means AntennaPod would create a new feed and then delete it later when refreshing. Users could see the duplicates until the refresh service is executed. Additionally, the refresh service, at this point, can not be 100% sure which one of the feeds contains the actual data.


By the way, this should only happen if you do a full sync. The normal (delta) sync should not create duplicates because it only downloads the newly added feeds.
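
A minimal sketch of why the delta sync avoids this, assuming the response shape of the gpodder.net API 2 subscription-changes call (a JSON object with `add`, `remove` and `timestamp`, requested with the previous timestamp as `since`). The function name `apply_delta` is hypothetical. Only URLs added or removed since the last sync are applied, so an old URL the server has had all along is never sent again; a full sync, by contrast, replays the whole list.

```python
def apply_delta(local: set[str], changes: dict) -> tuple[set[str], int]:
    """Apply one subscription-changes response to the local subscription set.

    Returns the new set plus the server timestamp to send as `since` next time.
    """
    updated = set(local)
    updated.update(changes.get("add", []))
    updated.difference_update(changes.get("remove", []))
    return updated, changes["timestamp"]


# The local feed was renamed by a redirect, but nothing changed on the server
# since the last sync, so the old URL is not sent again and no duplicate appears.
local = {"https://new.example/feed"}
changes = {"add": [], "remove": [], "timestamp": 1700000000}
print(apply_delta(local, changes))  # ({'https://new.example/feed'}, 1700000000)
```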

alexanderadam commented 4 years ago

The url is persisted when refreshing the subscriptions. This is independent from the gpodder synchronization. Different service, different time.

So in that case

Because it should be possible to "merge" feeds if the new URL is persisted earlier, right?

This means AntennaPod would create a new feed and then delete it later when refreshing.

At which point in time?

Users could see the duplicates until the refresh service is executed.

I guess the scenarios are a bit complicated:

  1. AntennaPod has an outdated URL & gpodder sends just the new one
  2. AntennaPod has a new URL & gpodder sends just the old one
  3. AntennaPod has an outdated URL & gpodder sends both
  4. AntennaPod has a new URL & gpodder sends both
  5. AntennaPod has both (doesn't matter what gpodder sends here)

IMHO the only relevant data is the new URL, though. If all known podcasts are checked for redirects before a sync, and if AntennaPod checks new URLs (from gpodder or elsewhere) for redirects when adding "new" subscriptions, the gpodder bug is not too relevant anymore. This would leave us with just two cases:

  1. Everything is fine because we never have an outdated URL for newly added or synced feeds anymore :tada:
  2. AntennaPod still has an old and a new URL persisted and has to merge them :disappointed:
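
A sketch of the matching step this proposal implies, assuming every URL (local or from gpodder) can be resolved to its post-redirect address. The names `match_remote` and `resolve` are hypothetical, and `resolve` here is a single-hop lookup in a made-up table. Because both sides are compared by their final address, scenarios 1 to 4 all reduce to one comparison.

```python
from typing import Callable


def match_remote(remote_urls: list[str], local_urls: list[str],
                 resolve: Callable[[str], str]) -> tuple[dict[str, str], list[str]]:
    """Match gpodder URLs to local feeds by their post-redirect address.

    Returns (remote URL -> matching local URL, remote URLs that are really new).
    """
    local_by_final = {resolve(u): u for u in local_urls}
    matched: dict[str, str] = {}
    new: list[str] = []
    for url in remote_urls:
        final = resolve(url)
        if final in local_by_final:
            matched[url] = local_by_final[final]
        else:
            new.append(url)
    return matched, new


REDIRECTS = {"http://old.example/feed": "https://new.example/feed"}  # hypothetical


def resolve(u: str) -> str:
    return REDIRECTS.get(u, u)  # a single hop is enough for this example


# ({'http://old.example/feed': 'https://new.example/feed'}, [])
print(match_remote(["http://old.example/feed"], ["https://new.example/feed"], resolve))
```

Case 5 (AntennaPod already holds both URLs) shows up here as two local URLs resolving to the same key: the dict silently keeps only one of them, which is exactly where a real merge of episode state would be required.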

Having to merge an old and a new feed seems like a final boss level to me, though, because episode information (what's been played and what hasn't) is probably exclusive to one feed.

I also believe that it would make sense to persist redirect rules on full syncs and do a lookup on delta syncs.

This bug sounds very complicated to me. I'm very sorry that I brought it up again. :disappointed:

ByteHamster commented 4 years ago

At which point in time?

When refreshing the feeds. This might be some minutes later.

AntennaPod has a new URL & gpodder sends just the old one

This is the case that causes trouble. The other cases just occur because that one happens first. Gpodder does not update its urls automatically. They keep the old url.

I also believe that it would make sense to persist redirect rules on full syncs and do a lookup on delta syncs.

This issue does not apply to delta syncs. Gpodder does not send us the url again, even if we update to the new url on our end.


I just noticed another issue: If we upload the episode status, we push it to the new URL (because that's what's in the database). For gpodder, the new url is a completely different podcast, which means that all playback events are only sent to the new feed. That renders the old feed unused. Would kind of make sense to delete the old feed on gpodder and add the new one.

hovancik commented 4 years ago

Hi, I reinstalled the app to get 2.0.0-alpha1, so I had to connect to gpodder again (as an existing device).

Some of my podcasts got duplicated.

What I can see (plus some more): https://gpodder.net/podcast/6-minute-english/ https://gpodder.net/podcast/6-minute-english-4

https://gpodder.net/podcast/dark-tome https://gpodder.net/podcast/dark-tome-1

They look a bit different on Gpodder but the same in AntennaPod. The first ones are usually from when I subscribed in 2019 (according to my gPodder history), and the second ones are from yesterday, when I installed AntennaPod again and added it as an existing device.

tonytamsf commented 4 years ago

I gave up on gpodder sync on my phone when it created dups. It was not worth it for me anymore.

alexanderadam commented 4 years ago

Here's a corresponding bountysource issue in case someone wants to sponsor at least a coffee or so for solving this.

Okay, then I'll try to get my money back from BountySource, since this isn't wanted.

alexanderadam commented 4 years ago

~~I'm very sorry, but I wasn't aware that BountySource had become evil (i.e. stealing unclaimed money and taking 10% of each charge). Is there a better, AntennaPod-compatible way of sponsoring this bugfix? Or any other advice?~~

Okay, then I'll try to get my money back from BountySource, since this isn't wanted.

keunes commented 4 years ago

Actually, given that there's no issue over at Gpodder's repository yet, and that there's a new maintainer, wouldn't it make sense to create one over there? One could make the (semantic) argument that Gpodder isn't storing the latest URL, and that this should be corrected.

ByteHamster commented 4 years ago

This is not really a bug in Gpodder. The problem is that we handle URL redirects differently to Gpodder. They keep the old URL and follow the redirect every time they refresh the feed. This ensures consistency when synchronizing (where the feed is identified by the URL). We follow the redirect one single time and update the URL in the database. This has the advantage of not requiring an additional request each time we update the feeds.

While our method might be a bit more stable when the old server is turned off at some point, theirs also has advantages. I do not think that either one of us should significantly change their data model just because the two projects do not directly fit in this case. We need another approach.

Frenzie commented 4 years ago

While our method might be a bit more stable when the old server is turned off at some point, theirs also has advantages. I do not think that either one of us should significantly change their data model just because the two projects do not directly fit in this case. We need another approach.

If you described Gpodder correctly, that means it's guaranteed to break and its model should be changed to use a unique identifier instead.

alexanderadam commented 4 years ago

I'm sorry, I have the feeling that we discussed this earlier already but what again was the disadvantage of checking the HTTP header for redirect states (i.e. 3XX) on Gpodder syncs except that it will take a bit longer?

PS: and is there any way I could buy you a coffee or so? I have the feeling that I should take away the money from BountySource because of their recent policies.

EDIT: Okay, then I'll try to get my money back from BountySource, since it isn't wanted.

keunes commented 4 years ago

While our method might be a bit more stable when the old server is turned off at some point, theirs also has advantages.

I would agree with @Frenzie there. If their implementation is slower and indeed will break if the old server goes off-line, it sounds like there's room for improvement :)

I do not think that either one of us should significantly change their data model just because the two projects do not directly fit in this case. We need another approach.

I don't see an issue in at least bringing up a proposal - whether they want to change is fully their decision of course, and if they don't it'll hopefully lead to a common ground both projects agree on. With my very limited understanding of the technical side, I would say that in any case it wouldn't really be a change of the data model (database architecture) as such - just the url that they store would change. Plus, I think close collaboration & integration with gpodder.net is important as it has the potential to be a great (server-client) combination that is privacy-friendly and open source.

I'm sorry, I have the feeling that we discussed this earlier already but what again was the disadvantage of checking the HTTP header for redirect states (i.e. 3XX) on Gpodder syncs except that it will take a bit longer?

Not sure what you were thinking of, but some related issues: #1393 #3733 #2232

PS: and is there any way I could buy you a coffee or so? I have the feeling that I should take away the money from BountySource because of their recent policies.

Current maintainer doesn't want donations, see #2935. But thanks for considering :)

ByteHamster commented 4 years ago

If you described Gpodder correctly, that means it's guaranteed to break and its model should be changed to use a unique identifier instead.

Internally, they use a unique identifier - but one for each URL, independently of the redirects. The synchronization API is only based on URLs: docs

was the disadvantage of checking the HTTP header for redirect states (i.e. 3XX) on Gpodder syncs except that it will take a bit longer?

It will break synchronization of episode state. Gpodder always identifies podcasts by their original URL, AntennaPod by the new URL.

If their implementation is slower and indeed will break if the old server goes off-line, it sounds like there's room for improvement :)

I do not think they handle duplicates of the same feed very well. If you look at an API call for the search term methodisch inkorrekt, you get many duplicates with different spellings of the same URL (listed below). The problem is known to them (https://github.com/gpodder/mygpo/issues/45 https://github.com/gpodder/mygpo/issues/63 https://github.com/gpodder/mygpo/issues/77 https://github.com/gpodder/mygpo/pull/60) and apparently not easy to fix. Also, it does not look like the project is really active at the moment. If someone has the time to help fix their database schema, please do. I am busy with AntennaPod, so I cannot do it.

Duplicates:
http://minkorrekt.de/feed/m4a/
http://minkorrekt.de/feed/m4a
http://minkorrekt.de/feed/
http://minkorrekt.de/feed/mp3/
http://minkorrekt.de/feed/opus/
http://minkorrekt.de/feed/ogg/
http://minkorrekt.de/feed/mp3
http://minkorrekt.de/feed
https://minkorrekt.de/feed/m4a/
http://feeds.feedburner.com/Methodischinkorrekt
http://feeds.feedburner.com/methodischinkorrekt

Broken results:
http://bitlove.org/methodischinkorrekt/minkorrekt/feed
http://methodischinkorrekt.wordpress.com/feed/
http://241568.website.snafu.de/wordpress/?cat=4&feed=rss2
http://241568.website.snafu.de/wordpress/?feed=rss2
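
To gauge how much of that list pure string normalization could fold together, here is a sketch that groups the eleven "Duplicates" URLs by a lenient key. The helper names are hypothetical, and case-folding the path is an explicit assumption (paths are case-sensitive in general); the query is kept because the two snafu.de URLs above differ only in their query and are presumably different feeds.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def lenient_key(url: str) -> str:
    """Ignore scheme and trailing slash, and case-fold the path.

    Case-folding the path is an assumption: it may be right for the two
    feedburner spellings, but paths are case-sensitive in general.
    """
    parts = urlsplit(url)
    key = (parts.hostname or "") + parts.path.rstrip("/").lower()
    if parts.query:
        key += "?" + parts.query  # different queries can mean different feeds
    return key


def group_spellings(urls: list[str]) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[lenient_key(url)].append(url)
    return dict(groups)


DUPLICATES = [
    "http://minkorrekt.de/feed/m4a/", "http://minkorrekt.de/feed/m4a",
    "http://minkorrekt.de/feed/", "http://minkorrekt.de/feed/mp3/",
    "http://minkorrekt.de/feed/opus/", "http://minkorrekt.de/feed/ogg/",
    "http://minkorrekt.de/feed/mp3", "http://minkorrekt.de/feed",
    "https://minkorrekt.de/feed/m4a/",
    "http://feeds.feedburner.com/Methodischinkorrekt",
    "http://feeds.feedburner.com/methodischinkorrekt",
]
for key, members in group_spellings(DUPLICATES).items():
    print(len(members), key)
```

This folds the 11 spellings into 6 groups. The per-format feeds (m4a, mp3, opus, ogg) stay apart, which is arguably correct since they carry different enclosures, and whether the feedburner address and the minkorrekt.de ones are the same feed cannot be decided from the strings at all.
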
alexanderadam commented 4 years ago

It will break synchronization of episode state. Gpodder always identifies podcasts by their original URL, AntennaPod by the new URL.

But if AntennaPod knows that "original URLs" are leading to "new URLs" it should be able to map the synchronisation state accordingly anyway, right? :thinking:
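
In principle, yes. A minimal sketch, assuming AntennaPod kept a hypothetical `original_url_for` table (current URL → URL gpodder knows), which per this thread it does not do today: before uploading episode actions, the `podcast` field is rewritten back to the original URL. The action fields (`podcast`, `episode`, `action`, `position`) follow the gpodder.net episode-action format as far as I know; treat them as an assumption.

```python
def to_gpodder_actions(actions: list[dict], original_url_for: dict[str, str]) -> list[dict]:
    """Rewrite each action's podcast URL to the one gpodder.net has on record."""
    rewritten = []
    for action in actions:
        copy = dict(action)  # leave the caller's dicts untouched
        copy["podcast"] = original_url_for.get(action["podcast"], action["podcast"])
        rewritten.append(copy)
    return rewritten


original_url_for = {"https://new.example/feed": "http://old.example/feed"}  # hypothetical
actions = [{"podcast": "https://new.example/feed",
            "episode": "https://new.example/ep1.mp3",
            "action": "play", "position": 120}]
print(to_gpodder_actions(actions, original_url_for)[0]["podcast"])  # http://old.example/feed
```

The episode (enclosure) URL is left as-is; if a feed move also changed enclosure URLs, those would need a similar mapping, which this sketch does not attempt.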

rcuocolo commented 4 years ago

This issue has also occurred on my device after (unknowingly) performing a full sync. A workaround to avoid it in the future would be appreciated. Removing duplicates causes all episodes to be removed from "both" subscriptions and requires a new subscription and re-downloading the appropriate episodes every time, which is not ideal.

tonytamsf commented 4 years ago

Do we have data around how many users use gpodder and AntennaPod? I am just trying to weigh how much effort I personally want to invest into looking at this since I don't use it.

I used gpodder once, and I also saw the duplicate issue; it caused a lot of pain to fix for 111 subscriptions.

rcuocolo commented 4 years ago

Same here. The first time, I (unknowingly) made the mistake of a full sync, which caused duplication. A couple of days ago, a regular sync (strangely) also presented the issue. It is fairly annoying to remove duplicates, as that also removes the original subscription, losing any data on played/downloaded/queued episodes.

ByteHamster commented 4 years ago

Do we have data around how many users use gpodder and AntennaPod?

AntennaPod does not collect any analytics. I have no idea how many people use it.

I am just trying to weigh how much effort I personally want to invest into looking at this since I don't use it.

I am not sure if we can cleanly fix this on our end. Gpodder still uses the old URLs, so redirected feeds are completely different subscriptions for them (meaning that the played state of episodes is also tied to the old URL).

LinAGKar commented 4 years ago

I've also had a full sync re-add some deleted subscriptions.

And once, deleting a duplicate also deleted the original, but it usually doesn't.

I am not sure if we can cleanly fix this on our end.

The best idea I can come up with is for AntennaPod to keep track of both the original URL and the new URL for each feed, using the former when syncing to gPodder and the latter when fetching the feed.
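
A minimal sketch of that two-URL record, assuming the URL gpodder knows is the one the feed had before its first permanent redirect. The `Feed` class and its method names are hypothetical and do not mirror AntennaPod's actual database model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feed:
    """Hypothetical feed record that remembers two URLs."""
    download_url: str               # used when refreshing; follows redirects
    sync_url: Optional[str] = None  # pre-redirect URL; set on the first redirect

    def on_permanent_redirect(self, new_url: str) -> None:
        if self.sync_url is None:
            self.sync_url = self.download_url  # keep the URL gpodder knows
        self.download_url = new_url

    def url_for_sync(self) -> str:
        return self.sync_url or self.download_url


feed = Feed("http://old.example/feed")
feed.on_permanent_redirect("https://new.example/feed")
feed.on_permanent_redirect("https://newer.example/feed")
print(feed.download_url, feed.url_for_sync())
# https://newer.example/feed http://old.example/feed
```

The assumption breaks if the same show was subscribed on another device under yet another URL; that would need a list of known sync URLs rather than a single field.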

KAMiKAZOW commented 4 years ago

Removing duplicates causes issues with removal of all episodes from "both" subscriptions and requires a new subscription and redownloading the appropriate episodes every time, which is not ideal.

Yeah, this turns an otherwise mildly annoying bug into a really bad experience. 😥

hovancik commented 4 years ago

Is it time for AntennaPod sync service? :)

keunes commented 3 years ago

A bit of a late reply @hovancik An AntennaPod sync service would be cool, and has been discussed briefly, but is only feasible if there's a (small) group of people we can find available with the expertise and drive to (help) develop and maintain it :)

laubblaeser commented 3 years ago

Is it time for AntennaPod sync service? :)

Yes, please! I don't know how to build it though...

mcepl commented 3 years ago

I don’t think the problem is duplicates. Sometimes gpodder.net just doesn't unsubscribe at all (I have two such podcasts): https://github.com/gpodder/mygpo/issues/520

alexanderadam commented 3 years ago

I don’t think the problem is duplicates. Sometimes gpodder.net just doesn't unsubscribe at all (I have two such podcasts) gpodder/mygpo#520

I agree that you have a different issue from the people in this thread. :+1: But the cause could basically be the same: since AntennaPod isn't keeping track of former/redirected URLs (so it doesn't remember more than exactly one URL for a single podcast and its episodes), it will most likely only unsubscribe you from the one URL it is storing.

See ByteHamster's description above:

Internally, they use a unique identifier - but one for each URL, independently of the redirects.

mcepl commented 3 years ago

I don’t think the problem is duplicates. Sometimes gpodder.net just doesn't unsubscribe at all (I have two such podcasts) gpodder/mygpo#520

The point of gpodder/mygpo#520 is that I cannot remove some podcasts from the gpodder.net account even using the web interface.

alexanderadam commented 3 years ago

The point of gpodder/mygpo#520 is that I cannot remove some podcasts from the gpodder.net account even using the web interface.

I see, in that case you simply have a totally different issue. :man_shrugging:

pschwede commented 3 years ago

Subscriptions still duplicate in 2.2.1:

https://twitter.com/pschwede/status/1415618002419163137?s=19

Also, unsubscribing from them takes ages, and sometimes the app crashes.

ByteHamster commented 3 years ago

@pschwede Do you use gpodder? Subscriptions should only duplicate if you manually press the "full sync" button, which is usually not needed. Do they duplicate without pressing the button?

pschwede commented 3 years ago

@pschwede Do you use gpodder? Subscriptions should only duplicate if you manually press the "full sync" button, which is usually not needed. Do they duplicate without pressing the button?

So it probably is a new issue?

ByteHamster commented 3 years ago

If you do not use gpodder or you do not use the "full sync" button, it is a different issue.

b010b0 commented 1 year ago

My task #6709 was closed as a duplicate of this one: Every sync (NOT only if I manually press the "full sync" button) resulted in a new duplicated version of the Nature podcast. If I deleted all copies (which I regularly had to do) from all devices the Nature podcast still came back from the dead on the next sync. There was one other podcast affected, but at least it did not come back after deletion.

sehHeiden commented 6 months ago

I created #7225

I encountered this error. But I never used full sync, and I get the problem even when the sync is not successful!

sehHeiden commented 5 months ago

In addition, I removed all podcasts that caused that problem completely from within AntennaPod, but one (Breitband) keeps returning, with several thousand copies each day.

alexanderadam commented 2 months ago

What is the best way of getting rid of duplicated subscriptions? Is there an automated way to do it? Does anyone know a script or is there nowadays even an integrated solution in AntennaPod? And how do you usually identify the "newest"/"correct" subscription?

It looks like one third of my subscriptions are duplicated again.


ByteHamster commented 2 months ago

and how do you usually identify the "newest"/"correct" subscription?

All of them are new. The one that is old is the one on the Gpodder server. That's the problem. The only way I currently know how to fix this is to disconnect from the sync server, delete all content on the sync server, and connect again. However, that is quite risky: If the sync server remembers these deletions, they might propagate to the app.

To have proper sync, we need to wait for the OpenPodcastAPI specification to be complete and implemented. That specification includes support for feed redirects.

To find out which duplicate you probably want to keep, you could sort the subscriptions by the number of episodes played. The one with played episodes is probably the one you want to keep.
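
In the app this is done by sorting and multi-selecting, but the same heuristic can be sketched offline. This assumes duplicates can be recognized by a shared title (their URLs differ, so URL matching cannot be used), and the `Subscription` record and `keep_most_played` helper are hypothetical, not an AntennaPod API.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Subscription:
    title: str
    url: str
    played_episodes: int


def keep_most_played(subs: list[Subscription]) -> tuple[list[Subscription], list[Subscription]]:
    """Group duplicates by (case-folded) title; keep the most-played copy of each."""
    def by_title(s: Subscription) -> str:
        return s.title.casefold()

    keep: list[Subscription] = []
    delete: list[Subscription] = []
    for _, group in groupby(sorted(subs, key=by_title), key=by_title):
        ranked = sorted(group, key=lambda s: s.played_episodes, reverse=True)
        keep.append(ranked[0])  # ties: the first one listed wins (sort is stable)
        delete.extend(ranked[1:])
    return keep, delete


subs = [
    Subscription("Show A", "https://a1.example/feed", 0),
    Subscription("Show A", "https://a2.example/feed", 12),
    Subscription("Show B", "https://b.example/feed", 0),
]
keep, delete = keep_most_played(subs)
print([s.url for s in keep])    # ['https://a2.example/feed', 'https://b.example/feed']
print([s.url for s in delete])  # ['https://a1.example/feed']
```

When no copy has played episodes, the tie rule simply keeps the first one, which matches the point that it does not matter which duplicate is deleted in that case.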

alexanderadam commented 2 months ago

The only way I currently know how to fix this is to disconnect from the sync server, delete all content on the sync server, and connect again. However, that is quite risky: If the sync server remembers these deletions, they might propagate to the app.

I'm not syncing with gPodder anymore but I want to clean the current state efficiently.

To find out which duplicate you probably want to keep, you could sort the subscriptions by the number of episodes played. The one with played episodes is probably the one you want to keep.

I just did it like that. I wasn't aware that the subscription view has multi-select. That helped a lot!

ByteHamster commented 2 months ago

Some of them I haven't listened to yet.

If you haven't listened to any episode, it does not matter which duplicate you delete.