VocalPodcastProject / vocal

A powerful, beautiful, and simple podcast client for the modern free desktop.
GNU General Public License v3.0
346 stars 49 forks source link

Update database schema & primary keys. #424

Closed psetq closed 4 years ago

psetq commented 4 years ago

This is a simplified version of #379 as a result of the discussion on that PR thread.

This PR primarily updates the database schema to use the podcast uri to uniquely identify podcasts and the podcast uri and episode guid to uniquely identify episodes, addressing the problem identified in #320

This change also uses the existing export/import methods to migrate existing subscriptions. Note that this does not copy episode data directly, which could appear to result in loss of episodes no longer shown in their feeds . Note that the actual episode data is retained so technically recovery is possible though complex (as in the #379 example).

This is a minimized changeset, and should be a work in progress.

nathandyer commented 4 years ago

Hi @psetq,

Great work as always, thanks for the PR. I've looked through the code and it all looks good. As you mentioned above I realize this is essentially a WIP so I won't merge yet, but from the code side it looks good.

I think using the podcast uri to identify podcasts is the right way to do it, and will indeed fix #320. It's also nice because that's the way the gpodder API works (it identifies all feeds by URI), so that's now consistent across everything.

I've read back through the thread on the PR we had open a while ago (#379), and I see that the guid is based on a combination of the episode's uri and the title of the episode (in order to uniquely identify it). I can't seem to remember why we were thinking to go this route. Wouldn't the episode's URI uniquely identify it? I believe in the original example we had a few places where the URI was empty and therefore it thought there were duplicates, but in that case it seems to me we shouldn't allow episode entries with no URI. To refer back to the gpodder API (which is heavily on my mind since I've been working with it so much lately), it uses the episode's uri to unique refer to each episode. I'm just wondering if it makes sense to use something that would be specific only to Vocal itself, when perhaps a simpler and more consistent alternative is out there. What are your thoughts on using the episode's URI instead of the guid?

psetq commented 4 years ago

Hi @nathandyer ,

The "fake guid" was left out of this PR. The guid field is, per the RSS spec, "A string that uniquely identifies the item". That said, it's clear that not all feeds provide episode guid data. The atom spec uses the entry > id, though it is not required to be unique per feed (addressed below).

The manufactured guid values you mention (url+title) were only a way to fill in that data where it didn't exist, i.e. episodes that exist in the database but are no longer found in the feed. This isn't necessary in this pr since only the subscription, not episode data, is migrated over to the new database.

Regarding the "uri" (actually the enclosure url); as we saw in the other comment thread, podcast RSS feeds sometime include entries without associated media (text-only podcast anouncements, updates, etc). I don't see any reason not to continue to show these in the feed as the authors inteded. To add to the confusion, the episode link (saved as the "uri") can actually appear more than once in a feed at different times (as a "re-run" or whatever they call it).

My latest change in this branch also starts also saving the episode link data from the feed. As not all podcasts supply an episode guid, but together the link & guid data should more reliably uniquely identify episodes (Or some most-likely-unique combination of podcast_uri, episode uri, guid and link)

nathandyer commented 4 years ago

@psetq I had to ponder everything for a day or two (then naturally forgot to come back and update this thread since the notification cleared - whoops, sorry about that! :smiley:). Those are reasonable concerns about why it makes sense to go with the guid for episodes instead of relying only on the URI. I'm curious how a text-only episode (like an announcement) would work in the rest of the app, which expects there to be a media file available. That being said, it's worth doing the right way up front, so hopefully if there are any places in the app that aren't handling that situation correctly we can find it and get it fixed.

What all is left to do before merging into master? And would it still require a "fresh start" to existing users' libraries?

psetq commented 4 years ago

Hi,

The text-only episodes would continue to appear exactly like they do now. They're only mentioned here because of the implications of using the missing media uri. This may need to be addressed but I'd suggest that be a separate issue.

This mod also migrates the user subscriptions so it's not exactly a "fresh start", with the caveat above re: database episodes that may no longer be available in the feed will appear to the user to go missing.

I believe this can be merged as is, though I'd suggest more testing before release.

Edit to add:

I do want to highlight the "missing episode" issue I commented above so it's not overlooked, this could be an issue for some users and although the data is not deleted no method for recovery is provided (a manual downgrade should be fairly simple).

nathandyer commented 4 years ago

@psetq Awesome, thanks so much for the clarification. I'll test it out a bit more and merge it in soon.

nathandyer commented 4 years ago

Just a note that I've tested this out, and everything seems to be working okay. I'm going to go ahead and merge it into master now before our 3.0 release, just to make sure it plays nicely with all the major new changes (and fix what needs to be fixed).

@psetq I just wanted to acknowledge the "missing episode" problem you have described, and let you know that I understand the issue. I don't really see a way around it, so in the release notes I'll be sure to mention to users that in rare circumstances where an episode is no longer available on a feed, it will go missing from their Vocal library, and to temporarily downgrade if necessary. Thank you again for this tremendous work!