advplyr / audiobookshelf

Self-hosted audiobook and podcast server
https://audiobookshelf.org
GNU General Public License v3.0
5.74k stars, 395 forks

[Enhancement]: Podcast RSS Feed Parse Chapters From Description #2363

Open Migz93 opened 7 months ago

Migz93 commented 7 months ago

Describe the feature/enhancement

I'm currently using Podcast Addict as my main podcast application but would love to move over to using ABS not just for downloading/storing my podcasts but also for listening.

I noticed that in Podcast Addict, chapters appear for my main podcast, "NoSleep", but they don't appear in ABS. After digging through the RSS feed and the ID3 tags of the actual MP3, it doesn't look like chapters are included in either. After a bit more googling, it seems that Podcast Addict actually parses the episode descriptions and builds chapters from the timestamps it finds there. Here are links to a NoSleep episode and two other episodes that all have timestamps in the description:

  1. https://www.listennotes.com/podcasts/the-nosleep-podcast/s20-ep7-nosleep-podcast-s20e07-siYaVjJktNx/
  2. https://www.listennotes.com/podcasts/the-wan-show/dbrand-x-casetify-lawsuit-NYXWLoceC2x/
  3. https://www.listennotes.com/podcasts/nerdwallets-smart/is-your-personal-finance--_fc5gUq0Ow/

advplyr commented 4 months ago

This just got brought up again in #1113, so I did a quick search to see whether there are any standards for chapters in the description that could be used to build a parser. There don't appear to be any, but I did find this article saying that Spotify only accepts chapters in that way: https://james.cridland.net/blog/2023/spotify-chapters-kludge/

All the other podcast platforms expect chapters to be in ID3 tags, which is the same for ABS.

I'm not sure how the NoSleep podcast episode you linked could be parsed when it doesn't follow the <timestamp> <title> format like the others. I think it would be too error-prone if we built a chapter parser that pulled any line with a timestamp and treated it as a chapter. The WAN Show episode you linked seems like the best format to start with if we support this. The NerdWallet one might work, but I don't know how a parser would differentiate between what is the description and what is the title. Some titles may have a colon in them, so what then?
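
To make the stricter approach concrete, here is a minimal sketch of a parser that only accepts lines starting with a timestamp followed by a title, i.e. the <timestamp> <title> format. It is written in Python purely for illustration and is not ABS code. The names `Chapter`, `CHAPTER_LINE`, and `parse_chapters` are hypothetical, and the rules it applies (timestamp must be at the start of the line, at least two chapters, strictly increasing start times) are assumptions chosen to cut down on false positives, not an established standard. It also assumes the description has already been converted from HTML to plain text, one chapter per line.

```python
import re
from typing import List, NamedTuple


class Chapter(NamedTuple):
    start: int  # offset from the start of the episode, in seconds
    title: str


# Hypothetical pattern: a line is a chapter candidate only if it BEGINS with
# M:SS, MM:SS, or H:MM:SS (optionally in parentheses), followed by whitespace,
# an optional "- " separator, and a non-empty title.
CHAPTER_LINE = re.compile(
    r"^\s*\(?(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\)?\s+(?:-\s+)?(\S.*)$"
)


def parse_chapters(description: str) -> List[Chapter]:
    """Return chapters found in a plain-text description, or [] if unsure."""
    chapters = []
    for line in description.splitlines():
        m = CHAPTER_LINE.match(line)
        if not m:
            continue  # timestamps in the middle of a sentence are ignored
        hours, minutes, seconds = (int(g) if g else 0 for g in m.group(1, 2, 3))
        # Reject impossible values such as 1:75:00 or 0:99.
        if seconds >= 60 or (m.group(1) and minutes >= 60):
            continue
        start = hours * 3600 + minutes * 60 + seconds
        chapters.append(Chapter(start, m.group(4).strip()))

    # Assumed sanity checks: a real chapter list has at least two entries
    # and its start times are strictly increasing.
    starts = [c.start for c in chapters]
    if len(chapters) < 2 or starts != sorted(set(starts)):
        return []
    return chapters


example = (
    "Welcome to the show.\n"
    "0:00 Intro\n"
    "12:34 - First topic\n"
    "(1:02:03) Wrap-up\n"
    "We met at 5:30 yesterday."
)
for chapter in parse_chapters(example):
    print(chapter.start, chapter.title)
```

The last line of the example contains a timestamp but is skipped because the timestamp is not at the start of the line. A format that puts the title before the timestamp, or separates title and description with a colon, would not be handled by this sketch; that ambiguity is exactly the open question above.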