advplyr / audiobookshelf

Self-hosted audiobook and podcast server
https://audiobookshelf.org
GNU General Public License v3.0

[Enhancement] Support HTML/rich text descriptions for audiobooks #1820

Open Demian98 opened 1 year ago

Demian98 commented 1 year ago

Describe the issue

Metadata fetching from Audible is working and is a great feature. But there is an issue with the comment field that contains the description of the book.

The comments fetched from Audible are losing all their line breaks, which makes the text harder to read. Hopefully this is an easy fix that can improve ABS even more :)

Steps to reproduce the issue

  1. Fetch metadata (including comment) for an audiobook from Audible.
  2. Compare the text from the Audible website with the text in ABS.
  3. You will notice that the line breaks are missing.

Audiobookshelf version

2.2.22

How are you running audiobookshelf?

Docker

advplyr commented 1 year ago

Related #617

This is because Audible is giving us an HTML description where we only support plain text descriptions right now. When we strip the HTML it removes the formatting.
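One way to keep the line breaks while still storing plain text is to turn block-level tags into newlines before dropping the markup. The following is only a minimal sketch of that idea, not how Audiobookshelf actually strips HTML; `html_to_text` and the set of block tags are hypothetical names chosen for illustration.

```python
import re
from html.parser import HTMLParser


class _TextWithBreaks(HTMLParser):
    """Collects text and emits newlines where block-level tags end."""

    # Assumption: these are the block tags worth preserving as paragraph breaks.
    BLOCK_TAGS = {"p", "div", "li", "ul", "ol"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <br> and <br /> both arrive here; each becomes a single line break.
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        # The end of a block element becomes a blank line.
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Hypothetical helper: strip tags but keep paragraph and line breaks."""
    parser = _TextWithBreaks()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)  # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)        # collapse runs of blank lines
    return text.strip()
```

With this approach a description such as `<p>First.</p><br /><p>Second.</p>` comes out as two paragraphs separated by one blank line instead of a single run of text.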

undaunt commented 1 year ago

I would just mention this would also be great for Podcast descriptions. I'm not sure if the description field for audiobooks and podcasts is the same underlying code. A few of my podcasts use things such as <p> tags in their description. See: "Haunted City - A Blades in the Dark Campaign"

<p>A city bathed in perpetual darkness, and a history not yet written. On the streets of Doskvol, it's kill or be killed as crews vie for power by any means necessary. Haunted City is a dark, twisted romp through a Victorian dystopia, using the rules of arguably the greatest RPG system of the modern era — Blades in the Dark.</p><br /><p>Watch new episodes of Haunted City Wednesdays at 8PM ET on <a href="http://www.twitch.tv/theglasscannon" target="_blank">twitch.tv/theglasscannon</a>. YouTube videos and podcasts drop on Friday. Patreon subscribers can enjoy an ad-free version of the podcast at <a href="http://www.patreon.com/glasscannon" target="_blank">patreon.com/glasscannon</a>.</p><br /><p>Haunted City is an original adventure using the Blades in the Dark game system by Evil Hat Productions.</p><br /><p> Hosted on Acast. See <a target="_blank" href="https://acast.com/privacy">acast.com/privacy</a> for more information.</p>

Or, if not easily supportable, is there a way to strip HTML from podcast feed descriptions?

Thanks! This app is amazing!

ZLoth commented 3 months ago

Somehow, my request to "Associate hyperlinks with an audiobook" got redirected here. What I was hoping to add to some of my books are....

Admittedly, the book's website reference and third party references would apply to non-fiction books such as Great Courses or Modern Scholar.

This could also be extended to author/narrator pages, including references to that person's own site as well as IMDB and GoodReads.

ZLoth commented 3 months ago

I was previously utilizing Emby as my audiobook manager, and this is one of the features that I missed. I ended up cleaning most of the descriptions from Audible so that the output would fit properly. (You wouldn't believe how many </b><b> and <p></p> I removed to better format the text.)

The key thing to remember is that you have to sanitize the output for at least three scenarios:

As an example, here is the Audible page for NPR Road Trips: Family Vacations - https://www.audible.com/pd/NPR-Road-Trips-Family-Vacations-Audiobook/B00AQ3VPFU (screenshot)

Here is the same description after import into Audiobookshelf: (screenshot)

From my experience, the following tags should be allowed:

I've only seen <span> used on a couple of GraphicAudio titles, and it was usually used to color the text red: <span style="color: #d92027;">ADVISORY: Due to subject matter, this title contains realistically harsh language, including racial epithets and sexual content.</span>
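An allowlist approach along these lines could be sketched as follows. This is an illustration under assumptions, not Audiobookshelf's implementation: `sanitize_description` is a hypothetical helper, the allowed tag set is passed in by the caller, and the choice to drop every attribute except an http(s) `href` on `<a>` is my own. It does not balance unclosed tags, so a real implementation would likely rely on a maintained sanitizer library instead.

```python
from html import escape
from html.parser import HTMLParser


class _AllowlistSanitizer(HTMLParser):
    VOID_TAGS = {"br"}                   # tags that never get a closing tag
    SKIP_CONTENT = {"script", "style"}   # drop these tags and their contents

    def __init__(self, allowed_tags):
        super().__init__(convert_charrefs=True)
        self.allowed = set(allowed_tags)
        self.out = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_CONTENT:
            self._skip_depth += 1
            return
        if tag not in self.allowed:
            return  # tag is dropped, its text is still kept
        kept = ""
        if tag == "a":
            # Assumption: only http(s) links survive; all other attributes go.
            href = dict(attrs).get("href") or ""
            if href.lower().startswith(("http://", "https://")):
                kept = f' href="{escape(href, quote=True)}"'
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in self.SKIP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
            return
        if tag in self.allowed and tag not in self.VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Re-escape text so stray < or & cannot become markup.
            self.out.append(escape(data, quote=False))


def sanitize_description(html, allowed_tags):
    """Hypothetical helper: keep only allowlisted tags, escape everything else."""
    parser = _AllowlistSanitizer(allowed_tags)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, calling `sanitize_description(text, {"p", "b"})` on a description containing a styled `<span>` keeps the advisory text but drops the tag and its inline style.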

While it's optional, I also converted em-dashes and en-dashes to their &mdash; and &ndash; equivalents, as well as the double quotes into &ldquo; and &rdquo; respectively.
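That conversion can be done with a simple character translation table. The sketch below assumes the description already contains curly quotes (U+201C, U+201D); turning straight quotes into opening and closing entities would need extra pairing logic that is not shown. `encode_typographic` is a hypothetical name.

```python
# Map typographic characters to their named HTML entities.
TYPOGRAPHIC_ENTITIES = {
    "\u2014": "&mdash;",  # em dash
    "\u2013": "&ndash;",  # en dash
    "\u201c": "&ldquo;",  # left double quotation mark
    "\u201d": "&rdquo;",  # right double quotation mark
}

_TABLE = str.maketrans(TYPOGRAPHIC_ENTITIES)


def encode_typographic(text):
    """Hypothetical helper: replace dashes and curly quotes with entities."""
    return text.translate(_TABLE)
```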