advplyr / audiobookshelf

Self-hosted audiobook and podcast server
https://audiobookshelf.org
GNU General Public License v3.0

[Enhancement]: Add audioteka.com.pl as metadata provider #3105

Closed izikeros closed 3 months ago

izikeros commented 3 months ago

Type of Enhancement

Server Backend

Describe the Feature/Enhancement

There is a service (Audioteka) that offers a wide collection of audiobooks along with their metadata. Audiobookshelf already has several metadata and cover providers; Audioteka would be another possible source.

Why would this be helpful?

It would complement the existing sources because it can provide metadata in Polish. Use case: a user has a collection of Polish audiobooks (e.g. the titles are in Polish) and would like to enrich the library with metadata.

Future Implementation (Screenshot)

I gave an AI some context and had it generate code that has some chance of working. I'm not a JavaScript programmer and can't run or debug it, but it might serve as a starting point.

const axios = require('axios').default
const cheerio = require('cheerio')
const Logger = require('../Logger')

class AudiotekaProvider {
  #responseTimeout = 30000

  constructor() {}

  /**
   * Search for an audiobook on audioteka.com.pl
   * @param {string} title
   * @param {string} author
   * @param {string} isbn
   * @param {string} providerSlug
   * @param {string} mediaType
   * @param {number} [timeout] response timeout in ms
   * @returns {Promise<Object[]>}
   */
  async search(title, author, isbn, providerSlug, mediaType, timeout = this.#responseTimeout) {
    if (!timeout || isNaN(timeout)) timeout = this.#responseTimeout

    const encodedTitle = encodeURIComponent(title)
    const url = `https://audioteka.com/pl/search?query=${encodedTitle}`

    try {
      const response = await axios.get(url, { timeout })
      const $ = cheerio.load(response.data)

      const results = []
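      // NOTE: the '.product-tile' selector is untested against the live site
      $('.product-tile').each((index, element) => {
        const href = $(element).find('a').attr('href')
        if (!href) return
        // Links may be relative, so resolve them against the search URL
        results.push(this.scrapeAudiobookDetails(new URL(href, url).href))
      })

      // Drop entries whose detail page failed to scrape (returned null)
      return (await Promise.all(results)).filter(Boolean)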
    } catch (error) {
      Logger.error('[AudiotekaProvider] Search error', error)
      return []
    }
  }

  /**
   * Scrape audiobook details from a specific URL
   * @param {string} url
   * @returns {Promise<Object>}
   */
  async scrapeAudiobookDetails(url) {
    try {
      const response = await axios.get(url, { timeout: this.#responseTimeout })
      const $ = cheerio.load(response.data)

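      const jsonLd = JSON.parse($('script[type="application/ld+json"]').first().html())

      // JSON-LD values may be plain strings, objects with a `name`, or arrays of either
      const toNames = (value) => [].concat(value || []).map((v) => (typeof v === 'string' ? v : v?.name)).filter(Boolean)

      const title = jsonLd.name
      const authors = toNames(jsonLd.author)
      const narrator = toNames(jsonLd.readBy).join(', ')
      const publisher = toNames(jsonLd.publisher).join(', ')
      // Date-only strings parse as UTC, so read the year in UTC; null when the date is missing
      const publishedYear = jsonLd.datePublished ? new Date(jsonLd.datePublished).getUTCFullYear() : null
      const description = $('article p').text().trim()
      const cover = jsonLd.image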

      return {
        title,
        subtitle: null,
        author: authors.join(', '),
        narrator,
        publisher,
        publishedYear,
        description,
        cover,
        isbn: null,
        asin: null,
        genres: null,
        tags: null,
        series: null,
        language: 'pl',
        duration: null
      }
    } catch (error) {
      Logger.error('[AudiotekaProvider] Scraping error', error)
      return null
    }
  }
}

module.exports = AudiotekaProvider

Audiobookshelf Server Version

v2.10.1

Current Implementation (Screenshot)

None

izikeros commented 3 months ago

Some people have tried scraping the data and producing an OPF file in #602.

nichwall commented 3 months ago

Duplicate of https://github.com/advplyr/audiobookshelf/issues/2598

Can you provide public API documentation? If there is not a public API (requiring scraping the web page within ABS), this would fit better as a custom metadata provider instead of being within ABS.

https://www.audiobookshelf.org/guides/custom-metadata-providers
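
For illustration, the custom metadata provider route suggested above could look roughly like the following standalone service. This is a minimal sketch, not a tested implementation. It assumes that ABS queries the provider with GET /search?query=...&author=... and expects a JSON body of the form { matches: [...] }; check the guide linked above for the actual contract. The names toMatch, createProviderServer and lookup are hypothetical, and the scraping itself is left to the lookup function supplied by the caller.

```javascript
// Sketch of a standalone custom metadata provider for Audiobookshelf.
// Assumption: ABS calls GET /search?query=...&author=... and expects
// a JSON response shaped like { matches: [...] }.
const http = require('http')

// Hypothetical helper: map one schema.org JSON-LD object to a match entry.
function toMatch(jsonLd) {
  // JSON-LD values may be strings, objects with a `name`, or arrays of either
  const names = (value) =>
    []
      .concat(value || [])
      .map((v) => (typeof v === 'string' ? v : v && v.name))
      .filter(Boolean)
      .join(', ')
  // Date-only strings parse as UTC, so read the year in UTC
  const year = new Date(jsonLd.datePublished).getUTCFullYear()
  return {
    title: jsonLd.name,
    author: names(jsonLd.author),
    narrator: names(jsonLd.readBy),
    publisher: names(jsonLd.publisher),
    // Returned as a string here; that type is an assumption
    publishedYear: Number.isNaN(year) ? undefined : String(year),
    cover: typeof jsonLd.image === 'string' ? jsonLd.image : undefined,
    language: 'pl'
  }
}

// `lookup` is a hypothetical async function (query, author) => JSON-LD objects;
// the actual scraping of the store would live there.
function createProviderServer(lookup) {
  return http.createServer(async (req, res) => {
    const url = new URL(req.url, 'http://localhost')
    const query = url.searchParams.get('query')
    if (req.method !== 'GET' || url.pathname !== '/search' || !query) {
      res.writeHead(400, { 'Content-Type': 'application/json' })
      return res.end(JSON.stringify({ error: 'expected GET /search?query=...' }))
    }
    try {
      const items = await lookup(query, url.searchParams.get('author'))
      res.writeHead(200, { 'Content-Type': 'application/json' })
      res.end(JSON.stringify({ matches: items.map(toMatch) }))
    } catch (error) {
      res.writeHead(500, { 'Content-Type': 'application/json' })
      res.end(JSON.stringify({ error: 'lookup failed' }))
    }
  })
}

// Usage (the server is not started automatically):
// createProviderServer(myLookup).listen(3001)
```

Keeping the scraper in a separate service like this means site layout changes on the Audioteka side can be fixed without a new ABS release, which seems to be the reason for pointing at custom providers here.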