[Enhancement]: Support more m4b tags

DarkShortie commented 2 years ago

Describe the feature/enhancement

I prefer storing all metadata in the file itself instead of using "companion files" or folder / filenames and have a whole audiobook in one file.

Therefore I'm using the m4b format and I have activated the setting "Scanner prefer audio metadata".

Then I tried to set the metadata according to the mp4 metadata fields and what is described here https://www.audiobookshelf.org/docs#book-audio-metadata

I've also set some additional tags which are used by the aaxc format from Audible (which is also just an mp4 container), and it would be great if audiobookshelf could support those custom tags as well.

Not all tags are working. Is there a list of which mp4 tags are mapped to which audiobookshelf data field? For the mp4 tags it would be good to know the atom names rather than the names displayed in different tools, because the friendly names differ between tools.

These are my results and enhancement requests so far:

audiobookshelf	mp4 atom	Note
Author	aArt	working
Title	©alb	working
Subtitle	subt	not working, working atom available? If not please implement, custom atom
Publisher	cprt	not working, working atom available? If not please implement, official atom
Publisher	©pub	custom atom used by Audible aaxc, please implement
Publish Year	©day	working
Publish Year	rldt	custom atom used by Audible aaxc, this contains a complete date in the format yyyy-mm-dd and could enhance the data in audiobookshelf
Narrator	©wrt	working
Narrator	©nrt	custom atom used by Audible aaxc, please implement
Description	©des	not working, working atom available? If not please implement, official atom
Genres	©gen	working
Series	tvsh	not working, working atom available? If not please implement, custom atom
Volume Number	tves	not working, working atom available? If not please implement, custom atom
Language	lang	not working, working atom available? If not please implement, custom atom
ASIN	asin	not working, please implement, custom atom
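
To illustrate the rldt request: a minimal sketch of how a full yyyy-mm-dd value could be split into a year plus the complete date. The function name and the returned property names are hypothetical and not taken from the audiobookshelf source.

```javascript
// Hypothetical helper: split an Audible-style `rldt` value (yyyy-mm-dd)
// into the year and the full date string. Property names are illustrative.
function parseReleaseDate(rldt) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(String(rldt).trim());
  if (!match) return null; // unexpected format: leave the field unset
  const [, year, month, day] = match;
  return { publishedYear: year, publishedDate: `${year}-${month}-${day}` };
}

console.log(parseReleaseDate("2016-10-17"));
```

Anything that does not look like yyyy-mm-dd returns null, so the existing ©day value could still be used as before.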

advplyr commented 1 year ago

We are going to be switching our meta tag reader & writer from ffmpeg to tone. If you can test that your m4b audio files are read correctly using tone, that would be helpful. https://github.com/sandreas/tone

If you install the CLI then you can just tone dump ./path/to/audiobook.m4b and see if it gets everything you want.

sandreas commented 1 year ago

@DarkShortie @advplyr

For series and volume number usually the tags movementName and movementIndex are used.

Unfortunately, these are not yet supported by ffprobe and ffmpeg, but today I submitted an enhancement issue to address this (https://trac.ffmpeg.org/ticket/10269#ticket). I don't have high hopes, because the ffmpeg team has much to do and issues I submitted in the past did not gain much interest. However, the source code is pretty straightforward and linked in the issue, so maybe a happy ABS user with good C coding skills can submit a PR.

Let's hope for the best.

sandreas commented 1 year ago

@DarkShortie @advplyr I found out that tagging m4b files with ----:com.pilabor.tone:SERIES=Harry Potter will make ABS detect the series correctly. This solved A LOT of my problems organizing my audiobook collection. I plan to integrate ----:com.pilabor.tone:SERIES = <Movement> into the M4bFillupTagger for tone v0.1.6 to ensure that ABS detects the series correctly. Currently you have to use a custom JavaScript tagger to write this field; unfortunately it does not work via --meta-additional-field.

The only remaining problem (that I also solved, but it was kind of an effort) is that ABS still requires a directory per audiobook, while it would be nice to have a setting like "1 audiobook per file" for the following extensions: m4b

Whatever, at least I no longer need the Movement tag :-)

advplyr commented 1 year ago

@sandreas What about SERIES-PART?

Also, you don't need to have each audiobook in a separate folder if the audio files are in the root. I'm guessing your folder structure has m4b files in sub folders?

sandreas commented 1 year ago

@sandreas What about SERIES-PART?

@advplyr SERIES-PART is not as important to me, because it is only additional information and not used for grouping audiobooks together. However, I already use a custom tag ----:com.pilabor.tone:PART, because Movement is an integer, while PART can also contain strings like 2.1 or VI (only to name a few), and the part number is part of the sort_title and the longdescription in my audiobooks.

Also, you don't need to have each audiobook in a separate folder if the audio files are in the root. I'm guessing your folder structure has m4b files in sub folders?

My folder structure is this (I reorganized because of ABS...):

/series/%genre/%author/%series-name/%part/%part - %title/%part - %title.m4b - for series with part
/series/%genre/%author/%series-name/%title/%title.m4b - for series without part (especially kids audiobooks are organized that way)
/individuals/%genre/%author/%title/%title.m4b - for individual titles (no series)
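
To make the three layouts concrete, here is a rough sketch of how such patterns could be matched against a path to recover the fields. It is illustration only and not how tone implements --path-pattern; the function names are made up, the patterns are written relative to the library root, and repeated placeholders (like %part) are required to match the same text each time.

```javascript
// Patterns copied from the folder structure above (relative to the library
// root). The parsing is an illustrative sketch, not tone's implementation.
const patterns = [
  "series/%genre/%author/%series-name/%part/%part - %title/%part - %title.m4b",
  "series/%genre/%author/%series-name/%title/%title.m4b",
  "individuals/%genre/%author/%title/%title.m4b",
];

// Turn a pattern into a regular expression with one named group per
// placeholder; a repeated placeholder becomes a backreference.
function patternToRegex(pattern) {
  const seen = new Set();
  const source = pattern
    .split(/(%[a-z-]+)/)
    .map((piece) => {
      if (!piece.startsWith("%")) {
        // literal text: escape regex metacharacters
        return piece.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
      }
      // group names may not contain "-", so %series-name becomes series_name
      const name = piece.slice(1).replace(/-/g, "_");
      if (seen.has(name)) return `\\k<${name}>`;
      seen.add(name);
      return `(?<${name}>[^/]+?)`;
    })
    .join("");
  return new RegExp(`^${source}$`);
}

// Try the patterns in order and return the fields of the first match.
function parsePath(relativePath) {
  for (const pattern of patterns) {
    const match = patternToRegex(pattern).exec(relativePath);
    if (match) return { ...match.groups };
  }
  return null;
}

console.log(parsePath(
  "series/Fantasy/J.K. Rowling/Harry Potter/1/1 - Harry Potter and the Philosophers Stone/1 - Harry Potter and the Philosophers Stone.m4b"
));
```

Because the series layouts differ in the number of path segments, a book without a part number falls through to the second pattern.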

For the series I would love to remove the last directory to group series together:

/series/Fantasy/J.K. Rowling/Harry Potter/1/1 - Harry Potter and the Philosophers Stone.m4b

But this is no necessity any more, since my whole workflow supports subfolders right now and I don't see them very often :-)

I use this structure because every now and then an audiobook that was an individual title becomes part of a series. In that case I have to move the book and retag it via the folder structure (often the official series name is not my preferred one):

tone tag --path-pattern='series/%g/%a/%s/%p/%p - %n/%p - %n.m4b' --path-pattern='series/%g/%a/%s/%n/%n.m4b' --path-pattern='individuals/%g/%a/%z/%n.m4b' "audiobooks/" --taggers="remove,*" --meta-remove-additional-field="©mvc" --meta-remove-additional-field="----:com.apple.iTunes:iTunSMPB" --order-by="!created" --limit=50 -y --prepend-movement-to-description

advplyr commented 1 year ago

SERIES-PART is not as important to me, because it is only additional information and not used for grouping audiobooks together. However I already use a custom tag ----:com.pilabor.tone:PART as custom tag, because Movement is an integer, while PART can also contain strings like 2.1 or VI (only to name a few) and the part number is part of the sort_title and the longdescription in my audiobooks.

So how is Abs populating your series sequence if you are only using the SERIES tag?

sandreas commented 1 year ago

So how is Abs populating your series sequence if you are only using the SERIES tag?

It doesn't. But as long as it gets the series right, it is much easier to navigate, since I can collapse the series in the frontend.

However, I would love to see the following fallback option:

If SERIES-PART is defined, prefer it and sort
If PART is defined, take it and sort
If sort_name or sort_album is defined sort by this
If sort_name / sort_album != title / album and ends with title / album, look for a Part-Number at the end of the prefix of the sort title
- Example: name=Harry Potter and the philosophers stone, sort_name=Harry Potter 1 - Harry Potter and the philosophers stone

@advplyr Here, feel free to use and adjust to your needs:

let metas = [
  {
    name: "Harry Potter and the philosophers stone",
    sortName: "Harry Potter 1 - Harry Potter and the philosophers stone",
    series: "Harry Potter",
    seriesPart: "1",
    part: "1"
  },
  {
    name: "Harry Potter and the philosophers stone",
    sortName: "Harry Potter 1 - Harry Potter and the philosophers stone",
    series: "Harry Potter",
    part: "1"
  },
  {
    name: "Harry Potter and the philosophers stone",
    sortName: "Harry Potter 1 - Harry Potter and the philosophers stone",
    series: "Harry Potter"
  },
  {
    name: "Harry Potter and the philosophers stone",
    sortName: "Harry Potter 1 - Harry Potter and the philosophers stone",
  },
];

function getSeriesAndPart(meta) {
  let series = meta.series || null;
  let part = meta.seriesPart || meta.part || null;
  // if series is set or sortName does not fulfill specific criteria, return early
  if(series !== null 
     || !meta.sortName 
     || meta.name === meta.sortName 
     || !meta.sortName.endsWith(meta.name)) {
    return [series, part];
  }

  // extract prefix (in our case: "Harry Potter 1 - ") and rtrim unwanted chars
  let sortNamePrefix = meta.sortName.slice(0, meta.name.length*-1).replace(/[ :-]+$/, "");

  let splittedPrefix = sortNamePrefix.split(' ');
  if(splittedPrefix.length > 1) {
    // try to extract part number from last word
    let potentialPartNumber = splittedPrefix.slice(-1)[0];

    // part number can contain numbers and dots (e.g. 2.1 would also be valid),
    // so match one or more characters instead of exactly one
    if(potentialPartNumber.match(/^[0-9.]+$/)) {
      part = splittedPrefix.pop();
    }
  }
  // series is the non number part of the prefix, if there is no number, the whole prefix is returned
  series = splittedPrefix.join(' ');
  return [series, part];
}

console.log(getSeriesAndPart(metas[3]))

sandreas commented 1 year ago

@advplyr any feedback?

advplyr commented 1 year ago

If SERIES-PART is defined, prefer it and sort

If PART is defined, take it and sort

Have you checked if ffprobe is able to detect the PART tag? I can add that to the list of tags we check for seriespart https://github.com/advplyr/audiobookshelf/blob/master/server/utils/prober.js#L186

If sort_name or sort_album is defined sort by this

If sort_name / sort_album != title / album and ends with title / album, look for a Part-Number at the end of the prefix of the sort title

I haven't seen the tags sort_name and sort_album used before. We're currently not using any separate sort fields but there is a request open for title sort #1074. I haven't added it yet since the UI is getting overwhelmed with so many fields I wanted to see if it can be simplified.

I do have ffprobe looking for title-sort, album-sort and artist-sort https://github.com/advplyr/audiobookshelf/blob/master/server/utils/prober.js#L166 but they aren't used yet.

The actual sorting of the series books doesn't happen in the way you are thinking. When we populate the books in the database, we store each book with a BookMetadata object https://github.com/advplyr/audiobookshelf/blob/master/server/objects/metadata/BookMetadata.js

The sorting of the series happens from the API. Here we sort the series books by series sequence first and fallback to using the book title if the sequence is not defined.

Just to be complete in my response: we also sort series here, except that here I didn't implement the title fallback. This is used in the API request that requests all series.
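
Roughly, the first ordering (sequence first, title as fallback) looks like the sketch below. This is not the actual API code: the property names sequence and title are assumptions, and placing books without a sequence after the sequenced ones is a choice made for the example.

```javascript
// Sketch of "sort by series sequence, fall back to the book title".
// `sequence` and `title` are assumed property names, not the real model.
function compareSeriesBooks(a, b) {
  const seqA = parseFloat(a.sequence);
  const seqB = parseFloat(b.sequence);
  const hasA = !Number.isNaN(seqA);
  const hasB = !Number.isNaN(seqB);

  // both books have a numeric sequence: order by it
  if (hasA && hasB && seqA !== seqB) return seqA - seqB;
  // only one has a sequence: assumed here to sort before the other
  if (hasA !== hasB) return hasA ? -1 : 1;
  // no usable sequence (or a tie): fall back to the title
  return a.title.localeCompare(b.title, undefined, { numeric: true });
}

const books = [
  { title: "C", sequence: null },
  { title: "B", sequence: "2" },
  { title: "A", sequence: "10" },
  { title: "A2" },
];
console.log(books.sort(compareSeriesBooks).map((book) => book.title));
```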

sandreas commented 1 year ago

@advplyr Thank you for this detailed answer and sorry if I was a bit pushy on this :-) This series and part problem in ffprobe / ffmpeg is really annoying to me, it's not your fault ;)

Have you checked if ffprobe is able to detect the PART tag? I can add that to the list of tags we look for for seriespart https://github.com/advplyr/audiobookshelf/blob/master/server/utils/prober.js#L186

Yes, it does support PART in m4b files (see Harry Potter 1 example below).

I haven't seen the tags sort_name and sort_album used before. We're currently not using any separate sort fields but there is a request open for title sort https://github.com/advplyr/audiobookshelf/issues/1074.

ffprobe does support sort_name and sort_album, as well as sort_composer, sort_artist and sort_albumartist. Even iTunes does support most of these in an extra tab - so you can sort your audio books correctly if the titles are not in alphabetical order and the PART tag is not set (e.g. audiobook series for children often don't have part numbers at all but only the series-name)

I do have ffprobe looking for title-sort, album-sort and artist-sort https://github.com/advplyr/audiobookshelf/blob/master/server/utils/prober.js#L166 but they aren't used yet.

Yeah, title-sort, album-sort and artist-sort are synonyms for the same tags often used in mp3. Whether it is a dash suffix (*-sort), an underscore suffix (*_sort) or a prefix (sort-*, sort_*), these should all carry the same information. ffmpeg is not totally strict about this. The most commonly used are sort_album, sort-album and album-sort, and the same for *title*.
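
Since the spelling varies, a lookup could normalize the key before comparing. A minimal sketch, assuming the tags arrive as a plain object like the one ffprobe prints; findSortTag is a made-up helper name:

```javascript
// Find a sort tag regardless of spelling: album-sort, album_sort,
// sort-album, sort_album, in any letter case. Hypothetical helper.
function findSortTag(tags, field) {
  const wanted = new Set([`${field}sort`, `sort${field}`]);
  for (const [key, value] of Object.entries(tags)) {
    // drop dashes and underscores so all variants collapse to one form
    const normalized = key.toLowerCase().replace(/[-_]/g, "");
    if (wanted.has(normalized)) return value;
  }
  return null;
}

console.log(findSortTag({ "sort_album": "Harry Potter 1 - Harry Potter und der Stein der Weisen" }, "album"));
```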

The sorting of the series happens from the API. Here we sort the series books by series sequence first and fallback to using the book title if the sequence is not defined.

I know... I was just curious, if it would be possible to guess the PART of the series using existing fields like sort_album and store it as sequence number while indexing. Since ffprobe is not supporting MovementName and MovementIndex (and probably won't in the near future), I thought it would be a nice fallback to extract the sequence out of the sort_album if possible.

[ ] Support for the PART tag would be enough for the moment :-) See PR #1750

Example: Harry Potter 1 (german)

ffprobe -v quiet -print_format json -show_format  1\ -\ Harry\ Potter\ und\ der\ Stein\ der\ Weisen.m4b


{
    "format": {
        "filename": "1 - Harry Potter und der Stein der Weisen.m4b",
        "nb_streams": 3,
        "nb_programs": 0,
        "format_name": "mov,mp4,m4a,3gp,3g2,mj2",
        "format_long_name": "QuickTime / MOV",
        "start_time": "0.000000",
        "duration": "34404.179000",
        "size": "278246539",
        "bit_rate": "64700",
        "probe_score": 100,
        "tags": {
            "major_brand": "isom",
            "minor_version": "512",
            "compatible_brands": "isomiso2mp41",
            "artist": "J. K. Rowling",
            "title": "Harry Potter und der Stein der Weisen",
            "album": "Harry Potter und der Stein der Weisen",
            "genre": "Fantasy",
            "composer": "Rufus Beck",
            "copyright": "Pottermore Publishing",
            "description": "Harry Potter 1: Rufus Beck liest Band 1 von Harry Potter.  Eigentlich hatte Harry geglaubt, er sei ein ganz normaler Junge. Zumindest bis zu seinem elften Geburtstag. Da erfährt er, dass er sich an der Schule für Hexerei und Zauberei einfinden soll. ...",
            "sort_name": "Harry Potter 1 - Harry Potter und der Stein der Weisen",
            "sort_album": "Harry Potter 1 - Harry Potter und der Stein der Weisen",
            "synopsis": "Harry Potter 1: Rufus Beck liest Band 1 von Harry Potter.  Eigentlich hatte Harry geglaubt, er sei ein ganz normaler Junge. Zumindest bis zu seinem elften Geburtstag. Da erfährt er, dass er sich an der Schule für Hexerei und Zauberei einfinden soll. Und warum? Weil Harry ein Zauberer ist. Und so wird für Harry das erste Jahr in der Schule das spannendste, aufregendste und lustigste in seinem Leben. Er stürzt von einem Abenteuer in die nächste ungeheuerliche Geschichte, muss gegen Bestien, Mitschüler und Fabelwesen kämpfen. Da ist es gut, dass er schon Freunde gefunden hat, die ihm im Kampf gegen die dunklen Mächte zur Seite stehen.",
            "comment": "Harry Potter 1: Rufus Beck liest Band 1 von Harry Potter.  Eigentlich hatte Harry geglaubt, er sei ein ganz normaler Junge. Zumindest bis zu seinem elften Geburtstag. Da erfährt er, dass er sich an der Schule für Hexerei und Zauberei einfinden soll. Und warum? Weil Harry ein Zauberer ist. Und so wird für Harry das erste Jahr in der Schule das spannendste, aufregendste und lustigste in seinem Leben. Er stürzt von einem Abenteuer in die nächste ungeheuerliche Geschichte, muss gegen Bestien, Mitschüler und Fabelwesen kämpfen. Da ist es gut, dass er schon Freunde gefunden hat, die ihm im Kampf gegen die dunklen Mächte zur Seite stehen.",
            "date": "2016-10-17T22:00:00",
            "encoder": "m4b-tool",
            "purchase_date": "2009/09/09",
            "media_type": "2",
            "gapless_playback": "1",
            "PART": "1",
            "AUDIBLE_ASIN": "B01M02FJ7A",
            "SERIES": "Harry Potter"
        }
    }
}
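
With output like this, picking up the series fields could be as simple as the sketch below. It assumes the JSON has already been parsed; getTag and readSeries are hypothetical helpers, "series-part" is the tag name mentioned earlier in this thread, and the lookup is case-insensitive because the custom tags come back in upper case.

```javascript
// Trimmed copy of the ffprobe result above, already parsed from JSON.
const probeResult = {
  format: {
    tags: {
      title: "Harry Potter und der Stein der Weisen",
      sort_album: "Harry Potter 1 - Harry Potter und der Stein der Weisen",
      PART: "1",
      AUDIBLE_ASIN: "B01M02FJ7A",
      SERIES: "Harry Potter",
    },
  },
};

// Return the first non-empty tag among `names`, ignoring letter case.
function getTag(tags, ...names) {
  const lower = new Map(
    Object.entries(tags).map(([key, value]) => [key.toLowerCase(), value])
  );
  for (const name of names) {
    const value = lower.get(name.toLowerCase());
    if (value !== undefined && value !== "") return value;
  }
  return null;
}

// Prefer SERIES-PART over PART for the sequence, as proposed earlier.
function readSeries(result) {
  const tags = result?.format?.tags ?? {};
  return {
    series: getTag(tags, "series"),
    sequence: getTag(tags, "series-part", "part"),
    asin: getTag(tags, "audible_asin"),
  };
}

console.log(readSeries(probeResult));
```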

advplyr / audiobookshelf

[Enhancement]: Support more m4b tags #787
