advplyr / audiobookshelf

Self-hosted audiobook and podcast server
https://audiobookshelf.org
GNU General Public License v3.0

Hash audiobooks #2551

Open sevenlayercookie opened 5 months ago

sevenlayercookie commented 5 months ago

Describe the feature/enhancement

Include a hash of audiobooks with the metadata so that if files are moved to a different location, the metadata will still apply.

Currently: I have a perfectly good library and perfect metadata setup. If I move the files to a different location, the metadata no longer applies.

Enhancement: if the app automatically hashed (MD5, SHA, whatever) every book/file and stored that hash linked with the metadata, then any time that specific book is added to the library, the metadata would be applied to it.

Also, this hash could be used for other purposes, such as matching chapter data from a community-driven database to audiobooks in ABS. (Plex uses a similar strategy for credits markers.)

Silther commented 5 months ago

Can you move the books if the metadata is inside the book folder?

sevenlayercookie commented 5 months ago

Can you move the books if the metadata is inside the book folder?

I haven't tried since I store my metadata in the default centralized location, but I suppose that could be a solution, assuming nothing else changed about the file such as its file name.

nichwall commented 5 months ago

ABS uses the inode to detect moved files. This doesn't work on every filesystem (especially network shares) or if you're performing an operation which copies or deletes the old file since that deletes the inode associated with the file. If you're using filesystem move commands, the inode should be preserved so the updated path is reflected in ABS.

There is some ongoing discussion around improving this functionality.
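The inode behaviour described above can be seen directly in Python (on a local POSIX filesystem; network shares may behave differently, which is exactly the problem being discussed):

```python
import os
import shutil
import tempfile

def inode(path: str) -> int:
    """Return the inode number the filesystem assigned to this path."""
    return os.stat(path).st_ino

tmp = tempfile.mkdtemp()
original = os.path.join(tmp, "book.mp3")
with open(original, "wb") as f:
    f.write(b"fake audio")

before = inode(original)

# A move on the same filesystem keeps the inode, so inode matching works.
moved = os.path.join(tmp, "renamed.mp3")
os.rename(original, moved)
assert inode(moved) == before

# A copy allocates a new inode, so inode-based matching breaks.
copied = os.path.join(tmp, "copied.mp3")
shutil.copy(moved, copied)
assert inode(copied) != before
```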

Hallo951 commented 5 months ago

@nichwall Where is the discussion about improving this function? I would be very interested in an improvement, as my audiobooks are on my NAS and ABS runs on a separate server. This means that the watcher, which relies on the inode, does not work properly...

nichwall commented 5 months ago

There isn't a good central spot where all of the discussion has happened.

It mostly comes up periodically in Discord (searches for "inode" and "hash" should return results). There was a recent discussion about switching from using the ctime to using the mtime to help with inodes changing on NAS that @FreedomBen was thinking of testing out (references https://github.com/advplyr/audiobookshelf/issues/2509). Near the end of that conversation there was discussion of using hashing in addition to/instead of the inode, but one of the main concerns is how that impacts scan performance for large libraries or network shares.

As of a few server releases ago, the filename takes priority over inode changes in the case of network shares, but it doesn't seem to work all the time right now (server version 2.7.2 is currently the latest)

Also this issue https://github.com/advplyr/audiobookshelf/issues/1447
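For context on the ctime-vs-mtime idea mentioned above: a copy always allocates a new inode (and, on Linux, gets a fresh ctime), but a timestamp-preserving copy (`cp -p`, `rsync -t`, Python's `shutil.copy2`, and many NAS sync tools) keeps the mtime, which is why mtime is the more stable identity hint. A rough illustration:

```python
import os
import shutil
import tempfile
import time

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "book.mp3")
with open(src, "wb") as f:
    f.write(b"fake audio")

# Pretend the file was last modified a year ago.
year_ago = time.time() - 365 * 24 * 3600
os.utime(src, (year_ago, year_ago))

# copy2 preserves timestamps, like `cp -p` or a sync tool would.
dst = os.path.join(tmp, "copied.mp3")
shutil.copy2(src, dst)

# The mtime survives the copy and can still help identify the file...
assert abs(os.stat(dst).st_mtime - year_ago) < 2

# ...but the copy is a brand-new inode.
assert os.stat(dst).st_ino != os.stat(src).st_ino
```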

advplyr commented 5 months ago

As of a few server releases ago, the filename takes priority over inode changes in the case of network shares, but it doesn't seem to work all the time right now (server version 2.7.2 is currently the latest)

This should be working. Where did you get the impression it wasn't?

nichwall commented 5 months ago

Specifically this and following comments where only the inode changed, but I could also have gotten confused.

https://github.com/advplyr/audiobookshelf/issues/2509#issuecomment-1890828274

sevenlayercookie commented 5 months ago

Near the end of that conversation there was discussion of using hashing in addition to/instead of the inode, but one of the main concerns is how that impacts scan performance for large libraries or network shares.

I'm far from an expert in hashing etc., but since security isn't an issue and this is simply for convenience, what if instead of hashing the entire file, only portions were hashed? Such as 1 MB from the beginning, 1 MB from the end, and 1 MB from somewhere in the middle? Include file size for good measure. It would be very efficient computationally, and should do a good enough job of preventing collisions.
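That sampling scheme can be sketched in a few lines (the 1 MB chunk size and the choice of SHA-1 are arbitrary; any fast non-cryptographic hash would do just as well):

```python
import hashlib
import os

CHUNK = 1024 * 1024  # 1 MB samples from the start, middle, and end

def quick_hash(path: str) -> str:
    """Hash three 1 MB samples plus the file size instead of the whole file.

    Not collision-resistant in the cryptographic sense, but cheap and
    good enough to re-identify a moved media file. For files smaller
    than 3 MB the samples simply overlap, which is still deterministic.
    """
    size = os.path.getsize(path)
    h = hashlib.sha1(str(size).encode())
    with open(path, "rb") as f:
        for offset in (0, max(0, size // 2 - CHUNK // 2), max(0, size - CHUNK)):
            f.seek(offset)
            h.update(f.read(CHUNK))
    return h.hexdigest()
```

Including the file size in the hash means a re-encode or tag rewrite that changes the length is caught even if all three samples happen to match.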

nichwall commented 5 months ago

... what if instead of hashing the entire file, only portions were hashed? Such as 1 MB from the beginning, 1 MB from the end, and 1 MB from somewhere in the middle? Include file size for good measure.

Well, that's a clever idea. It looks like ID3 tags are stored at the end of the file for ID3v1 and at the beginning for ID3v2, and the 1 MB samples should capture some of the audio data too in case of a re-encode (where the tags aren't edited). I'm not sure whether every container keeps metadata at the beginning or end of the file, but I would assume the file size changing could catch that.

That would probably work for ebooks and other files (cover images, sidecar metadata).
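The tag placement is easy to check from the documented magic bytes: an ID3v2 header is the first 3 bytes of the file (ASCII `ID3`), and an ID3v1 tag is a fixed 128-byte block at the very end beginning with `TAG`. A minimal probe (illustrative; real tag parsing has more edge cases):

```python
import os

def id3_locations(path: str) -> dict:
    """Check for an ID3v2 header at the start and an ID3v1 tag at the end."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(3)          # ID3v2 files begin with the bytes "ID3"
        id3v1 = False
        if size >= 128:
            f.seek(size - 128)    # ID3v1 is the last 128 bytes, starting "TAG"
            id3v1 = f.read(3) == b"TAG"
    return {"id3v2": head == b"ID3", "id3v1": id3v1}
```

So samples from the head and tail of the file do capture both tag formats, supporting the partial-hash idea above.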

sevenlayercookie commented 5 months ago

I'll experiment within my own library and see if it seems reproducible and avoids collisions... small sample size, though.

On another note, I've been experimenting with xxHash on my RPi 4. It's incredibly fast even on this device. Once a file is loaded in memory, it was running at 1000 MB/s. A 100 GB library could be hashed in under 2 minutes (hard drive speed is the real bottleneck).

ABS could be programmed to only run the hash when the file is already loaded into memory for other reasons (playback) to prevent redundant reads, or when a 'force hash' command is issued on the library.
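One way to sketch that piggyback idea (purely illustrative, not ABS code; `hashlib.blake2b` stands in for xxHash, which is a third-party dependency in Python): feed the hasher every chunk that playback already read, so the digest costs no extra I/O. Note this only works when playback streams the file sequentially from start to end; seeking would have to invalidate the pending hash.

```python
import hashlib

class HashOnReadCache:
    """Compute a file digest opportunistically while the file is streamed."""

    def __init__(self):
        self.digests = {}   # path -> hex digest, filled in lazily
        self._pending = {}  # path -> in-progress hasher

    def on_chunk(self, path: str, chunk: bytes) -> None:
        # Called with each chunk already read for playback; no extra I/O.
        self._pending.setdefault(
            path, hashlib.blake2b(digest_size=8)
        ).update(chunk)

    def on_eof(self, path: str) -> None:
        # Only a completed sequential read yields a trustworthy digest.
        h = self._pending.pop(path, None)
        if h is not None:
            self.digests[path] = h.hexdigest()
```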

agittins commented 5 months ago

Would audio fingerprinting, as used in MusicBrainz Picard and similar apps, work well with spoken material? It's a pretty efficient way to identify music, at least. The Chromaprint fpcalc utility calculates the fingerprint (it took about 3 seconds on a 100 MB m4b over NFS, versus 7 seconds for md5sum, even after fpcalc had cached some of it), and the resulting fingerprint can be stored in the metadata and used to query or submit to the MusicBrainz (or perhaps BookBrainz?) services.

The AcoustID fingerprinting uses characteristics of the recording rather than relying on the exact file or bitstream - so it can recognise a given recording even if it's been transcoded to another format or if the metadata has been edited or stripped.

The web service allows users to collaboratively share metadata and tie together editions, releases, works, authors, etc. I suspect the BookBrainz service might be fairly new (I only learned of it while writing this comment), but the whole MusicBrainz system is really well thought out and makes a huge difference to organising a music collection. I am sure it could be leveraged to do the same for audiobooks, if only it were adopted by more apps. The database is open, and indeed you can download a full dump of their PostgreSQL database if you feel the need!

I couldn't see any existing issues specifically proposing its use (though it was mentioned, perhaps in passing, in some other threads I've not read).

sevenlayercookie commented 5 months ago

Would audio fingerprinting, as used in MusicBrainz Picard and similar apps, work well with spoken material?

I was wondering this too; it would be nice to identify editions of audiobooks despite different encodings. But given how much dynamic range compression and filtering audiobooks undergo, and how non-dynamic spoken word is in general, I wonder how accurate it would be. Seems worth experimenting with.

nichwall commented 5 months ago

That would probably only work for single file books, since books broken up into multiple files are not consistent (by chapter, fixed length, fixed count, etc).

sevenlayercookie commented 5 months ago

That would probably only work for single file books, since books broken up into multiple files are not consistent (by chapter, fixed length, fixed count, etc).

I believe MusicBrainz/AcoustID fingerprints the entire sample file and matches it against whole-file fingerprints in the database, so it seems like it would be less helpful for multi-file books. However, algorithms like Shazam, EchoPrint, and Panako excel at matching short segments, which I think would work well with multiple MP3s. Maybe the MP3s could even be aligned with the "gold standard" recording in the database.

Or, to go another step: take an audiobook edition with known time-stamped chapters, fingerprint the five seconds around every chapter mark, and then upload those fingerprints to a database, allowing users to run each chapter fingerprint against their own files; ABS would then timestamp the user's files when a match is made. That seems like it would work regardless of how books are divided among files.
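A toy version of that chapter-matching flow, to make the data flow concrete. Everything here is hypothetical: a real implementation would use an acoustic fingerprint such as Chromaprint or Panako (robust across re-encodes), whereas the hash below only matches bit-identical bytes, and the brute-force window scan is far too slow for real audio.

```python
import hashlib

def fingerprint(segment: bytes) -> str:
    """Stand-in for a real acoustic fingerprint; a hash only matches
    bit-identical data, unlike a true fingerprint."""
    return hashlib.sha1(segment).hexdigest()

def locate_chapters(files, reference):
    """files: list of (filename, audio_bytes) making up the user's book.
    reference: list of (chapter_title, segment_bytes) from a known edition.
    Returns chapter_title -> (filename, byte_offset) for matched chapters."""
    ref = {fingerprint(seg): title for title, seg in reference}
    seg_len = len(reference[0][1])
    found = {}
    for name, audio in files:
        # Naive sliding window; a real matcher indexes sub-fingerprints.
        for offset in range(0, len(audio) - seg_len + 1):
            title = ref.get(fingerprint(audio[offset:offset + seg_len]))
            if title is not None and title not in found:
                found[title] = (name, offset)
    return found
```

The key property is that the result is expressed per user file, so it doesn't matter how the book is split.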

agittins commented 5 months ago

But given how much dynamic range compression and filtering audiobooks undergo, and how non-dynamic spoken word is in general, I wonder how accurate it would be.

Spoken word has far more dynamic range than popular music, and probably more than most classical music, as spoken word has gaps between words and sentences, i.e. silence, which many (most?) forms of music only have periodically, if at all. MusicBrainz still works despite the loudness wars, so I doubt dynamic range will be an issue.

What speech does lack is spectral diversity: we humans mostly just honk around 350 Hz or so, with not a lot of variety compared to many musical sources. This might affect how well the AcoustID algorithm performs, but that's just my speculation, and pretty low-quality speculation at that :-)

As far as I can tell, AcoustID appears to work well for things humans can hear. A hypothesis running counter to that would probably need some evidence.

Re: hashing

Hashing is designed to find identical things, fingerprinting (in this context) is to match similar things.

Re: how to implement hashing/fingerprinting

Re: variants

These aren't unexamined possibilities; this is part of the design of MusicBrainz (and BookBrainz) that becomes fairly obvious once you start looking at how they structure things: chapters/books, tracks/albums, editions/release variants. No new ground here; we don't need to reinvent the wheel.

What I am proposing is that MB/BB has already solved this problem, and ABS could probably implement it, if there's an appetite to do so.

In order of Minimum Viable Products, the features could look like:

Only the first step is required to make this a usable feature. The other two can be performed by MusicBrainz Picard or other client tools that already exist.

MusicBrainz already has an official style guide for audiobook metadata and how it should be handled.