NitzanNougat opened 4 months ago
Just to be clear: this has nothing to do with CSV. Audiobookshelf doesn't import metadata from CSV files. The metadata is usually read from the audio file itself (or from other sources supported by ABS, which don't include CSV). By default, Libation embeds the metadata into the audio file (this is controlled in Libation under Settings -> Audio File Settings -> Allow Libation to fix up audiobook metadata).
Anyway, I did reproduce the behavior you describe, and I'll try to fix it.
Comma was intentionally left out when I set this up a few years ago. I believe some genres from Audible have commas in them, so splitting on commas would break those genres. We should confirm this before adding comma as a delimiter. It may not actually be an issue, but I remember intentionally leaving it out.
Found an example: https://api.audnex.us/books/B01CUKULGA
"genres": [
  {
    "asin": "18574597011",
    "name": "Mystery, Thriller & Suspense",
    "type": "genre"
  },
  {
    "asin": "18580606011",
    "name": "Science Fiction & Fantasy",
    "type": "genre"
  },
  {
    "asin": "18574621011",
    "name": "Thriller & Suspense",
    "type": "tag"
  }
]
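To make the concern concrete, here is a minimal Python illustration (not Audiobookshelf code) of what happens if the two genre names above are joined with ", " into a single tag and then split naively on commas. The joining step is an assumption about how a comma-joining tagger would write the field.

```python
# Genre names as the provider returns them, one by one (from the example above).
audible_genres = ["Mystery, Thriller & Suspense", "Science Fiction & Fantasy"]

# Assumption: a tagger that joins multiple genres with ", " into one field.
embedded = ", ".join(audible_genres)

# A naive comma split cannot recover the original names:
# "Mystery, Thriller & Suspense" comes back as two separate genres.
naive = [part.strip() for part in embedded.split(",")]
print(naive)  # ['Mystery', 'Thriller & Suspense', 'Science Fiction & Fantasy']
```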
We can't ignore, though, a fairly significant data source (Libation) that seems to always put commas between genres.
Between getting all of Libation's multi-genre tags wrong (which also pollutes the genre data in ABS) and occasionally splitting a genre by mistake, the latter seems preferable.
But let me first think about whether there's some heuristic that would let us have our cake and eat it too.
This issue has been brought up before with Libation: https://github.com/advplyr/audiobookshelf/issues/2539. I've never used it; maybe it has an option to not use commas?
Even though there is no official spec for delimiting multiple genres, the delimiters we use are pretty widely adopted, and I'm not aware of any metadata tagging software that supports commas.
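As a rough illustration of delimiter-based splitting that leaves commas out, here is a Python sketch. The delimiter set (`//`, `/`, `;`) and the `split_genres` helper are hypothetical; the actual list of supported delimiters lives in the Audiobookshelf source and may differ.

```python
import re

# Hypothetical delimiter set for illustration only ("//" is tried before "/").
# Surrounding whitespace is consumed so the resulting names come out trimmed.
DELIMITERS = re.compile(r"\s*(?://|/|;)\s*")


def split_genres(tag: str) -> list[str]:
    """Split a raw genre tag on the (assumed) supported delimiters, dropping empty parts."""
    return [part for part in DELIMITERS.split(tag.strip()) if part]


print(split_genres("Fantasy; Horror"))                # ['Fantasy', 'Horror']
print(split_genres("Astronomy, Cosmology, Physics"))  # ['Astronomy, Cosmology, Physics']
```

The second call shows the behavior reported in this issue: a comma-separated tag comes through as one single genre.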
As far as data sources go, I would guess Audible is the vast majority. I'm not opposed to supporting comma delimiters if it can be done non-disruptively, but this is certainly not a bug.
Related https://github.com/advplyr/audiobookshelf/issues/1864 https://github.com/advplyr/audiobookshelf/issues/1998
So, just to have some data points on this: Audible has a page that shows its Level 1 and Level 2 categories (which are used as genres in the metadata). These aren't all the genres, since there are also some lower-level categories that don't appear on this page, but I think it gives some idea of what Audible genres look like. I scraped the data into a Google Sheet and ran a couple of stats.
Out of 212 unique genres, 13 contain a comma (~6%). All of the ones containing a comma are of the form "A, B & C".
I'm not sure exactly what to do with this info yet, just wanted to share.
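The "A, B & C" observation can be expressed as a regular expression. The sketch below checks it against the comma-containing genre names quoted in this thread only (the full 212-entry list from the sheet is not reproduced here) and recomputes the share of comma genres.

```python
import re

# "A, B & C": three parts, with exactly one comma and one ampersand, in that order.
A_B_AND_C = re.compile(r"^[^,&]+, [^,&]+ & [^,&]+$")

# Sample names taken from this thread, not the full scraped list.
samples = [
    "Mystery, Thriller & Suspense",
    "Fitness, Diet & Nutrition",
    "Science Fiction & Fantasy",
]

comma_genres = [g for g in samples if "," in g]
print(all(A_B_AND_C.match(g) for g in comma_genres))  # True
print(f"{13 / 212:.1%}")  # 6.1%
```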
Hi, thanks for the quick reply :)
tbh, I don't mind splitting these unique examples down the middle. For example, for the genre "Fitness, Diet & Nutrition," I'm okay if "Fitness" ends up as a separate genre. It might even help if I'm searching for just "Fitness," since the book would show up in that category instead of only under "Fitness, Diet & Nutrition," which might be specific to Audible.
I'm thinking a possible (ugly) idea might just be to check for the unique cases you mentioned, specifically the ones from Audible:
For genre tags that don't contain one of the unique genres, just use ',' as a regular separator. For tags that do, maybe remove the unique genre's substring from the genre tag first, then split the rest on ',' and add the unique genre back afterwards (or something like that, but cleaner)?
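A token-based variant of this idea is sketched below in Python: split on commas first, then greedily re-join adjacent pieces whenever they form a known comma-containing genre. Matching whole comma-separated pieces avoids accidental matches inside longer names and keeps the original order. `KNOWN_COMMA_GENRES` and `split_comma_genres` are hypothetical names. The list holds only two examples from this thread; a real one would have to be scraped from Audible.

```python
# Hypothetical, hand-maintained list of Audible genres that contain a comma.
KNOWN_COMMA_GENRES = (
    "Mystery, Thriller & Suspense",
    "Fitness, Diet & Nutrition",
)


def split_comma_genres(tag: str, known=KNOWN_COMMA_GENRES) -> list[str]:
    parts = [p.strip() for p in tag.split(",") if p.strip()]
    known_set = set(known)
    # Largest number of comma-separated pieces any known genre spans.
    max_pieces = max((g.count(",") + 1 for g in known), default=1)
    result = []
    i = 0
    while i < len(parts):
        # Try the longest run of pieces first; a single piece is always accepted.
        for n in range(min(max_pieces, len(parts) - i), 0, -1):
            candidate = ", ".join(parts[i:i + n])
            if n == 1 or candidate in known_set:
                result.append(candidate)
                i += n
                break
    return result


print(split_comma_genres("Mystery, Thriller & Suspense, Science Fiction & Fantasy"))
# ['Mystery, Thriller & Suspense', 'Science Fiction & Fantasy']
```

A plain comma-separated tag such as "Astronomy, Cosmology, Physics" still splits into three genres, since none of its adjacent pairs is on the list.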
Thanks!
In the meantime, until this is resolved, running a match in Audiobookshelf with Audible.com as the provider will fix this for you effortlessly.
I updated to the newest version of ABS and ran a match.
Afterward, I noticed that the genres are still the same. Do I need to delete all the genres and run a match again?
Anyhow, I noticed that book tags are separated by commas, though I didn't check this before the update.
And tbh, searching by tags instead of genres works well enough for me.
In Audiobookshelf Settings, there's an option called "Prefer matched metadata". Turn that option on, and then matching will override existing metadata.
Great, it has overridden the previous genres, and it didn't split up genres like Mystery, Thriller & Suspense.
FYI, I have found only one genre that it didn't split up: [Wars & Conflicts, Greece, Civilization], which should be three separate genres, but that's a minor edge case.
Really appreciate the quick help!
@advplyr going back to the original discussion - from my perspective, we're trying to get as much data as possible from the input audio file, with the highest accuracy possible.
With that view in mind, what I'm trying to do is get genres from Libation-encoded audio files with ~94% accuracy (given the stats we have from the Audible category page), instead of getting them wrong almost every time there's more than one genre. To check this, I looked at the Libation export data from my own Audible library. The library contains 451 books, of which 374 have more than one genre. This means that the accuracy of the current scanning algorithm would be about (451 - 374) / 451 ≈ 17%.
So I'm trying to trade 17% accuracy for 94% accuracy. Plus, I'm willing to scrape all genres containing a comma from Audible (I don't think their list of genres is very dynamic) and match against them, so we'd be 100% accurate on Libation-encoded books.
Does this make sense?
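Spelling out the arithmetic behind the two figures: the current rate is the share of books with at most one genre, while the comma-split estimate is one minus the share of Audible category names containing a comma (13 of 212). One is per book and the other per genre name, so the comparison is a rough one rather than two measurements of the same thing.

```latex
\begin{aligned}
\text{current} &= \frac{451 - 374}{451} = \frac{77}{451} \approx 0.171 \approx 17\% \\
\text{comma split} &\approx 1 - \frac{13}{212} \approx 1 - 0.061 = 0.939 \approx 94\%
\end{aligned}
```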
> Great, it has overridden the previous genres, and it didn't split up genres like Mystery, Thriller & Suspense.
Yes, that's expected. The provider we use returns genres one by one, not as a comma-separated list, so we can tell the genres for sure.
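For illustration, here is how a list-shaped provider response can be consumed without any string splitting, using the Audnexus excerpt quoted earlier in this thread. Routing `type == "genre"` entries to genres and `type == "tag"` entries to tags is an assumption made for this sketch; ABS's actual mapping may differ.

```python
import json

# Excerpt of the Audnexus response quoted earlier, wrapped into valid JSON.
response = json.loads("""
{
  "genres": [
    {"asin": "18574597011", "name": "Mystery, Thriller & Suspense", "type": "genre"},
    {"asin": "18580606011", "name": "Science Fiction & Fantasy", "type": "genre"},
    {"asin": "18574621011", "name": "Thriller & Suspense", "type": "tag"}
  ]
}
""")

# Each entry already holds exactly one name, so commas inside names are harmless.
genres = [g["name"] for g in response["genres"] if g["type"] == "genre"]
tags = [g["name"] for g in response["genres"] if g["type"] == "tag"]
print(genres)  # ['Mystery, Thriller & Suspense', 'Science Fiction & Fantasy']
print(tags)    # ['Thriller & Suspense']
```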
> FYI, I have found only one genre that it didn't split up: [Wars & Conflicts, Greece, Civilization], which should be three separate genres, but that's a minor edge case.
Can you tell me the book title and author for which this happened?
A War Like No Other: How the Athenians and Spartans Fought the Peloponnesian War, by Victor Davis Hanson
I think it will be confusing if we split comma-separated lists but leave intact the Audible genres that contain commas. Has anyone opened an issue with this software? It's the only one embedding genres with commas. The algorithm should be straightforward about which delimiters we support. Personally, I don't mind splitting those Audible genres up, but other users may be using commas in their genres. I can ask in the Discord.
What happened?
Most of my audiobooks are sourced from Audible by Libation.
When initially downloading these books, I did so without metadata.
However, the ones downloaded with metadata have it in CSV format rather than JSON (I have no idea whether that makes a difference).
The issue arises in both cases, with and without metadata:
For example, for the book Cosmos by Carl Sagan, the genres are currently listed as:
Genres: "Astronomy, Cosmology, Biological Sciences, Atmospheric Sciences, Physics"
The entire genre string is treated as a single genre, making it impossible to search for the book by individual genres.
What did you expect to happen?
I would expect the genres to be parsed as individual entries:
Genres: "Astronomy", "Cosmology", "Biological Sciences", "Atmospheric Sciences", "Physics"
Each genre should be recognized as a separate entry, so that filtering or searching by genre gives accurate results.
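The expected behavior amounts to splitting the embedded tag on commas. Here is a minimal Python sketch (the `split_on_commas` name is hypothetical) that also trims whitespace, drops empty pieces and duplicates, and keeps the original order. A plain comma split like this would also break Audible genre names that themselves contain a comma, such as "Mystery, Thriller & Suspense".

```python
def split_on_commas(tag: str) -> list[str]:
    """Split a comma-separated genre tag into trimmed, de-duplicated names in original order."""
    seen = set()
    result = []
    for part in tag.split(","):
        name = part.strip()
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result


print(split_on_commas("Astronomy, Cosmology, Biological Sciences, Atmospheric Sciences, Physics"))
# ['Astronomy', 'Cosmology', 'Biological Sciences', 'Atmospheric Sciences', 'Physics']
```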
Steps to reproduce the issue
1. Import an Audible audiobook via Libation, either without metadata or with .csv metadata.
2. Install ABS v2.10.1 via Docker Compose.
3. Scan the new libraries.
Audiobookshelf version
v2.10.1
How are you running audiobookshelf?
Docker
What OS is your Audiobookshelf server hosted from?
Linux
If the issue is being seen in the UI, what browsers are you seeing the problem on?
None
Logs
Additional Notes
No response