DobyTang / LazyLibrarian

This project isn't finished yet. Goal is to create a SickBeard, CouchPotato, Headphones-like application for ebooks. Headphones is used as a base, so there are still a lot of references to it.

Allow prefix matches from files on disk #1587

Closed: knobunc closed this 5 years ago

knobunc commented 5 years ago

I have files on disk that look like:

The Great Bridge_ The Epic Story of the Building of the Brooklyn Bridge

And I want them to match the DB entry: The Great Bridge

This changes the code to allow an explicit prefix match to work when a subtitle is present in one of the forms:

    Title: Subtitle
    Title (Subtitle)
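A minimal sketch of what such a prefix check could look like. This is not the code from the patch: the function name `is_prefix_match` and the separator list are assumptions made for illustration. The underscore is included only because the file name quoted above uses `_` where a title would normally have `:`.

```python
# Hypothetical helper, not taken from the LazyLibrarian source.
# Separators that may introduce a subtitle directly after the title.
SUBTITLE_SEPARATORS = (":", "_", " (")


def is_prefix_match(db_title, file_title):
    """Return True if file_title is db_title, optionally followed by a subtitle.

    The text after the title must start with a known separator, so that
    "The Great Bridges of Europe" does not match "The Great Bridge".
    """
    db = db_title.strip().lower()
    name = file_title.strip().lower()
    if not db or not name.startswith(db):
        return False
    rest = name[len(db):]
    # Either nothing follows the title, or a subtitle separator does.
    return rest == "" or rest.startswith(SUBTITLE_SEPARATORS)


on_disk = "The Great Bridge_ The Epic Story of the Building of the Brooklyn Bridge"
print(is_prefix_match("The Great Bridge", on_disk))                        # True
print(is_prefix_match("The Great Bridge", "The Great Bridge (Subtitle)"))  # True
print(is_prefix_match("The Great Bridge", "The Great Bridges of Europe"))  # False
```

Requiring a separator after the prefix is the design choice that matters here: a bare `startswith` would also accept any longer title that merely begins with the same words.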

knobunc commented 5 years ago

I'm not sure about this one. But it's really annoying when the file has a subtitle and the db doesn't.

philborman commented 5 years ago

I'm not sure either. It might lead to false positives where the book prefix is the same, so we will need to be very careful. I have come across similar issues before; I think the "Star Wars" series and "Wheel of Time" were particularly difficult, but I would need to check. I will run it on my library and see if it throws up any issues. It might be a day or two.

There are a few other options, not sure if any of them are any better as a solution...

  1. What do you get for the various fuzzy ratios on "The Great Bridge_ The Epic Story of the Building of the Brooklyn Bridge"? I would think the partname ratio should be fairly high, 90? Would tweaking the configured match ratios help?

  2. If you edit the .opf file (it's plain text) you can force a 100% match: either match the title, or add the Goodreads ID or the ISBN to match the database entry.

  3. You could edit the database entry from the LazyLibrarian manual edit page to force the title to match.
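To get a feel for the first option, the sketch below compares a whole-string ratio with a partial (best substring) ratio for the title in question. It uses `difflib` from the Python standard library as a stand-in, so the numbers are only indicative; the fuzzy-matching functions LazyLibrarian actually calls may score differently. The names `simple_ratio` and `partial_ratio` are hypothetical helpers written for this example.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Whole-string similarity on a 0-100 scale (difflib stand-in)."""
    matcher = SequenceMatcher(None, a.lower(), b.lower(), autojunk=False)
    return round(100 * matcher.ratio())


def partial_ratio(a, b):
    """Best similarity of the shorter string against any equally long
    window of the longer string, on a 0-100 scale."""
    short, long_ = sorted((a.lower(), b.lower()), key=len)
    if not short:
        return 0
    best = 0.0
    for start in range(len(long_) - len(short) + 1):
        window = long_[start:start + len(short)]
        score = SequenceMatcher(None, short, window, autojunk=False).ratio()
        best = max(best, score)
    return round(100 * best)


db_title = "The Great Bridge"
on_disk = "The Great Bridge_ The Epic Story of the Building of the Brooklyn Bridge"

# The whole-string ratio is pulled down by the large difference in length,
# while the partial ratio finds the title intact at the start of the name.
print("ratio:  ", simple_ratio(db_title, on_disk))
print("partial:", partial_ratio(db_title, on_disk))
```

If the real partial ratio behaves like this stand-in, the length difference rather than the wording is what pushes the whole-string score below a typical threshold.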

philborman commented 5 years ago

Ok, so ran my tests overnight and hit a problem, not sure what we can do about it though. Here are some of the P.G. Wodehouse "Plum" books in my database...

Plum Punch: Life at Home
Plum Punch: The Game's the Thing
Plum Punch
Plum Punch: To Marry or Not To Marry
Plum Punch: School Days
Plum Punch: Crime and the Courts

I actually have all of these books in my library, but some are not recognised when importing P.G. Wodehouse because their language is "unknown". As the book exists on disk, the libraryscan tries to match the name while ignoring the language filter, but the import then maps them all to "Plum Punch".

knobunc commented 5 years ago

Hm. I had the match as the least preferred, so I would have thought the others would have hit first.

I guess this is a good place to put some unit testing so we can see how it behaves with different names. I may poke at that eventually.
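As a rough idea of what such tests might look like, here is a small `unittest` sketch. The matcher `find_book` is a hypothetical stand-in written for this example, not the project's matching code; it only models the intended ordering, with an exact title match tried first and the prefix match as the least preferred fallback.

```python
import unittest


def find_book(file_title, db_titles):
    """Hypothetical matcher: exact match first, prefix match as last resort."""
    wanted = file_title.strip().lower()
    # 1. Exact (case-insensitive) title match.
    for title in db_titles:
        if title.lower() == wanted:
            return title
    # 2. Prefix match, longest database title first, and only when the
    #    remainder starts with a subtitle separator.
    for title in sorted(db_titles, key=len, reverse=True):
        candidate = title.lower()
        if wanted.startswith(candidate):
            rest = wanted[len(candidate):]
            if rest.startswith((":", "_", " (")):
                return title
    return None


class PrefixMatchTests(unittest.TestCase):
    DB = ["Plum Punch", "Plum Punch: Life at Home", "The Great Bridge"]

    def test_exact_match_beats_prefix(self):
        self.assertEqual(find_book("Plum Punch: Life at Home", self.DB),
                         "Plum Punch: Life at Home")

    def test_prefix_fallback_for_subtitle_on_disk(self):
        name = ("The Great Bridge_ The Epic Story of the Building "
                "of the Brooklyn Bridge")
        self.assertEqual(find_book(name, self.DB), "The Great Bridge")

    def test_no_match_without_separator(self):
        self.assertIsNone(find_book("Plum Punchline", self.DB))
```

Saved in a file such as `test_prefix_match.py` (name chosen for illustration), the tests can be run with `python -m unittest test_prefix_match`. New problem titles can then be added as extra test cases as they turn up.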

philborman commented 5 years ago

Yeah, I've never done unit testing, I'm not a programmer, just a hobby as you can probably tell!

philborman commented 5 years ago

OK, the Wodehouse problem was at my end. I deleted my test database and ran it from clean, and the fuzzy matching worked fine; it didn't hit the prefix code at all (as expected). The only one that hit the prefix code was:

Fuzz match prefix [Brilliant Blunders: From Darwin to Einstein - Colossal Mistakes by Great Scientists That Changed Our Understanding of Life and the Universe] [Brilliant Blunders]

which is correct. I will merge this patch in and we can see how it goes. If it creates problems, we could always make it switchable in config.

knobunc commented 5 years ago

Awesome. The other thing I realized is that we have the subtitle in the db. Perhaps we could try a match with that as well as with the bare name.
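One way this idea could be sketched: score the file name against both the bare title and the title joined with its subtitle, and keep the better score. The function name, the parameter names, and the use of `difflib` as the similarity measure are all assumptions made for illustration; the real database columns and fuzzy-matching calls may differ.

```python
from difflib import SequenceMatcher


def _normalise(text):
    """Lower-case, treat '_' as ':' (':' is often replaced in file names)
    and collapse runs of whitespace."""
    return " ".join(text.lower().replace("_", ":").split())


def best_title_score(file_title, db_title, db_subtitle=""):
    """Return the best 0-100 score of file_title against the bare database
    title and, when a subtitle is known, against "title: subtitle"."""
    candidates = [db_title]
    if db_subtitle:
        candidates.append("{}: {}".format(db_title, db_subtitle))
    target = _normalise(file_title)
    scores = [
        SequenceMatcher(None, target, _normalise(c), autojunk=False).ratio()
        for c in candidates
    ]
    return round(100 * max(scores))


on_disk = "The Great Bridge_ The Epic Story of the Building of the Brooklyn Bridge"
print(best_title_score(on_disk, "The Great Bridge"))
print(best_title_score(
    on_disk,
    "The Great Bridge",
    "The Epic Story of the Building of the Brooklyn Bridge",
))
```

With the subtitle available the file name can match the database entry in full, so no prefix rule is needed for that case.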