cbanack / comic-vine-scraper

An add-on script for ComicRack that lets you copy details from Comic Vine into your comic books.

Cover matching seems to have a false positive #363

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
DESCRIBE THE PROBLEM:

When scraping issue #1 of the latest Moon Knight series (2014) with automatic 
cover-matching turned on, the scraper seems to automatically choose the 2011 
series, as though it contained a matching cover.  But when you look at the 
issue #1 covers for these two series, they look completely different!

More details here:

http://comicrack.cyolito.com/forum/32-news-and-announcements/33534-comic-vine-scraper-1065-73?start=170#38495

WHAT VERSION OF COMICVINESCRAPER ARE YOU USING?

1.0.74

Original issue reported on code.google.com by cban...@gmail.com on 14 Mar 2014 at 5:08

GoogleCodeExporter commented 9 years ago
Here's the application log showing the bad behaviour:

--------------------------------------------------------------------------------
CV Scraper Version:  1.0.74
Running As:          ComicRack Plugin (CR version 0.9.175)
Cache Directory:     C:\Users\aza\AppData\Roaming\Comic Vine Scraper\localCache
Settings File:       C:\Users\aza\AppData\Roaming\cYo\ComicRack\Scripts\Comic Vine Scraper\settings.dat
--------------------------------------------------------------------------------

--------------------------------------------------------------------
[X] Series          [X] Volume          [X] Number          
[X] Title           [X] Published       [X] Released        
[X] Crossovers      [X] Publisher       [X] Imprint         
[X] Writer          [X] Penciller       [X] Inker           
[X] Colorist        [X] Letterer        [X] Cover Art       
[X] Editor          [X] Summary         [X] Characters      
[X] Teams           [X] Locations       [X] Webpage         
-------------------------------------------------------------------
[X] Overwrite Existing        [X] Ignore Blanks             
[X] Convert Imprints          [X] Autochoose Series         
[X] Download Thumbs           [X] Preserve Thumbs           
[ ] Confirm Issues            [X] Rescraping: Notes         
[X] Fast Rescrape             [X] Rescraping: Tags          
[X] Summary Dialog            
-------------------------------------------------------------------
Ignore folders when grouping issues into series.
Ignore all series published by 'abril'
Ignore all series published by 'marvel italia'
Ignore all series published by 'panini'
Ignore all series published by 'planeta deagostini'
Ignore all series published by 'marvel uk'
Ignore all series published by 'semic as'
Ignore all series published by 'panini comics'
-------------------------------------------------------------------

======> scraping next comic book: 'Moon Knight (2014) - 001 - [05-2014].cbz'
trying to match this book automatically...
...found a suitable match:  Moon Knight (39957)
searching for the right issue in 'Moon Knight (39957)'
   ...identified issue number 1
querying comicvine for issue details...
setting values for this comic book ('*' = changed):
-->  Series         : Moon Knight
-->  Issue Number   : 1
--> *Title          : Issue #1
-->  Crossovers     : --- skipped ---
--> *Summary        : The wait is over! Moon Knight is here...like you've never seen him bef ...
-->  Release Date   : 2011-5-4
--> *Publish Date   : 2011-7-??
--> *Volume         : 2011
-->  Imprint        : --- skipped ---
--> *Publisher      : Marvel
--> *Characters     : Captain America, Count Nefaria, Moon Knight, Mr. Hyde, Spider-Man, Ult ...
--> *Teams          : Avengers
--> *Locations      : Los Angeles
--> *Writers        : Brian Michael Bendis
--> *Pencillers     : Alex Maleev
--> *Inkers         : Alex Maleev
--> *Colorists      : Matthew Wilson
--> *Letterers      : Cory Petit
--> *CoverArtists   : Bryan Hitch, Edgar Delgado, Humberto Ramos, Mark Texeira, Paul Mounts, ...
--> *Editors        : Axel Alonso, Lauren Sankovitch, Tom Brevoort
--> *Webpage        : http://www.comicvine.com/moon-knight-1-issue-1/4000-269480/
-->  Rating         : --- skipped ---
--> *Tags           : CVDB269480
--> *Notes          : Scraped metadata from ComicVine [CVDB269480].
--> *Issue Key      : 269480
--> *Series Key     : 39957
-->  Cover Art URL  : http://static.comicvine.com/uploads/scale_small/6/67663/2515260-01a.jpg

Scraper terminated normally (scraped 1, skipped 0).
wrote debug logfile: cvs-debug-log-2014-03-13.txt   

Original comment by cban...@gmail.com on 14 Mar 2014 at 5:10

GoogleCodeExporter commented 9 years ago
To make matters worse, I cannot seem to get this bug to occur on my own 
computer.  Even when I configure everything the same way (as shown in the 
preferences in the application log posted above), on my computer the cover is 
not recognized and I am given the option to choose the correct series.

(This is the 'correct' behaviour, since the scraper never automatically scrapes 
issue 1 of any series if there are multiple available series with the same 
name.  This is to prevent accidentally matching a regular issue 1 to a TPB 
issue 1 with the same cover.)
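
For readers following along, the rule described above could be sketched roughly like this. This is only an illustration, not the scraper's actual code: the function name `can_autochoose`, the dictionary layout of the candidates, and the case-insensitive name comparison are all assumptions made for the example.

```python
def can_autochoose(issue_number, series_name, candidates):
    """Return False when the book is issue 1 and several candidate
    series share its name (hypothetical helper, not the real scraper API)."""
    wanted = series_name.strip().lower()
    # Collect every candidate series whose name matches the book's series name.
    same_name = [c for c in candidates if c["name"].strip().lower() == wanted]
    # Issue 1 plus more than one same-named series: ask the user instead.
    if str(issue_number) == "1" and len(same_name) > 1:
        return False
    return True


# The two Moon Knight series mentioned in this report.
moon_knight = [
    {"name": "Moon Knight", "start_year": 2011},
    {"name": "Moon Knight", "start_year": 2014},
]
print(can_autochoose(1, "Moon Knight", moon_knight))  # False: user must choose
print(can_autochoose(2, "Moon Knight", moon_knight))  # True: rule only covers issue 1
```

With a guard like this in place, the log above should have stopped at a series-selection prompt rather than printing "found a suitable match".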

Original comment by cban...@gmail.com on 14 Mar 2014 at 5:13

GoogleCodeExporter commented 9 years ago
Well, I'm still stumped.   I stepped through the code very carefully, and it 
just seems to be working. :(

So I added a WHOLE BUNCH of debug statements to the code.  Hopefully that will 
help me isolate when the scraper is behaving differently on your end.  Can you 
try installing this special build and reproducing the error like you did 
before?

https://docs.google.com/file/d/0BzIVSRpPwBMDQkQwZm8xcUx1Y0k/edit

This time, it should have a lot of extra information.

Original comment by cban...@gmail.com on 17 Mar 2014 at 12:07

GoogleCodeExporter commented 9 years ago
OK, here we go. First, my cache seems to be empty, so nothing is coming from 
there. I ran the custom version you provided and got the attached log, while 
again getting the wrong series... Hope this helps!

Original comment by theotoco...@gmail.com on 17 Mar 2014 at 8:38

GoogleCodeExporter commented 9 years ago
Thanks, that was very helpful.  I've tracked the problem down to a freak 
coincidence where the cover matching algorithm actually thinks these two covers 
are about 85% the same:

http://static.comicvine.com/uploads/scale_large/0/40/3675846-1+moonkn2014001_dc11_lr.jpg

http://static.comicvine.com/uploads/scale_large/8/80205/1797285-mk01_cover03.jpg

At first glance, they don't look anything alike, but you have to realize that 
the cover matching algorithm converts them to greyscale, and averages out the 
brightnesses of each region of the cover (this allows it to match covers that 
are similar except for different contrast ratios.)
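
To make that concrete, here is a minimal sketch of a region-brightness comparison. It is not the scraper's real implementation: the 4x4 grid, the plain RGB average used for greyscale, and the scoring formula are assumptions chosen to keep the example small, and the names `to_grey`, `region_means`, and `similarity` are hypothetical.

```python
def to_grey(rgb):
    """Collapse an (r, g, b) pixel to a single brightness value (0-255)."""
    r, g, b = rgb
    return (r + g + b) / 3.0


def region_means(pixels, grid=4):
    """Split the image into grid x grid cells and return each cell's mean
    brightness. `pixels` is a list of rows of (r, g, b) tuples."""
    height, width = len(pixels), len(pixels[0])
    means = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * height // grid, (gy + 1) * height // grid
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [to_grey(pixels[y][x])
                    for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(cell) / len(cell))
    return means


def similarity(a, b, grid=4):
    """Score two images from 0.0 (opposite) to 1.0 (identical region means)."""
    ma, mb = region_means(a, grid), region_means(b, grid)
    mean_diff = sum(abs(x - y) for x, y in zip(ma, mb)) / len(ma)
    return 1.0 - mean_diff / 255.0


# A solid red cover and a solid blue cover look nothing alike to a person,
# but they have the same brightness, so this measure calls them identical.
red = [[(255, 0, 0)] * 8 for _ in range(8)]
blue = [[(0, 0, 255)] * 8 for _ in range(8)]
print(similarity(red, blue))
```

Because colour and fine detail are discarded before comparing, two covers with a similar layout of light and dark areas can score highly even when the artwork is unrelated, which is the kind of coincidence described here.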

Anyway, I tweaked the algorithm to be a bit more strict/careful about accepting 
a match.  This eliminates the problem you encountered, and hopefully any 
similar problems in the future.  Please let me know if it doesn't work for you!
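
One way a stricter acceptance rule could look is sketched below. The thread does not say what was actually changed in the scraper, so this is purely an assumption: the function name `pick_match`, the 0.90 threshold, and the idea of requiring a margin over the runner-up are all hypothetical.

```python
def pick_match(scores, threshold=0.90, margin=0.05):
    """Return the best-scoring series id, or None if no match is trustworthy.

    `scores` maps a series id to a similarity score between 0.0 and 1.0.
    Both `threshold` and `margin` are illustrative values only.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    best_id, best_score = ranked[0]
    # Reject anything that is not a strong match in absolute terms.
    if best_score < threshold:
        return None
    # Reject ambiguous results where the runner-up scores nearly as well.
    if len(ranked) > 1 and best_score - ranked[1][1] < margin:
        return None
    return best_id


# An ~85% coincidence like the one above is no longer accepted.
print(pick_match({"39957": 0.85}))  # None
```

Returning None here would mean falling back to asking the user, which is the behaviour expected for this book.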

The fix is available in Comic Vine Scraper 1.0.75.

Original comment by cban...@gmail.com on 18 Mar 2014 at 3:54