cbanack / comic-vine-scraper

An add-on script for ComicRack that lets you copy details from Comic Vine into your comic books.

"Covers table" as string not filtered in scraped summaries #440

Xelloss-nakama closed 6 years ago

Xelloss-nakama commented 7 years ago

When a comic on Comic Vine has "covers table" info (a table listing the covers), which I think is something new on the site, this data is scraped along with the summary as plain text, which shows up as garbage text at the end of the field.

Example: http://comicvine.gamespot.com/ant-man-larger-than-life-1/4000-493036/

scraped Summary:

Join Hank Pym, a.k.a. ANT-MAN, as he learns that a little experiment can land him in BIG trouble!
Can Ant-Man tame something as painfully formidable as the Bullet Ant before it’s too late?! Find out!

PLUS: Reprinting Tales to Astonish #27 (the first appearance of Hank Pym) and Tales to Astonish#35 (the first of appearance of Ant-Man in costume)!List of covers and their creators:CoverNameCreator(s)Sidebar LocationRegRegular CoverJung-Sik Ahn1REFour Color Grails Exclusive VariantMike Deodato2VarBlank Variant CoverNone3

instead of:

Join Hank Pym, a.k.a. ANT-MAN, as he learns that a little experiment can land him in BIG trouble!
Can Ant-Man tame something as painfully formidable as the Bullet Ant before it’s too late?! Find out!

PLUS: Reprinting Tales to Astonish #27 (the first appearance of Hank Pym) and Tales to Astonish#35 (the first of appearance of Ant-Man in costume)!

Xelloss-nakama commented 7 years ago

It would also be cool to scrape this data somehow, but in a correct way (such as formatted text in the summary, custom values with cover information, or just tags with this info).

cbanack commented 7 years ago

More details here: http://comicrack.cyolito.com/forum/32-news-and-announcements/33534-comic-vine-scraper?start=1180#46062

cbanack commented 6 years ago

Fixed in 1.0.94.

It's tough to keep the summary data clean, because it is freeform HTML, and translating that into sensible text is hard. I opted to play it safe and leave everything the way it was, except I stripped out the words "List of covers" and anything that occurs after that.

It's not perfect, but it does seem to work pretty well. Fortunately, the Comic Vine editors seem pretty consistent about using that phrasing.
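
For anyone curious, the truncation described above can be sketched roughly like this. This is only an illustration, not the actual code from 1.0.94: the names `strip_covers_table` and `COVERS_MARKER` are made up for this example, and it assumes the summary has already been converted from HTML to plain text and that the marker is matched case-sensitively.

```python
# Hypothetical sketch of the "cut at the marker" approach; not the scraper's real code.
COVERS_MARKER = "List of covers"  # assumed phrasing, as used by Comic Vine editors


def strip_covers_table(summary, marker=COVERS_MARKER):
    """Return the summary with the marker and everything after it removed."""
    index = summary.find(marker)
    if index == -1:
        # No covers table found, so leave the summary untouched.
        return summary
    # Keep only the text before the marker and trim trailing whitespace.
    return summary[:index].rstrip()


scraped = (
    "PLUS: Reprinting Tales to Astonish #27!"
    "List of covers and their creators:CoverNameCreator(s)Sidebar Location"
)
print(strip_covers_table(scraped))
```

The trade-off is the one already mentioned: any summary that phrases the table heading differently passes through unchanged, and a summary that legitimately contains the marker phrase in its prose would be cut short.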