PrinceOfAbyss closed this 3 years ago
Thanks for the detailed description 👍🏼 I will fix it.
Fixed in #167.
@PrinceOfAbyss: Seriously, this is by far the best bug report this project has ever seen. Thanks a lot!
@bla0r: Again, thanks for taking care of it. :)
While I was testing some movies, I stumbled upon the extremely rare case of a movie that lacks a description.
In that case, something very strange happens. First, the pattern fails to match the default text that IMDB puts in place of the description ("Know what this is about? Be the first one to add a plot."), because of the newline characters inside the `<a href` markup of that text. As a result, a totally unexpected fallback takes place, in which the PCRE engine seemingly activates the `s` flag on its own and then matches a whole block of text from elsewhere in the page.

At this point, please take a careful look at this video I recorded for you, which is almost self-explanatory. Notice how the scraped text at 2:00 can already be seen at the very beginning of the matched text at 1:20, as soon as I deliberately activated the `s` flag.

You can even google the term `csm_body_delivery_started` and find a whole bunch of results where movies that obviously lacked a description were scraped, and no one noticed the wrong text brought in by this or similar classes!

Now for the good news: I've come up with the correct pattern that fixes this bug:
'~<section class="titlereference-section-overview">\s+<div>\s*+(.*)\s*?</div>\s+<hr>\s+<div class="titlereference-overview-section">~Uis'
As the video shows, the pattern above matches the description as well as the text that IMDB puts in place of a missing description, which prompts the user to add a description of their own.
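A minimal sketch of applying the pattern above with PHP's `preg_match`. The HTML fragment is a hypothetical, trimmed-down stand-in for IMDB's reference page (not copied from the real page source), and `tt0000000` is a placeholder ID. It shows that the `s` modifier is what lets the capture group span the newlines inside the placeholder's `<a href` markup; dropping it makes the same pattern fail on that input.

```php
<?php
// Pattern from the report. Modifiers: U = ungreedy, i = case-insensitive,
// s = "." also matches newlines, so the capture can span several lines.
$pattern = '~<section class="titlereference-section-overview">\s+<div>\s*+(.*)\s*?</div>\s+<hr>\s+<div class="titlereference-overview-section">~Uis';

// Hypothetical, trimmed-down markup for a title without a plot; the
// placeholder link is split across lines, as described in the report.
$html = <<<HTML
<section class="titlereference-section-overview">
    <div>
        Know what this is about?
        <a href="https://contribute.imdb.com/updates?update=tt0000000">
            Be the first one to add a plot.
        </a>
    </div>
    <hr>
    <div class="titlereference-overview-section">
HTML;

$description = null;
if (preg_match($pattern, $html, $matches) === 1) {
    // Collapse runs of whitespace so the multi-line capture reads as one line.
    $description = trim(preg_replace('~\s+~', ' ', $matches[1]));
}
echo $description, PHP_EOL;

// Same pattern without the s modifier: "." stops at the first newline,
// so the multi-line placeholder cannot be captured and preg_match returns 0.
$withoutS = substr($pattern, 0, -1); // drops the trailing "s" from "~Uis"
var_dump(preg_match($withoutS, $html)); // int(0)
```

Because of the `U` modifier, `(.*)` is lazy and stops at the first `</div>`, while the possessive `\s*+` keeps its leading whitespace out of the capture.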
After that, all that is left is to `preg_match` something like `Know what this is about?` or `https://contribute.imdb.com/updates?update=` to identify the prompt instead of an actual description, and return the `$sNotFound` text in that case.

Please also note that the same bug may affect `IMDB_SERIES_DESC` if someone stumbles upon a series that lacks a description. I can't confirm this, since I can't find any series without a description; it's an educated guess based on the similar-looking patterns.
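The `preg_match` check suggested above (looking for `Know what this is about?` or the `https://contribute.imdb.com/updates?update=` URL) could be sketched as follows. The function name `isPlotPlaceholder` and the `$sNotFound` value are hypothetical stand-ins, not the class's actual code; only the two marker strings come from the report.

```php
<?php
// Hypothetical helper: returns true when the captured text is IMDB's
// "add a plot" prompt rather than a real description.
function isPlotPlaceholder(string $captured): bool
{
    // Marker strings taken from the report; preg_quote escapes "?" and "."
    // (and the "~" delimiter) so they are matched literally.
    $markers = [
        'Know what this is about?',
        'https://contribute.imdb.com/updates?update=',
    ];
    foreach ($markers as $marker) {
        if (preg_match('~' . preg_quote($marker, '~') . '~i', $captured) === 1) {
            return true;
        }
    }
    return false;
}

// Hypothetical stand-in for the class's "not found" text.
$sNotFound = 'n/A';

$captured = 'Know what this is about? <a href="https://contribute.imdb.com/updates?update=tt0000000">Be the first one to add a plot.</a>';
$result = isPlotPlaceholder($captured) ? $sNotFound : $captured;
echo $result, PHP_EOL; // n/A
```

Checking both markers means either one alone is enough to flag the prompt, so a change to the prompt wording would still be caught by the URL.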