happy-river closed 4 years ago
@ferocious-ferret suggested I change this to only download the first 10k, and look in that for the title. I'll see what I can do about that today.
I'd leave it around 100-500k; I've seen many pages that dump huge inline CSS/JS blocks in the head before the title.
I changed it to read only the first 200K, and to truncate the HTML before parsing. I tried it with the limit at 100K and it couldn't get YouTube titles.
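For anyone following along, here is a rough sketch of the read-then-truncate idea, not the actual patch. It assumes the response is available as a file-like byte stream; `title_from_stream`, `_TitleParser`, and `MAX_TITLE_BYTES` are made-up names for illustration, and it uses the standard-library `html.parser` rather than whatever parser the real code uses.

```python
import io
from html.parser import HTMLParser

# Hypothetical constant mirroring the 200K limit discussed above.
MAX_TITLE_BYTES = 200 * 1024


class _TitleParser(HTMLParser):
    """Collects the text of the first complete <title> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def title_from_stream(stream, limit=MAX_TITLE_BYTES):
    """Read at most `limit` bytes and parse only that prefix for a title."""
    raw = stream.read(limit)  # everything past `limit` is never read
    parser = _TitleParser()
    parser.feed(raw.decode("utf-8", errors="replace"))
    parser.close()
    if not parser.done:
        # No closing </title> inside the prefix: treat as "no title".
        return None
    # Collapse runs of whitespace, as titles often span several lines.
    return " ".join("".join(parser.parts).split()) or None
```

With a page whose head opens with a large inline style block, a limit that is too small returns `None`, which matches the YouTube failure at 100K.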
Do some cheap tests to determine if a fetched page contains HTML and has a title before parsing it to look for the title. This change makes `grab_title` conclude that a 4MB jpg doesn't have a title much more quickly than before. I also cut the maximum page size that would be downloaded to check for a title from ~25M down to ~2M. I'm not sure what the right number is for that: 500K is too small and 25M seems way too large.
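The exact checks aren't listed here, so the following is only a guess at what "cheap tests" could look like: reject anything whose Content-Type is clearly not HTML, then do a plain byte search for a title tag before handing the data to a parser. `might_have_title` and its parameters are hypothetical names, not part of the real code.

```python
# MIME types treated as HTML in this sketch (an assumption, not from the patch).
HTML_TYPES = ("text/html", "application/xhtml+xml")


def might_have_title(content_type, head):
    """Cheap pre-checks to run before any real HTML parsing.

    content_type: value of the Content-Type header, or None if absent.
    head: the first chunk of the response body, as bytes.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = (content_type or "").split(";", 1)[0].strip().lower()
    if mime and mime not in HTML_TYPES:
        return False  # e.g. image/jpeg: no need to look at the body at all
    # A substring search is far cheaper than parsing; it also covers
    # responses that arrive without a usable Content-Type header.
    return b"<title" in head.lower()
```

A 4MB jpg served as `image/jpeg` is rejected on the header alone, and one served with a missing header is rejected by the byte search, so the parser never runs on it.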