TeamHG-Memex / undercrawler

A generic crawler

EvalError: Refused to evaluate a string as JavaScript #62

Open lopuhin opened 7 years ago

lopuhin commented 7 years ago

Requests may fail completely when the headless horseman JavaScript scripts fail to execute:

[scrapy_splash.middleware] WARNING: Bad request to Splash: {'description': 'Error happened while executing Lua script', 'error': 400, 'info': {'line_number': 70, 'error': 'JavaScript error: EvalError: Refused to evaluate a string as JavaScript because \'unsafe-eval\' is not an allowed source of script in the following Content Security Policy directive: "script-src \'self\' https://*.twimg.com https://*.twitter.com https://static.ads-twitter.com".', 'message': 'Lua error: [string "function get_arg(arg, default)..."]:70: JavaScript error: EvalError: Refused to evaluate a string as JavaScript because \'unsafe-eval\' is not an allowed source of script in the following Content Security Policy directive: "script-src \'self\' https://*.twimg.com https://*.twitter.com https://static.ads-twitter.com".\n', 'source': '[string "function get_arg(arg, default)..."]', 'type': 'LUA_ERROR'}, 'type': 'ScriptError'}

Line 70 is this one: https://github.com/TeamHG-Memex/undercrawler/blob/710b7f50544cdc1a4572c120ebe8d857b7d0042f/undercrawler/directives/headless_horseman.lua#L70

It's possible to skip the error with pcall (https://www.lua.org/pil/8.4.html), but maybe there is a way to still execute JS on the page?

kmike commented 7 years ago

AFAIK the only way is to strip CSP headers using a proxy in front of Splash. See https://github.com/scrapinghub/splash/issues/313.
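To make the header-stripping idea concrete, here is a minimal Python sketch of the filtering step such a proxy would apply to each response before it reaches Splash. It is only an illustration under assumptions: `strip_csp_headers` and `CSP_HEADERS` are hypothetical names, not part of Splash, scrapy-splash, or any proxy library, and wiring the function into a real proxy is left out.

```python
# Hypothetical helper: these names are illustrative and do not come from
# Splash, scrapy-splash, or undercrawler.
CSP_HEADERS = {
    "content-security-policy",
    "content-security-policy-report-only",
    # Legacy vendor-prefixed variants used by older browser engines.
    "x-content-security-policy",
    "x-webkit-csp",
}


def strip_csp_headers(headers):
    """Return a copy of the response headers without any CSP headers.

    Header names are compared case-insensitively, since HTTP header
    names are not case-sensitive.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in CSP_HEADERS
    }


# Example response headers, loosely based on the error in this issue.
response_headers = {
    "Content-Type": "text/html; charset=utf-8",
    "Content-Security-Policy": "script-src 'self' https://*.twitter.com",
}
cleaned = strip_csp_headers(response_headers)
```

Note that a policy can also be delivered through a `<meta http-equiv="Content-Security-Policy">` tag in the page itself, which header filtering alone would not remove.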

lopuhin commented 7 years ago

Thanks @kmike, it's good to know that this is a known issue :)

nehakansal commented 5 years ago

I was looking at the code, trying to figure out where I would add pcall, and I'm a bit confused. Could you point me in the right direction, please? I would like to skip this error and continue rendering without the JS script. Would it have to be in the Splash code (where the Lua script is called)? Thanks.

lopuhin commented 5 years ago

@nehakansal I think pcall is a way to handle exceptions in Lua, similar to try/except in Python (https://www.lua.org/pil/8.4.html), so it should be placed where the exception gets thrown. I didn't try doing it, though. Using a proxy which strips CSP headers looks like a better solution, but it's probably more work to set up.
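For readers more familiar with Python, the pcall/try-except analogy can be sketched as follows. In Lua, `pcall(f, ...)` calls `f` in protected mode and returns a status flag followed by either the results or the error value. The Python `pcall` helper below only mimics that shape, and `run_page_script` is a hypothetical stand-in for the failing JavaScript call; neither is code from undercrawler or Splash, and this does not show that wrapping the real call is enough to work around the issue.

```python
def pcall(func, *args, **kwargs):
    """Mimic the shape of Lua's pcall: (True, result) or (False, error)."""
    try:
        return True, func(*args, **kwargs)
    except Exception as exc:
        return False, exc


def run_page_script(csp_blocks_eval):
    # Hypothetical stand-in for the call that raises when the page's
    # Content Security Policy forbids evaluating strings as JavaScript.
    if csp_blocks_eval:
        raise RuntimeError("EvalError: Refused to evaluate a string as JavaScript")
    return "script result"


ok, outcome = pcall(run_page_script, csp_blocks_eval=True)
if not ok:
    # The error is captured instead of aborting the request, so the
    # caller can skip the script and carry on with rendering.
    message = f"skipped page script: {outcome}"
```

The point of the pattern is that the wrapper has to surround the exact call that raises; wrapping an outer function only helps if the error propagates through it.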

nehakansal commented 5 years ago

Thanks. I tried that earlier, but it didn't work. I will keep trying. For now, I would prefer to make it work by handling the exception rather than stripping the headers.