Closed. HappyCustomers closed this issue 5 years ago.
That is a generic error, raised when PhantomJS fails to execute properly. Have you tried both with and without the proxy setting described in the documentation?
Also, make sure the log level is set to DEBUG and see if you get more insight.
Finally, check the logs for the command that is executed on the filesystem and try to run it manually to confirm whether you get content (you may have to modify the arguments).
Let me know the outcome of trying the above.
Thanks, Pascal, for the quick response. I will set the log level to DEBUG and check. One question, however: there is no error in version 2.8.0, but in version 2.8.1 I get a REJECTED_BAD_STATUS error. I have no proxy setting configured.
There were a few fixes to PhantomJSDocumentFetcher in 2.8.1. That is probably why you see a difference. It may have been failing silently before, whereas now it shows an error.
OK. In DEBUG mode I am getting the following error:
[DEBUG] Network - Resource request error: QNetworkReply::NetworkError(OperationCanceledError) ( "Operation canceled" ) URL:
Those can be difficult to troubleshoot. See if you can successfully download your page using PhantomJS from the command line. It may be easier to troubleshoot PhantomJS issues that way.
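For anyone who wants to try the command-line check suggested above, here is a minimal sketch in Python. It assumes a `phantomjs` binary is on the PATH. The helper names (`build_command`, `fetch`) and the small fetch script are made up for illustration and are not what HTTP Collector itself runs. The `--debug` and `--proxy-type` options are standard PhantomJS command-line options.

```python
import subprocess
import tempfile
import textwrap

# Minimal PhantomJS script: open the URL, print the rendered HTML, exit.
FETCH_SCRIPT = textwrap.dedent("""\
    var system = require('system');
    var page = require('webpage').create();
    page.onResourceError = function (err) {
        system.stderr.writeLine('Resource error: ' + err.errorString + ' ' + err.url);
    };
    page.open(system.args[1], function (status) {
        if (status !== 'success') {
            system.stderr.writeLine('Failed to load: ' + system.args[1]);
            phantom.exit(1);
        } else {
            console.log(page.content);
            phantom.exit(0);
        }
    });
""")


def build_command(script_path, url, binary="phantomjs", debug=True, disable_proxy=False):
    """Assemble the PhantomJS command line (hypothetical helper)."""
    cmd = [binary]
    if debug:
        cmd.append("--debug=true")       # prints [DEBUG] lines like the ones in this thread
    if disable_proxy:
        cmd.append("--proxy-type=none")  # explicitly turn proxying off
    # PhantomJS options must come before the script path and its arguments.
    cmd += [script_path, url]
    return cmd


def fetch(url, **options):
    """Run PhantomJS against url and return (exit_code, stdout, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as f:
        f.write(FETCH_SCRIPT)
    result = subprocess.run(build_command(f.name, url, **options),
                            capture_output=True, text=True, timeout=120)
    return result.returncode, result.stdout, result.stderr
```

Calling `fetch("https://xyz.com", disable_proxy=True)` and comparing the output with a run without `disable_proxy` is one way to see whether the proxy settings make any difference to the "Operation canceled" error.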
You can see other people have the same problem with PhantomJS:
https://github.com/ariya/phantomjs/issues/13806 https://github.com/ariya/phantomjs/issues/12750#issuecomment-281364082
It seems to be a PhantomJS bug. The second link points to a hacky solution. Not sure it would work for you.
Unfortunately, PhantomJS is no longer supported by its author. So if you did find a bug with it, there might not be a good solution. Version 3 of HTTP Collector will support working transparently with every major browser for dynamic content (Chrome, Firefox, Edge, etc.). We are hoping it will make things easier. Before you ask though... there is no release date for it yet. ;-)
Sorry, one more question: how do I disable the proxy in the PhantomJS settings? The log says:
[DEBUG] Set "http" proxy to: "" : 1080
[DEBUG] 9 proxyType : "http"
[DEBUG] 10 proxy : ":1080"
[DEBUG] 11 proxyAuth : ":"
This is the error:
DEBUG - Unsupported HTTP Response: null
INFO - REJECTED_BAD_STATUS:
I have not set any proxy in the configuration.
You mean the proxy set by HTTP Collector? It is optional and, by default, no proxy is applied. I believe the same is true when using PhantomJS on the command line: a proxy has to be set explicitly.
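To check what proxy values PhantomJS actually received, the DEBUG lines quoted earlier can be scanned automatically. The following is a rough sketch only: the line format is inferred from the log excerpt in this thread, and the function names are hypothetical.

```python
import re

# Matches DEBUG lines such as:  [DEBUG] 10 proxy : ":1080"
SETTING_RE = re.compile(r'\[DEBUG\]\s+\d+\s+(proxy\w*)\s*:\s*"([^"]*)"')


def proxy_settings(log_text):
    """Return the proxy-related settings PhantomJS printed, as a dict."""
    return {name: value for name, value in SETTING_RE.findall(log_text)}


def has_empty_proxy_host(settings):
    """True when a proxy type is active but the proxy value has no host part."""
    proxy_type = settings.get("proxyType", "none")
    # Everything before the last ":" is treated as the host.
    host = settings.get("proxy", "").rpartition(":")[0]
    return proxy_type != "none" and host == ""
```

Run against the excerpt above, this reports an "http" proxy type with an empty host, which is the odd combination being discussed here.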
Pascal,
I have not set up any proxy in HTTP Collector, but PhantomJS is still showing the proxy settings in the log, as above.
Not sure where those come from. Can you share your full log file, in case more context may help?
Sorry for the delay in responding. I have sent you an email with the log and config file.
I was able to reproduce the problem but could not find a solution. The "Operation canceled" error is pretty common among PhantomJS users, but very few are able to fix it. Maybe you can find a suggestion online that works for you. If not, given that PhantomJS development has stalled, I am afraid you will have to wait for version 3 of HTTP Collector. Or, if it is possible for you, maybe look at implementing your own IHttpDocumentFetcher that wraps a headless browser (or something else).
Hi Pascal,
Ref: Norconex/importer issue #87, "Import only certain text from HTML file" (https://github.com/Norconex/importer/issues/87)
Based on your advice on using PhantomJS for fetching dynamic data, I tried implementing the same.
However, in version 2.8.0 I am not getting any error, and it is not fetching the dynamic data.
In version 2.8.1 I am getting the following error:
REJECTED_BAD_STATUS: https://xyz.com (HttpFetchResponse [crawlState=BAD_STATUS, statusCode=-1, reasonPhrase=null])
I have sent the config file by email.
Thank you
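When there are many such entries in a log, it can help to pull the fields out of the REJECTED_BAD_STATUS line. Below is a small sketch; the pattern is based solely on the single line quoted above, so other log layouts may not match, and `parse_rejection` is a hypothetical helper. A statusCode of -1 with a null reasonPhrase is not a real HTTP status, which seems consistent with the fetch failing before any HTTP response was obtained.

```python
import re

# Pattern inferred from the one REJECTED_BAD_STATUS line in this thread.
REJECT_RE = re.compile(
    r"REJECTED_BAD_STATUS: (?P<url>\S+) "
    r"\(HttpFetchResponse \[crawlState=(?P<state>\w+), "
    r"statusCode=(?P<code>-?\d+), reasonPhrase=(?P<reason>[^\]]*)\]\)"
)


def parse_rejection(line):
    """Extract url, crawl state, status code and reason, or None if no match."""
    m = REJECT_RE.search(line)
    if m is None:
        return None
    reason = m.group("reason")
    return {
        "url": m.group("url"),
        "state": m.group("state"),
        "status_code": int(m.group("code")),
        "reason": None if reason == "null" else reason,
        # No real HTTP status is negative, so treat these as "no response".
        "no_http_response": int(m.group("code")) < 0,
    }
```

This makes it easy to separate pages that were rejected without any response from those rejected with a genuine HTTP error status.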