Norconex / collector-core

Collector-related code shared between different collector implementations
http://www.norconex.com/collectors/collector-core/
Apache License 2.0

PhantomJS script file does not exist or is not a valid file: Project directory\scripts\phantom.js #28

Closed: nancygoyal1 closed this issue 4 years ago

nancygoyal1 commented 4 years ago

I am new to GCS and have basic coding knowledge. I want to crawl dynamic pages using PhantomJSDocumentFetcher with the following configuration:

<documentFetcher
    class="com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher"
    detectContentType="true" detectCharset="true">
  <exePath>C:\Users\phantomjs.exe</exePath>
  <renderWaitTime>3000</renderWaitTime>
  <referencePattern></referencePattern>
  <contentTypePattern></contentTypePattern>
  <validStatusCodes>200</validStatusCodes>
  <notFoundStatusCodes>404</notFoundStatusCodes>
</documentFetcher>

I downloaded the PhantomJS executable (phantomjs.exe) and am pointing exePath to it. When I run the collector, I get the error "PhantomJS script file does not exist or is not a valid file: Project directory\scripts\phantom.js". Do I need to install PhantomJS, or am I missing something else?

essiembre commented 4 years ago

Under your HTTP Collector installation directory, you will find a "scripts" folder containing a phantom.js file. Add the following to your document fetcher configuration, replacing the path with the actual location of that file:

      <scriptPath>/whatever/path/to/scripts/phantom.js</scriptPath>
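For reference, here is a sketch of the configuration from the question with the scriptPath element added. The Windows path used for scriptPath is a hypothetical placeholder, not a real default; replace it with the location of phantom.js in the "scripts" folder of your own HTTP Collector installation. All other elements are kept exactly as in the question.

```xml
<documentFetcher
    class="com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher"
    detectContentType="true" detectCharset="true">
  <!-- Path to the PhantomJS executable, as in the question. -->
  <exePath>C:\Users\phantomjs.exe</exePath>
  <!-- HYPOTHETICAL path: point this at phantom.js inside the "scripts"
       folder of your HTTP Collector installation. -->
  <scriptPath>C:\path\to\norconex-collector-http\scripts\phantom.js</scriptPath>
  <renderWaitTime>3000</renderWaitTime>
  <referencePattern></referencePattern>
  <contentTypePattern></contentTypePattern>
  <validStatusCodes>200</validStatusCodes>
  <notFoundStatusCodes>404</notFoundStatusCodes>
</documentFetcher>
```

Using an absolute path avoids depending on the directory the collector is launched from.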