Norconex / committer-elasticsearch

Implementation of Norconex Committer for Elasticsearch.
https://opensource.norconex.com/committers/elasticsearch/
Apache License 2.0

ElasticSearch Committer Error #41

Closed jacksonp2008 closed 3 years ago

jacksonp2008 commented 3 years ago

Running the latest HTTP Collector 3.0.0-M1 with Elasticsearch Committer 5.0.0-M1.

Per the doc, it seems like typename should be there: https://opensource.norconex.com/committers/elasticsearch/v4/configuration

But it may have changed with 5?

      <committers>
        <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
          <nodes>https://searcdfhdhdhdhdhdhdhdhdhhdhdh1.es.amazonaws.com:443</nodes>
          <indexName>docs</indexName>
          <typeName>docs</typeName>
          <targetContentField>fs_content</targetContentField>
          <fixBadIds>true</fixBadIds>
        </committer>
      </committers>

With this config,

./collector-http.sh start -c forescout/docs/docs-config/docs-config.xml 

1 XML configuration errors detected:

[XML] StartCommand: cvc-complex-type.2.4.a: Invalid content was found starting with element 'typeName'. One of '{restrictTo, fieldMappings, queue, ignoreResponseErrors, discoverNodes, dotReplacement, credentials, jsonFieldsPattern, connectionTimeout, socketTimeout, fixBadIds, sourceIdField, targetContentField}' is expected.

If I remove "typeName", it errors when trying to commit to ES:

Caused by: org.elasticsearch.client.ResponseException: method [POST], host [https://search-seservices-qyt22kq34vaaadaz465jecxama.us-east-1.es.amazonaws.com:443], URI [/_bulk], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"action_request_validation_exception","reason":"Validation Failed: 1: type is missing;2: type is missing;3: type is missing;4: type is missing;5: type is missing;6: type is missing;7: type is missing;8: type is missing;9: type is missing;10: type is missing;11: type is missing;12: type is missing;13: type is missing;14: type is missing;15: type is missing;16: type is missing;17: type is missing;18: type is missing;19: type is missing;20: type is missing;"}],"type":"action_request_validation_exception","reason":"Validation Failed: 1: type is missing;2: type is missing;3: type is missing;4: type is missing;5: type is missing;6: type is missing;7: type is missing;8: type is missing;9: type is missing;10: type is missing;11: type is missing;12: type is missing;13: type is missing;14: type is missing;15: type is missing;16: type is missing;17: type is missing;18: type is missing;19: type is missing;20: type is missing;"},"status":400}
    at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:283) ~[elasticsearch-rest-client-7.8.1.jar:7.8.1]
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:261) ~[elasticsearch-rest-client-7.8.1.jar:7.8.1]
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235) ~[elasticsearch-rest-client-7.8.1.jar:7.8.1]
    at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:506) ~[norconex-committer-elasticsearch-5.0.0-M1.jar:5.0.0-M1]
    ... 23 more
essiembre commented 3 years ago

Hello @jacksonp2008,

You are correct that "typeName" is no longer supported, in line with Elasticsearch's evolving API.

Which version of Elasticsearch are you using? Support for type has been deprecated in Elasticsearch 6.x and effectively removed starting from Elasticsearch 7.x.
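
To make the error above concrete: a bulk request is newline-delimited JSON in which each document is preceded by an action line. The sketch below is only an illustration, not the committer's actual code. It builds that pair of lines with and without a "_type" entry. The helper name bulk_index_lines and the sample document are made up for this example. The typeless form is what the 6.x cluster in the report above rejected with "type is missing".

```python
import json


def bulk_index_lines(index, doc_id, source, type_name=None):
    """Build the two NDJSON lines of a single bulk 'index' action."""
    meta = {"_index": index, "_id": doc_id}
    if type_name is not None:
        # Older Elasticsearch versions (6.x and earlier) expect a mapping type.
        meta["_type"] = type_name
    return [json.dumps({"index": meta}), json.dumps(source)]


doc = {"fs_content": "example"}  # hypothetical document body

# Action line carrying a type, as a 6.x cluster expects.
with_type = bulk_index_lines("docs", "doc-1", doc, type_name="docs")

# Typeless action line, the form sent when typeName is not configured.
without_type = bulk_index_lines("docs", "doc-1", doc)

print("\n".join(with_type))
print("\n".join(without_type))
```

Comparing the two printed action lines shows that the only difference is the "_type" key, which is what the typeName setting controls.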

You may consider upgrading your version of Elasticsearch.

You can find the Committer version 5 documentation in the JavaDoc.

jacksonp2008 commented 3 years ago

Version 6.3 via AWS. (6.8 is the latest they offer) Can I use an older committer with 3.0.0?

essiembre commented 3 years ago

Which zone are you in? Last time I checked, AWS Elasticsearch service was offering up to 7.9, as described here.

Unfortunately, older committers are not compatible with 3.0.0.

We can make it a feature request to support version 6.x of Elasticsearch but you can likely upgrade Elasticsearch faster.

jacksonp2008 commented 3 years ago

Unfortunately I can't easily upgrade beyond 6 as there are a lot of tools using ES right now and we would have to do a lot of testing. I may have to find another way using some of your previous recommendations from https://github.com/Norconex/collector-http/issues/739

I'll try the PhantomJS approach next. Thank you, Pascal.

essiembre commented 3 years ago

I created a new snapshot release of Elasticsearch Committer V5 (working with the HTTP Collector V3 stack) that reintroduces typeName for backward compatibility. If you want to go back to trying popular browsers for crawling, please try this snapshot release and confirm whether it works for you.

jacksonp2008 commented 3 years ago

Amazing! Thank-you Pascal, will give it a try this week.

Regards,

-Steve


jacksonp2008 commented 3 years ago

Alright I am trying this and it runs, but there are some issues I see:

  1. There doesn't appear to be a field which contains the "title" of the page
  2. RenameTagger doesn't seem to be renaming (for example, document.reference to fs_reference)
  3. CurrentDateTagger doesn't seem to be setting @timestamp
  4. ConstantTagger doesn't seem to be setting search_title
  5. The document content doesn't seem to show up anywhere, should be in fs_content

Here is the current config for completeness.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<!-- This is for Version 3 to deal with the zoomin site -->
<httpcollector id="FS-docs-Collector">

  <!-- Decide where to store generated files. -->
  <workDir>./forescout/docs/docs-output</workDir>

  <crawlers>
    <!-- you can have multiple crawlers -->
    <crawler id="FS-docs-Crawler">
      <startURLs stayOnDomain="true" stayOnPort="true" stayOnProtocol="true">
        <url>https://docs.forescout.com</url>
      </startURLs>

      <robotsTxt ignore="true"/>

      <!-- Put a maximum depth to avoid infinite crawling (e.g. calendars). -->
      <maxDepth>24</maxDepth>

      <sitemapResolver ignore="false"/>

      <!-- Be as nice as you can to sites you crawl. -->
      <delay default="500"/>

      <!-- Document Filtering -->
      <documentFilters>
      <filter class="com.norconex.collector.core.filter.impl.ExtensionReferenceFilter" onMatch="exclude">
        jpg,jpeg,gif,png
      </filter>
      </documentFilters>

      <!-- Document importing -->
      <importer>

        <preParseHandlers>
          <!-- Pre parsing taggers can go here -->
          <!-- sample DebugTagger below <tagger class="com.norconex.importer.handler.tagger.impl.DebugTagger" logFields="_id,id,content,title,keywords,description,document.reference" logLevel="INFO" /> -->
          <handler class="com.norconex.importer.handler.tagger.impl.DebugTagger" logLevel="INFO"/>

        </preParseHandlers>

        <postParseHandlers>
          <!-- Rename fields with a prefix for the search engine, the document can be renamed in the committer -->
          <handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
              <restrictTo caseSensitive="false"
                      field="title">
              </restrictTo>
              <rename fromField="title" toField="fs_title" overwrite="true" />
          </handler>

          <handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
              <restrictTo caseSensitive="false"
                      field="document.reference">
              </restrictTo>
              <rename fromField="document.reference" toField="fs_reference" overwrite="true" />
          </handler>

          <handler class="com.norconex.importer.handler.tagger.impl.CurrentDateTagger"
            field="@timestamp" format="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" />

          <handler class="com.norconex.importer.handler.tagger.impl.ConstantTagger">
            <constant name="search_title">Docs Portal</constant>
          </handler>

          <!-- If your target repository does not support arbitrary fields, make sure you only keep the fields you need
          <handler class="KeepOnlyTagger">
            <fieldMatcher method="csv">title,keywords,description,document.reference</fieldMatcher>
          </handler>
        -->
        </postParseHandlers>
      </importer>

      <!-- Decide what to do with your files by specifying a Committer. -->
      <committers>
        <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
          <!-- elastic dev site -->
          <nodes>https://search-sesasdfsafsfdsadfsafasfsafasfsafsdf1.es.amazonaws.com:443</nodes>
          <indexName>docs</indexName>
          <typeName>docs</typeName>
          <targetContentField>fs_content</targetContentField>
          <fixBadIds>true</fixBadIds>
        </committer>
      </committers>

    </crawler>
  </crawlers>
  </httpcollector>

Here is a record as shown in Kibana

{ "_index": "docs", "_type": "docs", "_id": "https://docs.forescout.com/bundle/CIUP-3-0-6-rn/page/CIUP-3-0-6-rn.Install-the-Forescout-Infrastructure-Update-Pack.html", "_version": 1, "_score": null, "_source": { "fs_content": "\n \n\n \n", "document.contentFamily": "html", "Server": "cloudflare", "collector.sitemap-changefreq": "daily", "Content-Location": "https://docs.forescout.com/bundle/CIUP-3-0-6-rn/page/CIUP-3-0-6-rn.Install-the-Forescout-Infrastructure-Update-Pack.html", "document.reference": "https://docs.forescout.com/bundle/CIUP-3-0-6-rn/page/CIUP-3-0-6-rn.Install-the-Forescout-Infrastructure-Update-Pack.html", "X-Frame-Options": "DENY", "Referrer-Policy": "no-referrer-when-downgrade", "Strict-Transport-Security": "max-age=31536000; includeSubDomains", "Content-Security-Policy": "frame-ancestors 'self'", "collector.is-crawl-new": "true", "Content-Encoding": "UTF-8", "collector.http-fetcher": "com.norconex.collector.http.fetch.impl.GenericHttpFetcher", "collector.depth": "0", "X-XSS-Protection": "1; mode=block", "Content-Length": "6434", "Content-Type": "text/html; charset=UTF-8", "cf-request-id": "08fdad0fc90000cf686b3a0000000001", "Transfer-Encoding": "chunked", "X-Parsed-By": [ "org.apache.tika.parser.DefaultParser", "org.apache.tika.parser.html.HtmlParser" ], "collector.sitemap-priority": "0.5", "CF-RAY": "6342e45fa8ebcf68-IAD", "X-Content-Type-Options": "nosniff", "Connection": "keep-alive", "collector.sitemap-lastmod": "2020-12-12T00:00Z", "document.contentEncoding": "UTF-8", "X-Content-Security-Policy": "frame-ancestors 'self'", "Date": "Mon, 22 Mar 2021 22:35:15 GMT", "X-WebKit-CSP": "frame-ancestors 'self'", "CF-Cache-Status": "DYNAMIC", "viewport": "width=device-width, initial-scale=1, shrink-to-fit=no", "document.contentType": "text/html", "Content-Language": "en", "Expect-CT": "max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\"" }, "fields": { "collector.sitemap-lastmod": [ "2020-12-12T00:00:00.000Z" ] }, 
"sort": [ 1607731200000 ] }

essiembre commented 3 years ago

Hello @jacksonp2008, I am surprised it worked at all for you since you have XML configuration syntax errors. V3 is not a straight replacement for V2. There were some changes in the config. You should find exceptions telling you so when you try to launch, like:

...
Caused by: com.norconex.commons.lang.xml.XMLException: "field" attribute has been deprecated in favor of: toField. Update your XML configuration accordingly.
...
Caused by: com.norconex.commons.lang.xml.XMLException: "field" attribute has been deprecated in favor of: fieldMatcher. Update your XML configuration accordingly.
...

Once adapted to V3, the affected handlers should look like this:

          <handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
              <rename toField="fs_title" onSet="replace">
                <fieldMatcher>title</fieldMatcher>
              </rename>
          </handler>

          <handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
              <rename toField="fs_reference" onSet="replace">
                <fieldMatcher>document.reference</fieldMatcher>
              </rename>
          </handler>

          <handler class="com.norconex.importer.handler.tagger.impl.CurrentDateTagger"
            toField="@timestamp" format="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" />

          <handler class="com.norconex.importer.handler.tagger.impl.ConstantTagger">
            <constant name="search_title">Docs Portal</constant>
          </handler>

I was able to run your config after making these changes. I can confirm getting all the values you mention as expected.
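
The differences between the V2-style handlers posted earlier and the V3 versions above can be spotted mechanically. Below is a rough sketch, not part of Norconex, that scans a config fragment for the three V2-style attributes that differ in this thread: fromField and overwrite on <rename>, and field directly on a <handler>. The function name find_v2_attributes and the SUSPECT set are invented for the example, and the set is an assumption drawn only from comparing the two configs here. Other V3 changes are not covered.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the V2-style config earlier in this thread.
SAMPLE = """
<postParseHandlers>
  <handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
    <rename fromField="title" toField="fs_title" overwrite="true" />
  </handler>
  <handler class="com.norconex.importer.handler.tagger.impl.CurrentDateTagger"
           field="@timestamp" format="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" />
  <handler class="com.norconex.importer.handler.tagger.impl.ConstantTagger">
    <constant name="search_title">Docs Portal</constant>
  </handler>
</postParseHandlers>
"""

# (element tag, attribute) pairs that appear only in the V2-style config.
# Assumption: derived from this thread, not from an official migration list.
SUSPECT = {("rename", "fromField"), ("rename", "overwrite"), ("handler", "field")}


def find_v2_attributes(xml_text):
    """Return (handler class, element tag, attribute) for each suspect attribute."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for handler in root.iter("handler"):
        short_name = handler.get("class", "").rsplit(".", 1)[-1]
        # handler.iter() yields the handler itself and all nested elements.
        for element in handler.iter():
            for attr in element.attrib:
                if (element.tag, attr) in SUSPECT:
                    hits.append((short_name, element.tag, attr))
    return hits


for hit in find_v2_attributes(SAMPLE):
    print(hit)
```

The ConstantTagger handler is not reported, which matches the fact that it is identical in both versions of the config.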

title has no value because none of the documents I quickly checked have a title. It is likely generated via JavaScript.

It seems you are ready for the next step, where you would try to crawl with a browser using WebDriverHttpFetcher.

jacksonp2008 commented 3 years ago

Alright, thanks again Pascal, will give this a try.

forescout-spollock commented 3 years ago

Something doesn't compute. I downloaded chromedriver and added it to the config as shown below. I tried placing the <fetcher> element under <httpcollector>, then under <crawler>, then under <importer>, and it remains unhappy.

Made the handler changes you mentioned as well:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<!-- This is for Version 3 to deal with the zoomin site -->
<httpcollector id="FS-docs-Collector">

  <fetcher class="com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher">
    <browser>chrome</browser>
    <driverPath>/home/spollock/norconex-collector-http-3.0.0-M1/drivers/chromedriver</driverPath>
    <restrictions>
      <restrictTo field="document.reference">
        .*dynamic.*$
      </restrictTo>
    </restrictions>
  </fetcher>

  <!-- Decide where to store generated files. -->
  <workDir>./forescout/docs/docs-output</workDir>

  <crawlers>
    <!-- you can have multiple crawlers -->
    <crawler id="FS-docs-Crawler">
      <startURLs stayOnDomain="true" stayOnPort="true" stayOnProtocol="true">
        <url>https://docs.forescout.com</url>
      </startURLs>

      <robotsTxt ignore="true"/>

      <!-- Put a maximum depth to avoid infinite crawling (e.g. calendars). -->
      <maxDepth>24</maxDepth>

      <sitemapResolver ignore="false"/>

      <!-- Be as nice as you can to sites you crawl. -->
      <delay default="500"/>

      <!-- Document Filtering -->
      <documentFilters>
        <filter class="com.norconex.collector.core.filter.impl.ExtensionReferenceFilter" onMatch="exclude">
          jpg,jpeg,gif,png
        </filter>
      </documentFilters>

      <!-- Document importing -->
      <importer>

        <preParseHandlers>
          <!-- Pre parsing taggers can go here -->
          <!-- sample DebugTagger below <tagger class="com.norconex.importer.handler.tagger.impl.DebugTagger" logFields="_id,id,content,title,keywords,description,document.reference" logLevel="INFO" /> -->
          <handler class="com.norconex.importer.handler.tagger.impl.DebugTagger" logLevel="INFO"/>

        </preParseHandlers>

        <postParseHandlers>
          <!-- Rename fields with a prefix for the search engine, the document can be renamed in the committer -->
          <handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
            <rename toField="fs_title" onSet="replace">
              <fieldMatcher>title</fieldMatcher>
            </rename>
          </handler>

          <handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
            <rename toField="fs_reference" onSet="replace">
              <fieldMatcher>document.reference</fieldMatcher>
            </rename>
          </handler>

          <handler class="com.norconex.importer.handler.tagger.impl.CurrentDateTagger" toField="@timestamp" format="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"/>

          <handler class="com.norconex.importer.handler.tagger.impl.ConstantTagger">
            <constant name="search_title">Docs Portal</constant>
          </handler>

          <!-- If your target repository does not support arbitrary fields, make sure you only keep the fields you need <handler class="KeepOnlyTagger"> <fieldMatcher method="csv">title,keywords,description,document.reference</fieldMatcher> </handler> -->
        </postParseHandlers>
      </importer>

      <!-- Decide what to do with your files by specifying a Committer. -->
      <committers>
        <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
          <!-- elastic dev site -->
          <nodes>https://search-sasdfasdfsafdsdfsfa.us-east-1.es.amazonaws.com:443</nodes>
          <indexName>docs</indexName>
          <typeName>docs</typeName>
          <targetContentField>fs_content</targetContentField>
          <fixBadIds>true</fixBadIds>
        </committer>
      </committers>

    </crawler>
  </crawlers>
</httpcollector>
forescout-spollock commented 3 years ago

ok, I found this: https://opensource.norconex.com/collectors/http/v3/apidocs/com/norconex/collector/http/crawler/HttpCrawlerConfig.html

and was able to get it to pass config test.

Now I am getting chrome driver issues. Seems to work when called directly?

/home/spollock/norconex-collector-http-3.0.0-M1/drivers/chromedriver
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 9515
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
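
Starting chromedriver by hand only confirms that the driver executable runs. The driver still needs a Chrome or Chromium browser to open a session, and the trace further down reports "cannot find Chrome binary". As a quick check, here is a small sketch that looks for a browser executable on the PATH. The candidate names are an assumption (common names on Linux), and find_chrome_binary is a made-up helper, not something from Selenium or Norconex.

```python
import shutil

# Assumed executable names for Chrome/Chromium on Linux; adjust for your system.
CANDIDATE_NAMES = [
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
]


def find_chrome_binary(names=CANDIDATE_NAMES, which=shutil.which):
    """Return the path of the first candidate found on PATH, or None."""
    for name in names:
        path = which(name)
        if path:
            return path
    return None


print(find_chrome_binary() or "no Chrome/Chromium binary found on PATH")
```

If this finds nothing on the crawler host, installing a browser build that matches the driver version is the likely next step.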

Config is otherwise the same:

      <httpFetchers>
        <fetcher class="com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher">
          <browser>chrome</browser>
          <driverPath>/home/spollock/norconex-collector-http-3.0.0-M1/drivers/chromedriver</driverPath>
        </fetcher>
      </httpFetchers>

But seeing these errors:

13:43:24.966 [FS-docs-Crawler/1] INFO  CRAWLER_RUN_THREAD_BEGIN - Thread[FS-docs-Crawler/1,5,main]
13:43:24.967 [FS-docs-Crawler/1] INFO  Browser - Creating local "ChromeDriver" web driver.
13:43:24.975 [FS-docs-Crawler/2] INFO  CRAWLER_RUN_THREAD_BEGIN - Thread[FS-docs-Crawler/2,5,main]
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 13608
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
13:43:25.539 [FS-docs-Crawler/2] INFO  Browser - Creating local "ChromeDriver" web driver.
13:43:25.541 [FS-docs-Crawler/1] ERROR Crawler - Problem in thread execution.
com.norconex.collector.core.CollectorException: Could not build web driver
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:237) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverSupplier.get(Browser.java:181) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHolder.getDriver(WebDriverHolder.java:74) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetcherThreadBegin(WebDriverHttpFetcher.java:242) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:127) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:76) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.commons.lang.event.EventManager.doFire(EventManager.java:144) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:125) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:119) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.core.crawler.Crawler$ProcessReferencesRunnable.run(Crawler.java:992) [norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    ... 12 more
Caused by: org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'es-airflow', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1124-aws', java.version: '1.8.0_282'
Driver info: driver.version: ChromeDriver
remote stacktrace: #0 0x563b2ec582b9 <unknown>

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
    at org.openqa.selenium.remote.W3CHandshakeResponse.lambda$errorHandler$0(W3CHandshakeResponse.java:62) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.HandshakeResponse.lambda$getResponseFunction$0(HandshakeResponse.java:30) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.ProtocolHandshake.lambda$createSession$0(ProtocolHandshake.java:126) ~[selenium-remote-driver-3.141.59.jar:?]
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_282]
    at java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958) ~[?:1.8.0_282]
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_282]
    at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_282]
    at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:531) ~[?:1.8.0_282]
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:128) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:74) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:136) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:213) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:181) ~[selenium-chrome-driver-3.141.59.jar:?]
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:168) ~[selenium-chrome-driver-3.141.59.jar:?]
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:157) ~[selenium-chrome-driver-3.141.59.jar:?]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    ... 12 more
13:43:25.551 [FS-docs-Crawler/1] INFO  CRAWLER_RUN_THREAD_END - Thread[FS-docs-Crawler/1,5,main]
13:43:25.551 [FS-docs-Crawler/1] INFO  WebDriverHttpFetcher - Shutting down CHROME web driver.
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 21025
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
13:43:25.599 [FS-docs-Crawler/2] ERROR Crawler - Problem in thread execution.
com.norconex.collector.core.CollectorException: Could not build web driver
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:237) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverSupplier.get(Browser.java:181) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHolder.getDriver(WebDriverHolder.java:74) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetcherThreadBegin(WebDriverHttpFetcher.java:242) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:127) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:76) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.commons.lang.event.EventManager.doFire(EventManager.java:144) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:125) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:119) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.core.crawler.Crawler$ProcessReferencesRunnable.run(Crawler.java:992) [norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    ... 12 more
Caused by: org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'es-airflow', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1124-aws', java.version: '1.8.0_282'
Driver info: driver.version: ChromeDriver
remote stacktrace: #0 0x564bc37f62b9 <unknown>

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
    at org.openqa.selenium.remote.W3CHandshakeResponse.lambda$errorHandler$0(W3CHandshakeResponse.java:62) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.HandshakeResponse.lambda$getResponseFunction$0(HandshakeResponse.java:30) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.ProtocolHandshake.lambda$createSession$0(ProtocolHandshake.java:126) ~[selenium-remote-driver-3.141.59.jar:?]
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_282]
    at java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958) ~[?:1.8.0_282]
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_282]
    at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_282]
    at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:531) ~[?:1.8.0_282]
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:128) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:74) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:136) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:213) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:181) ~[selenium-chrome-driver-3.141.59.jar:?]
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:168) ~[selenium-chrome-driver-3.141.59.jar:?]
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:157) ~[selenium-chrome-driver-3.141.59.jar:?]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    ... 12 more
13:43:25.602 [FS-docs-Crawler] INFO  Crawler - Reprocessing any cached/orphan references...
13:43:25.609 [FS-docs-Crawler/2] INFO  CRAWLER_RUN_THREAD_END - Thread[FS-docs-Crawler/2,5,main]
13:43:25.609 [FS-docs-Crawler/2] INFO  WebDriverHttpFetcher - Shutting down CHROME web driver.
13:43:25.620 [FS-docs-Crawler] INFO  Browser - Creating local "ChromeDriver" web driver.
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 31545
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
13:43:25.653 [FS-docs-Crawler] ERROR HttpFetchClient - Fetcher WebDriverHttpFetcher failed to execute request.
com.norconex.collector.core.CollectorException: Could not build web driver
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:237) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverSupplier.get(Browser.java:181) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHolder.getDriver(WebDriverHolder.java:74) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetchDocumentContent(WebDriverHttpFetcher.java:312) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetch(WebDriverHttpFetcher.java:286) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.HttpFetchClient.fetch(HttpFetchClient.java:102) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.sitemap.impl.GenericSitemapResolver.resolveLocation(GenericSitemapResolver.java:292) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.sitemap.impl.GenericSitemapResolver.resolveSitemaps(GenericSitemapResolver.java:227) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.pipeline.queue.HttpQueuePipeline$SitemapStage.executeStage(HttpQueuePipeline.java:104) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.pipeline.queue.AbstractQueueStage.execute(AbstractQueueStage.java:31) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.pipeline.queue.AbstractQueueStage.execute(AbstractQueueStage.java:24) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.http.crawler.HttpCrawler.executeQueuePipeline(HttpCrawler.java:286) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.core.crawler.Crawler.lambda$reprocessCacheOrphans$0(Crawler.java:476) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.core.store.impl.mvstore.MVStoreDataStore.forEach(MVStoreDataStore.java:118) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.core.doc.CrawlDocInfoService.forEachCached(CrawlDocInfoService.java:240) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.core.crawler.Crawler.reprocessCacheOrphans(Crawler.java:475) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.core.crawler.Crawler.handleOrphans(Crawler.java:448) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.core.crawler.Crawler.doExecute(Crawler.java:413) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.core.crawler.Crawler.startExecution(Crawler.java:277) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.jef5.job.AbstractResumableJob.execute(AbstractResumableJob.java:49) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
    at com.norconex.jef5.suite.JobSuite.runJob(JobSuite.java:519) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
    at com.norconex.jef5.job.group.AsyncJobGroup.runJob(AsyncJobGroup.java:135) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
    at com.norconex.jef5.job.group.AsyncJobGroup.lambda$executeGroup$0(AsyncJobGroup.java:104) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    ... 26 more
Caused by: org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'es-airflow', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1124-aws', java.version: '1.8.0_282'
Driver info: driver.version: ChromeDriver
remote stacktrace: #0 0x55b1ecb7b2b9 <unknown>

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
    at org.openqa.selenium.remote.W3CHandshakeResponse.lambda$errorHandler$0(W3CHandshakeResponse.java:62) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.HandshakeResponse.lambda$getResponseFunction$0(HandshakeResponse.java:30) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.ProtocolHandshake.lambda$createSession$0(ProtocolHandshake.java:126) ~[selenium-remote-driver-3.141.59.jar:?]
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_282]
    at java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958) ~[?:1.8.0_282]
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_282]
    at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_282]
    at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:531) ~[?:1.8.0_282]
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:128) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:74) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:136) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:213) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:181) ~[selenium-chrome-driver-3.141.59.jar:?]
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:168) ~[selenium-chrome-driver-3.141.59.jar:?]
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:157) ~[selenium-chrome-driver-3.141.59.jar:?]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    ... 26 more
13:43:25.665 [FS-docs-Crawler] ERROR GenericSitemapResolver - Could not obtain sitemap: https://docs.forescout.com/sitemap.xml. Expected status code 200, but got 0.
13:43:25.665 [FS-docs-Crawler] INFO  Browser - Creating local "ChromeDriver" web driver.
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 11263
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
13:43:25.695 [FS-docs-Crawler] ERROR HttpFetchClient - Fetcher WebDriverHttpFetcher failed to execute request.
com.norconex.collector.core.CollectorException: Could not build web driver
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:237) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverSupplier.get(Browser.java:181) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHolder.getDriver(WebDriverHolder.java:74) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetchDocumentContent(WebDriverHttpFetcher.java:312) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetch(WebDriverHttpFetcher.java:286) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.HttpFetchClient.fetch(HttpFetchClient.java:102) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.sitemap.impl.GenericSitemapResolver.resolveLocation(GenericSitemapResolver.java:292) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.sitemap.impl.GenericSitemapResolver.resolveSitemaps(GenericSitemapResolver.java:227) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.pipeline.queue.HttpQueuePipeline$SitemapStage.executeStage(HttpQueuePipeline.java:104) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.pipeline.queue.AbstractQueueStage.execute(AbstractQueueStage.java:31) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.pipeline.queue.AbstractQueueStage.execute(AbstractQueueStage.java:24) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.http.crawler.HttpCrawler.executeQueuePipeline(HttpCrawler.java:286) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.core.crawler.Crawler.lambda$reprocessCacheOrphans$0(Crawler.java:476) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.core.store.impl.mvstore.MVStoreDataStore.forEach(MVStoreDataStore.java:118) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.core.doc.CrawlDocInfoService.forEachCached(CrawlDocInfoService.java:240) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.core.crawler.Crawler.reprocessCacheOrphans(Crawler.java:475) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.core.crawler.Crawler.handleOrphans(Crawler.java:448) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.core.crawler.Crawler.doExecute(Crawler.java:413) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.core.crawler.Crawler.startExecution(Crawler.java:277) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.jef5.job.AbstractResumableJob.execute(AbstractResumableJob.java:49) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
    at com.norconex.jef5.suite.JobSuite.runJob(JobSuite.java:519) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
    at com.norconex.jef5.job.group.AsyncJobGroup.runJob(AsyncJobGroup.java:135) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
    at com.norconex.jef5.job.group.AsyncJobGroup.lambda$executeGroup$0(AsyncJobGroup.java:104) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    ... 26 more
Caused by: org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'es-airflow', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1124-aws', java.version: '1.8.0_282'
Driver info: driver.version: ChromeDriver
remote stacktrace: #0 0x5589d7f0c2b9 <unknown>

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
    at org.openqa.selenium.remote.W3CHandshakeResponse.lambda$errorHandler$0(W3CHandshakeResponse.java:62) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.HandshakeResponse.lambda$getResponseFunction$0(HandshakeResponse.java:30) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.ProtocolHandshake.lambda$createSession$0(ProtocolHandshake.java:126) ~[selenium-remote-driver-3.141.59.jar:?]
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_282]
    at java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958) ~[?:1.8.0_282]
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_282]
    at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152) ~[?:1.8.0_282]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_282]
    at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:531) ~[?:1.8.0_282]
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:128) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:74) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:136) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:213) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131) ~[selenium-remote-driver-3.141.59.jar:?]
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:181) ~[selenium-chrome-driver-3.141.59.jar:?]
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:168) ~[selenium-chrome-driver-3.141.59.jar:?]
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:157) ~[selenium-chrome-driver-3.141.59.jar:?]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    ... 26 more
13:43:25.705 [FS-docs-Crawler] ERROR GenericSitemapResolver - Could not obtain sitemap: https://docs.forescout.com/sitemap_index.xml. Expected status code 200, but got 0.
13:43:25.708 [FS-docs-Crawler/1] INFO  CRAWLER_RUN_THREAD_BEGIN - Thread[FS-docs-Crawler/1,5,main]
13:43:25.708 [FS-docs-Crawler/1] INFO  Browser - Creating local "ChromeDriver" web driver.
13:43:25.715 [FS-docs-Crawler/2] INFO  CRAWLER_RUN_THREAD_BEGIN - Thread[FS-docs-Crawler/2,5,main]
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 4832
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
13:43:25.763 [FS-docs-Crawler/2] INFO  Browser - Creating local "ChromeDriver" web driver.
13:43:25.764 [FS-docs-Crawler/1] ERROR Crawler - Problem in thread execution.
com.norconex.collector.core.CollectorException: Could not build web driver
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:237) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverSupplier.get(Browser.java:181) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHolder.getDriver(WebDriverHolder.java:74) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetcherThreadBegin(WebDriverHttpFetcher.java:242) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:127) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:76) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.commons.lang.event.EventManager.doFire(EventManager.java:144) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:125) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:119) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.core.crawler.Crawler$ProcessReferencesRunnable.run(Crawler.java:992) [norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
    at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
    at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
    ... 12 more
Caused by: org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'es-airflow', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1124-aws', java.version: '1.8.0_282'
Driver info: driver.version: ChromeDriver
remote stacktrace: #0 0x56465cf3d2b9 <unknown>
essiembre commented 3 years ago

Since you are having issues with the WebDrivers of HTTP Collector, I have copied your last post to this new ticket: https://github.com/Norconex/collector-http/issues/746

Since the original issue ("typeName" missing for Elasticsearch Committer) has been addressed, I am closing this one.

jacksonp2008 commented 3 years ago

Hi Pascal, were you able to make a snapshot for backward compatibility? https://github.com/Norconex/collector-http/issues/746

Like from above?

I created a new snapshot release of Elasticsearch Committer V5 (working with HTTP Collector V3 stack) that reintroduces typeName for backward compatibility.

This site is a problem for us. You said you may have gotten it to work at some point?

essiembre commented 3 years ago

Yes, there is a snapshot of Elasticsearch Committer that supports adding "typeName" as per https://github.com/Norconex/committer-elasticsearch/issues/41#issuecomment-803771675.

Did you find something wrong with it, or is your problem something else? In either case, please open a new ticket with more details (since this one has been closed).