Closed jacksonp2008 closed 3 years ago
Hello @jacksonp2008,
You are correct that "typeName" is no longer supported, in line with Elasticsearch's evolving API.
Which version of Elasticsearch are you using? Support for type has been deprecated in Elasticsearch 6.x and effectively removed starting from Elasticsearch 7.x.
You may consider upgrading your version of Elasticsearch.
You can find the Committer version 5 documentation in the JavaDoc.
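For readers targeting Elasticsearch 7.x or later, where mapping types no longer exist, the committer block would simply leave out typeName. The following is a minimal sketch, not an official sample: the element names are taken from the configuration posted later in this thread, and the node URL and index name are placeholders.

```xml
<committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
  <!-- Placeholder node URL; replace with your own cluster endpoint. -->
  <nodes>https://localhost:9200</nodes>
  <indexName>docs</indexName>
  <!-- No typeName element here: Elasticsearch 7.x and later have no mapping types. -->
  <targetContentField>fs_content</targetContentField>
  <fixBadIds>true</fixBadIds>
</committer>
```

On a 6.x cluster, typeName is only accepted by a committer build that still supports it.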
Version 6.3 via AWS (6.8 is the latest they offer). Can I use an older committer with 3.0.0?
Which zone are you in? Last time I checked, AWS Elasticsearch service was offering up to 7.9, as described here.
Unfortunately, older committers are not compatible with 3.0.0.
We can make it a feature request to support version 6.x of Elasticsearch but you can likely upgrade Elasticsearch faster.
Unfortunately I can't easily upgrade beyond 6 as there are a lot of tools using ES right now and we would have to do a lot of testing. I may have to find another way using some of your previous recommendations from https://github.com/Norconex/collector-http/issues/739
I'll try the PhantomJS approach next. Thank you, Pascal.
I created a new snapshot release of Elasticsearch Committer V5 (which works with the HTTP Collector V3 stack) that reintroduces typeName for backward compatibility. If you want to go back to trying popular browsers for crawling, please try this snapshot release and confirm whether it works for you.
Amazing! Thank-you Pascal, will give it a try this week.
Regards,
-Steve
Alright I am trying this and it runs, but there are some issues I see:
Here is the current config for completeness.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<!-- This is for Version 3 to deal with the zoomin site -->
<httpcollector id="FS-docs-Collector">
<!-- Decide where to store generated files. -->
<workDir>./forescout/docs/docs-output</workDir>
<crawlers>
<!-- you can have multiple crawlers -->
<crawler id="FS-docs-Crawler">
<startURLs stayOnDomain="true" stayOnPort="true" stayOnProtocol="true">
<url>https://docs.forescout.com</url>
</startURLs>
<robotsTxt ignore="true"/>
<!-- Put a maximum depth to avoid infinite crawling (e.g. calendars). -->
<maxDepth>24</maxDepth>
<sitemapResolver ignore="false"/>
<!-- Be as nice as you can to sites you crawl. -->
<delay default="500"/>
<!-- Document Filtering -->
<documentFilters>
<filter class="com.norconex.collector.core.filter.impl.ExtensionReferenceFilter" onMatch="exclude">
jpg,jpeg,gif,png
</filter>
</documentFilters>
<!-- Document importing -->
<importer>
<preParseHandlers>
<!-- Pre parsing taggers can go here -->
<!-- sample DebugTagger below <tagger class="com.norconex.importer.handler.tagger.impl.DebugTagger" logFields="_id,id,content,title,keywords,description,document.reference" logLevel="INFO" /> -->
<handler class="com.norconex.importer.handler.tagger.impl.DebugTagger" logLevel="INFO"/>
</preParseHandlers>
<postParseHandlers>
<!-- Rename fields with a prefix for the search engine, the document can be renamed in the committer -->
<handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
<restrictTo caseSensitive="false"
field="title">
</restrictTo>
<rename fromField="title" toField="fs_title" overwrite="true" />
</handler>
<handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
<restrictTo caseSensitive="false"
field="document.reference">
</restrictTo>
<rename fromField="document.reference" toField="fs_reference" overwrite="true" />
</handler>
<handler class="com.norconex.importer.handler.tagger.impl.CurrentDateTagger"
field="@timestamp" format="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" />
<handler class="com.norconex.importer.handler.tagger.impl.ConstantTagger">
<constant name="search_title">Docs Portal</constant>
</handler>
<!-- If your target repository does not support arbitrary fields, make sure you only keep the fields you need
<handler class="KeepOnlyTagger">
<fieldMatcher method="csv">title,keywords,description,document.reference</fieldMatcher>
</handler>
-->
</postParseHandlers>
</importer>
<!-- Decide what to do with your files by specifying a Committer. -->
<committers>
<committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
<!-- elastic dev site -->
<nodes>https://search-sesasdfsafsfdsadfsafasfsafasfsafsdf1.es.amazonaws.com:443</nodes>
<indexName>docs</indexName>
<typeName>docs</typeName>
<targetContentField>fs_content</targetContentField>
<fixBadIds>true</fixBadIds>
</committer>
</committers>
</crawler>
</crawlers>
</httpcollector>
Here is a record as shown in Kibana
{ "_index": "docs", "_type": "docs", "_id": "https://docs.forescout.com/bundle/CIUP-3-0-6-rn/page/CIUP-3-0-6-rn.Install-the-Forescout-Infrastructure-Update-Pack.html", "_version": 1, "_score": null, "_source": { "fs_content": "\n \n\n \n", "document.contentFamily": "html", "Server": "cloudflare", "collector.sitemap-changefreq": "daily", "Content-Location": "https://docs.forescout.com/bundle/CIUP-3-0-6-rn/page/CIUP-3-0-6-rn.Install-the-Forescout-Infrastructure-Update-Pack.html", "document.reference": "https://docs.forescout.com/bundle/CIUP-3-0-6-rn/page/CIUP-3-0-6-rn.Install-the-Forescout-Infrastructure-Update-Pack.html", "X-Frame-Options": "DENY", "Referrer-Policy": "no-referrer-when-downgrade", "Strict-Transport-Security": "max-age=31536000; includeSubDomains", "Content-Security-Policy": "frame-ancestors 'self'", "collector.is-crawl-new": "true", "Content-Encoding": "UTF-8", "collector.http-fetcher": "com.norconex.collector.http.fetch.impl.GenericHttpFetcher", "collector.depth": "0", "X-XSS-Protection": "1; mode=block", "Content-Length": "6434", "Content-Type": "text/html; charset=UTF-8", "cf-request-id": "08fdad0fc90000cf686b3a0000000001", "Transfer-Encoding": "chunked", "X-Parsed-By": [ "org.apache.tika.parser.DefaultParser", "org.apache.tika.parser.html.HtmlParser" ], "collector.sitemap-priority": "0.5", "CF-RAY": "6342e45fa8ebcf68-IAD", "X-Content-Type-Options": "nosniff", "Connection": "keep-alive", "collector.sitemap-lastmod": "2020-12-12T00:00Z", "document.contentEncoding": "UTF-8", "X-Content-Security-Policy": "frame-ancestors 'self'", "Date": "Mon, 22 Mar 2021 22:35:15 GMT", "X-WebKit-CSP": "frame-ancestors 'self'", "CF-Cache-Status": "DYNAMIC", "viewport": "width=device-width, initial-scale=1, shrink-to-fit=no", "document.contentType": "text/html", "Content-Language": "en", "Expect-CT": "max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\"" }, "fields": { "collector.sitemap-lastmod": [ "2020-12-12T00:00:00.000Z" ] }, 
"sort": [ 1607731200000 ] }
Hello @jacksonp2008, I am surprised it worked at all for you since you have XML configuration syntax errors. V3 is not a straight replacement for V2. There were some changes in the config. You should find exceptions telling you so when you try to launch, like:
...
Caused by: com.norconex.commons.lang.xml.XMLException: "field" attribute has been deprecated in favor of: toField. Update your XML configuration accordingly.
...
Caused by: com.norconex.commons.lang.xml.XMLException: "field" attribute has been deprecated in favor of: fieldMatcher. Update your XML configuration accordingly.
...
Once adapted to V3, the affected handlers should look like this:
<handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
<rename toField="fs_title" onSet="replace">
<fieldMatcher>title</fieldMatcher>
</rename>
</handler>
<handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
<rename toField="fs_reference" onSet="replace">
<fieldMatcher>document.reference</fieldMatcher>
</rename>
</handler>
<handler class="com.norconex.importer.handler.tagger.impl.CurrentDateTagger"
toField="@timestamp" format="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" />
<handler class="com.norconex.importer.handler.tagger.impl.ConstantTagger">
<constant name="search_title">Docs Portal</constant>
</handler>
I was able to run your config after making these changes. I can confirm getting all the values you mention as expected.
The title field has no values because none of the documents I quickly checked contain a title. It is likely generated via JavaScript.
It seems you are ready for the next step, where you would try to crawl with a browser using WebDriverHttpFetcher.
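As a rough sketch of that next step, the fetcher is declared inside the crawler, wrapped in an httpFetchers element. The browser and driverPath elements match the fragments posted further down in this thread. The browserPath element and both file paths are assumptions for illustration and should be checked against the WebDriverHttpFetcher JavaDoc. Note that the driver executable alone is not enough: a Chrome or Chromium browser must also be installed on the machine running the crawler.

```xml
<crawler id="FS-docs-Crawler">
  <!-- Other crawler settings (startURLs, importer, committers) omitted. -->
  <httpFetchers>
    <fetcher class="com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher">
      <browser>chrome</browser>
      <!-- Assumption: browserPath points at the browser executable itself.
           The path is hypothetical; leave the element out to rely on detection. -->
      <browserPath>/usr/bin/google-chrome</browserPath>
      <!-- Hypothetical path to the downloaded chromedriver executable. -->
      <driverPath>/path/to/chromedriver</driverPath>
    </fetcher>
  </httpFetchers>
</crawler>
```

The chromedriver and Chrome versions generally need to match, so it is worth comparing the two before launching a crawl.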
Alright, thanks again Pascal, will give this a try.
Something doesn't compute. I downloaded chromedriver and added it to the config as shown below. I tried placing the <fetcher> element under <httpcollector>, then under <crawler>, then under <importer>, and it remains unhappy.
I made the handler changes you mentioned as well:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<!-- This is for Version 3 to deal with the zoomin site -->
<httpcollector id="FS-docs-Collector">
<fetcher class="com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher">
<browser>chrome</browser>
<driverPath>/home/spollock/norconex-collector-http-3.0.0-M1/drivers/chromedriver</driverPath>
<restrictions>
<restrictTo field="document.reference">
.*dynamic.*$
</restrictTo>
</restrictions>
</fetcher>
<!-- Decide where to store generated files. -->
<workDir>./forescout/docs/docs-output</workDir>
<crawlers>
<!-- you can have multiple crawlers -->
<crawler id="FS-docs-Crawler">
<startURLs stayOnDomain="true" stayOnPort="true" stayOnProtocol="true">
<url>https://docs.forescout.com</url>
</startURLs>
<robotsTxt ignore="true"/>
<!-- Put a maximum depth to avoid infinite crawling (e.g. calendars). -->
<maxDepth>24</maxDepth>
<sitemapResolver ignore="false"/>
<!-- Be as nice as you can to sites you crawl. -->
<delay default="500"/>
<!-- Document Filtering -->
<documentFilters>
<filter class="com.norconex.collector.core.filter.impl.ExtensionReferenceFilter" onMatch="exclude">
jpg,jpeg,gif,png
</filter>
</documentFilters>
<!-- Document importing -->
<importer>
<preParseHandlers>
<!-- Pre parsing taggers can go here -->
<!-- sample DebugTagger below <tagger class="com.norconex.importer.handler.tagger.impl.DebugTagger" logFields="_id,id,content,title,keywords,description,document.reference" logLevel="INFO" /> -->
<handler class="com.norconex.importer.handler.tagger.impl.DebugTagger" logLevel="INFO"/>
</preParseHandlers>
<postParseHandlers>
<!-- Rename fields with a prefix for the search engine, the document can be renamed in the committer -->
<handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
<rename toField="fs_title" onSet="replace">
<fieldMatcher>title</fieldMatcher>
</rename>
</handler>
<handler class="com.norconex.importer.handler.tagger.impl.RenameTagger">
<rename toField="fs_reference" onSet="replace">
<fieldMatcher>document.reference</fieldMatcher>
</rename>
</handler>
<handler class="com.norconex.importer.handler.tagger.impl.CurrentDateTagger" toField="@timestamp" format="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"/>
<handler class="com.norconex.importer.handler.tagger.impl.ConstantTagger">
<constant name="search_title">Docs Portal</constant>
</handler>
<!-- If your target repository does not support arbitrary fields, make sure you only keep the fields you need <handler class="KeepOnlyTagger"> <fieldMatcher method="csv">title,keywords,description,document.reference</fieldMatcher> </handler> -->
</postParseHandlers>
</importer>
<!-- Decide what to do with your files by specifying a Committer. -->
<committers>
<committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
<!-- elastic dev site -->
<nodes>https://search-sasdfasdfsafdsdfsfa.us-east-1.es.amazonaws.com:443</nodes>
<indexName>docs</indexName>
<typeName>docs</typeName>
<targetContentField>fs_content</targetContentField>
<fixBadIds>true</fixBadIds>
</committer>
</committers>
</crawler>
</crawlers>
</httpcollector>
ok, I found this: https://opensource.norconex.com/collectors/http/v3/apidocs/com/norconex/collector/http/crawler/HttpCrawlerConfig.html
and was able to get it to pass config test.
Now I am getting chrome driver issues. Seems to work when called directly?
/home/spollock/norconex-collector-http-3.0.0-M1/drivers/chromedriver
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 9515
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
The fetcher config is the same, now wrapped in <httpFetchers>:
<httpFetchers>
<fetcher class="com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher">
<browser>chrome</browser>
<driverPath>/home/spollock/norconex-collector-http-3.0.0-M1/drivers/chromedriver</driverPath>
</fetcher>
</httpFetchers>
But seeing these errors:
13:43:24.966 [FS-docs-Crawler/1] INFO CRAWLER_RUN_THREAD_BEGIN - Thread[FS-docs-Crawler/1,5,main]
13:43:24.967 [FS-docs-Crawler/1] INFO Browser - Creating local "ChromeDriver" web driver.
13:43:24.975 [FS-docs-Crawler/2] INFO CRAWLER_RUN_THREAD_BEGIN - Thread[FS-docs-Crawler/2,5,main]
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 13608
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
13:43:25.539 [FS-docs-Crawler/2] INFO Browser - Creating local "ChromeDriver" web driver.
13:43:25.541 [FS-docs-Crawler/1] ERROR Crawler - Problem in thread execution.
com.norconex.collector.core.CollectorException: Could not build web driver
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:237) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverSupplier.get(Browser.java:181) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHolder.getDriver(WebDriverHolder.java:74) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetcherThreadBegin(WebDriverHttpFetcher.java:242) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:127) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:76) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.commons.lang.event.EventManager.doFire(EventManager.java:144) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:125) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:119) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.core.crawler.Crawler$ProcessReferencesRunnable.run(Crawler.java:992) [norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
... 12 more
Caused by: org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'es-airflow', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1124-aws', java.version: '1.8.0_282'
Driver info: driver.version: ChromeDriver
remote stacktrace: #0 0x563b2ec582b9 <unknown>
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
at org.openqa.selenium.remote.W3CHandshakeResponse.lambda$errorHandler$0(W3CHandshakeResponse.java:62) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.HandshakeResponse.lambda$getResponseFunction$0(HandshakeResponse.java:30) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.ProtocolHandshake.lambda$createSession$0(ProtocolHandshake.java:126) ~[selenium-remote-driver-3.141.59.jar:?]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_282]
at java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958) ~[?:1.8.0_282]
at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_282]
at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_282]
at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:531) ~[?:1.8.0_282]
at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:128) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:74) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:136) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:213) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:181) ~[selenium-chrome-driver-3.141.59.jar:?]
at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:168) ~[selenium-chrome-driver-3.141.59.jar:?]
at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:157) ~[selenium-chrome-driver-3.141.59.jar:?]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
... 12 more
13:43:25.551 [FS-docs-Crawler/1] INFO CRAWLER_RUN_THREAD_END - Thread[FS-docs-Crawler/1,5,main]
13:43:25.551 [FS-docs-Crawler/1] INFO WebDriverHttpFetcher - Shutting down CHROME web driver.
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 21025
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
13:43:25.599 [FS-docs-Crawler/2] ERROR Crawler - Problem in thread execution.
com.norconex.collector.core.CollectorException: Could not build web driver
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:237) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverSupplier.get(Browser.java:181) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHolder.getDriver(WebDriverHolder.java:74) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetcherThreadBegin(WebDriverHttpFetcher.java:242) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:127) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:76) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.commons.lang.event.EventManager.doFire(EventManager.java:144) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:125) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:119) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.core.crawler.Crawler$ProcessReferencesRunnable.run(Crawler.java:992) [norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
... 12 more
Caused by: org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'es-airflow', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1124-aws', java.version: '1.8.0_282'
Driver info: driver.version: ChromeDriver
remote stacktrace: #0 0x564bc37f62b9 <unknown>
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
at org.openqa.selenium.remote.W3CHandshakeResponse.lambda$errorHandler$0(W3CHandshakeResponse.java:62) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.HandshakeResponse.lambda$getResponseFunction$0(HandshakeResponse.java:30) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.ProtocolHandshake.lambda$createSession$0(ProtocolHandshake.java:126) ~[selenium-remote-driver-3.141.59.jar:?]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_282]
at java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958) ~[?:1.8.0_282]
at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_282]
at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_282]
at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:531) ~[?:1.8.0_282]
at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:128) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:74) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:136) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:213) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:181) ~[selenium-chrome-driver-3.141.59.jar:?]
at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:168) ~[selenium-chrome-driver-3.141.59.jar:?]
at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:157) ~[selenium-chrome-driver-3.141.59.jar:?]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
... 12 more
13:43:25.602 [FS-docs-Crawler] INFO Crawler - Reprocessing any cached/orphan references...
13:43:25.609 [FS-docs-Crawler/2] INFO CRAWLER_RUN_THREAD_END - Thread[FS-docs-Crawler/2,5,main]
13:43:25.609 [FS-docs-Crawler/2] INFO WebDriverHttpFetcher - Shutting down CHROME web driver.
13:43:25.620 [FS-docs-Crawler] INFO Browser - Creating local "ChromeDriver" web driver.
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 31545
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
13:43:25.653 [FS-docs-Crawler] ERROR HttpFetchClient - Fetcher WebDriverHttpFetcher failed to execute request.
com.norconex.collector.core.CollectorException: Could not build web driver
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:237) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverSupplier.get(Browser.java:181) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHolder.getDriver(WebDriverHolder.java:74) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetchDocumentContent(WebDriverHttpFetcher.java:312) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetch(WebDriverHttpFetcher.java:286) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.HttpFetchClient.fetch(HttpFetchClient.java:102) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.sitemap.impl.GenericSitemapResolver.resolveLocation(GenericSitemapResolver.java:292) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.sitemap.impl.GenericSitemapResolver.resolveSitemaps(GenericSitemapResolver.java:227) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.pipeline.queue.HttpQueuePipeline$SitemapStage.executeStage(HttpQueuePipeline.java:104) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.pipeline.queue.AbstractQueueStage.execute(AbstractQueueStage.java:31) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.pipeline.queue.AbstractQueueStage.execute(AbstractQueueStage.java:24) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.http.crawler.HttpCrawler.executeQueuePipeline(HttpCrawler.java:286) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.core.crawler.Crawler.lambda$reprocessCacheOrphans$0(Crawler.java:476) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.core.store.impl.mvstore.MVStoreDataStore.forEach(MVStoreDataStore.java:118) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.core.doc.CrawlDocInfoService.forEachCached(CrawlDocInfoService.java:240) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.core.crawler.Crawler.reprocessCacheOrphans(Crawler.java:475) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.core.crawler.Crawler.handleOrphans(Crawler.java:448) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.core.crawler.Crawler.doExecute(Crawler.java:413) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.core.crawler.Crawler.startExecution(Crawler.java:277) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.jef5.job.AbstractResumableJob.execute(AbstractResumableJob.java:49) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
at com.norconex.jef5.suite.JobSuite.runJob(JobSuite.java:519) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
at com.norconex.jef5.job.group.AsyncJobGroup.runJob(AsyncJobGroup.java:135) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
at com.norconex.jef5.job.group.AsyncJobGroup.lambda$executeGroup$0(AsyncJobGroup.java:104) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
... 26 more
Caused by: org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'es-airflow', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1124-aws', java.version: '1.8.0_282'
Driver info: driver.version: ChromeDriver
remote stacktrace: #0 0x55b1ecb7b2b9 <unknown>
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
at org.openqa.selenium.remote.W3CHandshakeResponse.lambda$errorHandler$0(W3CHandshakeResponse.java:62) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.HandshakeResponse.lambda$getResponseFunction$0(HandshakeResponse.java:30) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.ProtocolHandshake.lambda$createSession$0(ProtocolHandshake.java:126) ~[selenium-remote-driver-3.141.59.jar:?]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_282]
at java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958) ~[?:1.8.0_282]
at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_282]
at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_282]
at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:531) ~[?:1.8.0_282]
at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:128) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:74) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:136) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:213) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:181) ~[selenium-chrome-driver-3.141.59.jar:?]
at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:168) ~[selenium-chrome-driver-3.141.59.jar:?]
at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:157) ~[selenium-chrome-driver-3.141.59.jar:?]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
... 26 more
13:43:25.665 [FS-docs-Crawler] ERROR GenericSitemapResolver - Could not obtain sitemap: https://docs.forescout.com/sitemap.xml. Expected status code 200, but got 0.
13:43:25.665 [FS-docs-Crawler] INFO Browser - Creating local "ChromeDriver" web driver.
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 11263
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
13:43:25.695 [FS-docs-Crawler] ERROR HttpFetchClient - Fetcher WebDriverHttpFetcher failed to execute request.
com.norconex.collector.core.CollectorException: Could not build web driver
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:237) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverSupplier.get(Browser.java:181) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHolder.getDriver(WebDriverHolder.java:74) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetchDocumentContent(WebDriverHttpFetcher.java:312) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetch(WebDriverHttpFetcher.java:286) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.HttpFetchClient.fetch(HttpFetchClient.java:102) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.sitemap.impl.GenericSitemapResolver.resolveLocation(GenericSitemapResolver.java:292) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.sitemap.impl.GenericSitemapResolver.resolveSitemaps(GenericSitemapResolver.java:227) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.pipeline.queue.HttpQueuePipeline$SitemapStage.executeStage(HttpQueuePipeline.java:104) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.pipeline.queue.AbstractQueueStage.execute(AbstractQueueStage.java:31) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.pipeline.queue.AbstractQueueStage.execute(AbstractQueueStage.java:24) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.http.crawler.HttpCrawler.executeQueuePipeline(HttpCrawler.java:286) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.core.crawler.Crawler.lambda$reprocessCacheOrphans$0(Crawler.java:476) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.core.store.impl.mvstore.MVStoreDataStore.forEach(MVStoreDataStore.java:118) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.core.doc.CrawlDocInfoService.forEachCached(CrawlDocInfoService.java:240) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.core.crawler.Crawler.reprocessCacheOrphans(Crawler.java:475) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.core.crawler.Crawler.handleOrphans(Crawler.java:448) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.core.crawler.Crawler.doExecute(Crawler.java:413) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.core.crawler.Crawler.startExecution(Crawler.java:277) ~[norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.jef5.job.AbstractResumableJob.execute(AbstractResumableJob.java:49) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
at com.norconex.jef5.suite.JobSuite.runJob(JobSuite.java:519) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
at com.norconex.jef5.job.group.AsyncJobGroup.runJob(AsyncJobGroup.java:135) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
at com.norconex.jef5.job.group.AsyncJobGroup.lambda$executeGroup$0(AsyncJobGroup.java:104) ~[norconex-jef-5.0.0-M1.jar:5.0.0-M1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
... 26 more
Caused by: org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'es-airflow', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1124-aws', java.version: '1.8.0_282'
Driver info: driver.version: ChromeDriver
remote stacktrace: #0 0x5589d7f0c2b9 <unknown>
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
at org.openqa.selenium.remote.W3CHandshakeResponse.lambda$errorHandler$0(W3CHandshakeResponse.java:62) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.HandshakeResponse.lambda$getResponseFunction$0(HandshakeResponse.java:30) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.ProtocolHandshake.lambda$createSession$0(ProtocolHandshake.java:126) ~[selenium-remote-driver-3.141.59.jar:?]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_282]
at java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958) ~[?:1.8.0_282]
at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_282]
at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152) ~[?:1.8.0_282]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_282]
at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:531) ~[?:1.8.0_282]
at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:128) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:74) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:136) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:552) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:213) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131) ~[selenium-remote-driver-3.141.59.jar:?]
at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:181) ~[selenium-chrome-driver-3.141.59.jar:?]
at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:168) ~[selenium-chrome-driver-3.141.59.jar:?]
at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:157) ~[selenium-chrome-driver-3.141.59.jar:?]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
... 26 more
13:43:25.705 [FS-docs-Crawler] ERROR GenericSitemapResolver - Could not obtain sitemap: https://docs.forescout.com/sitemap_index.xml. Expected status code 200, but got 0.
13:43:25.708 [FS-docs-Crawler/1] INFO CRAWLER_RUN_THREAD_BEGIN - Thread[FS-docs-Crawler/1,5,main]
13:43:25.708 [FS-docs-Crawler/1] INFO Browser - Creating local "ChromeDriver" web driver.
13:43:25.715 [FS-docs-Crawler/2] INFO CRAWLER_RUN_THREAD_BEGIN - Thread[FS-docs-Crawler/2,5,main]
Starting ChromeDriver 89.0.4389.23 (61b08ee2c50024bab004e48d2b1b083cdbdac579-refs/branch-heads/4389@{#294}) on port 4832
Only local connections are allowed.
Please see https://chromedriver.chromium.org/security-considerations for suggestions on keeping ChromeDriver safe.
ChromeDriver was started successfully.
13:43:25.763 [FS-docs-Crawler/2] INFO Browser - Creating local "ChromeDriver" web driver.
13:43:25.764 [FS-docs-Crawler/1] ERROR Crawler - Problem in thread execution.
com.norconex.collector.core.CollectorException: Could not build web driver
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:237) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverSupplier.get(Browser.java:181) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHolder.getDriver(WebDriverHolder.java:74) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.WebDriverHttpFetcher.fetcherThreadBegin(WebDriverHttpFetcher.java:242) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:127) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.collector.http.fetch.AbstractHttpFetcher.accept(AbstractHttpFetcher.java:76) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.commons.lang.event.EventManager.doFire(EventManager.java:144) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:125) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.commons.lang.event.EventManager.fire(EventManager.java:119) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.core.crawler.Crawler$ProcessReferencesRunnable.run(Crawler.java:992) [norconex-collector-core-2.0.0-M1.jar:2.0.0-M1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_282]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_282]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_282]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_282]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:182) ~[commons-lang3-3.11.jar:3.11]
at org.apache.commons.lang3.reflect.ConstructorUtils.invokeExactConstructor(ConstructorUtils.java:149) ~[commons-lang3-3.11.jar:3.11]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.lambda$build$0(Browser.java:232) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
at com.norconex.commons.lang.SystemUtil.callWithProperty(SystemUtil.java:118) ~[norconex-commons-lang-2.0.0-M1.jar:2.0.0-M1]
at com.norconex.collector.http.fetch.impl.webdriver.Browser$WebDriverBuilder.build(Browser.java:222) ~[norconex-collector-http-3.0.0-M1.jar:3.0.0-M1]
... 12 more
Caused by: org.openqa.selenium.WebDriverException: unknown error: cannot find Chrome binary
Build info: version: '3.141.59', revision: 'e82be7d358', time: '2018-11-14T08:17:03'
System info: host: 'es-airflow', ip: '127.0.0.1', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1124-aws', java.version: '1.8.0_282'
Driver info: driver.version: ChromeDriver
remote stacktrace: #0 0x56465cf3d2b9 <unknown>
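The root cause in each trace above is "cannot find Chrome binary": ChromeDriver itself starts, but it cannot locate a Chrome or Chromium executable on the host. A minimal sketch of a pre-flight check follows, assuming the browser is expected to be on the PATH. The class name, the helper `findOnPath`, and the list of executable names are illustrative assumptions, not part of Norconex or Selenium.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ChromeBinaryCheck {

    // Executable names often used by Chrome/Chromium packages on Linux.
    // This list is an assumption; adjust it for your distribution.
    static final List<String> CANDIDATES = Arrays.asList(
            "google-chrome", "google-chrome-stable",
            "chromium", "chromium-browser");

    // Returns the first executable regular file among `names` found in
    // any directory of the PATH-style string `pathValue`.
    static Optional<Path> findOnPath(String pathValue, List<String> names) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty PATH entries
            }
            for (String name : names) {
                Path candidate = Paths.get(dir, name);
                if (Files.isRegularFile(candidate)
                        && Files.isExecutable(candidate)) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> found = findOnPath(System.getenv("PATH"), CANDIDATES);
        System.out.println(found
                .map(p -> "Chrome binary found: " + p)
                .orElse("No Chrome binary found on PATH"));
    }
}
```

If this reports that nothing was found on the crawler host, installing Chrome or Chromium there, or pointing the crawler at the browser's actual location, is the likely next step.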
Since you are having issues with the WebDrivers of HTTP Collector, I have copied your last post to this new ticket: https://github.com/Norconex/collector-http/issues/746
The original issue being addressed ("typeName" missing for Elasticsearch Committer), I am closing this one.
Hi Pascal, were you able to make a snapshot for backward compatibility? https://github.com/Norconex/collector-http/issues/746
Like from above?
I created a new snapshot release of Elasticsearch Committer V5 (working with HTTP Collector V3 stack) that introduce back the typeName for backward compatibility.
This site is a problem for us, you said you may have gotten it to work for you at some point?
Yes, there is a snapshot of Elasticsearch Committer that supports adding "typeName" as per https://github.com/Norconex/committer-elasticsearch/issues/41#issuecomment-803771675.
Did you find something wrong with it, or is your problem something else? In either case, please open a new ticket with more details (since this one has been closed).
Running the latest 3.0.0 M1 with Elasticsearch Committer 5.0.0 M1.
Per the docs, it seems like "typeName" should be there: https://opensource.norconex.com/committers/elasticsearch/v4/configuration
But it may have changed with version 5?
With this config,
If I remove "typeName", it errors when trying to commit to ES: