caoyang1211 opened this issue 2 years ago
Do you still have the issue? I do not think there is a way to ignore certificates for now. I think your best bet would be to give direct access from your crawler server to your OpenSearch instance (shared VPC, IP whitelisting, etc.).
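For readers comparing with curl: `--insecure` turns off both certificate-chain validation and hostname verification on the client side. As a point of reference only, here is what that same client-side relaxation looks like in Python's standard library (this does not apply to the Norconex committer, which does not appear to expose such a switch — never use this in production):

```python
import ssl

# Equivalent of curl --insecure on the client side:
# skip both hostname matching and certificate-chain validation.
ctx = ssl.create_default_context()
ctx.check_hostname = False        # skip hostname-vs-certificate matching
ctx.verify_mode = ssl.CERT_NONE   # skip certificate chain validation
```

Note the order matters: `check_hostname` must be disabled before `verify_mode` can be set to `CERT_NONE`, or the stdlib raises a `ValueError`.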
I created an OpenSearch domain on AWS inside a VPC. To read from or write data to the domain from my laptop, I have to run SecureCRT, create a session for a Bastion server that is running on AWS and has access to the VPC, and set up port forwarding to redirect traffic from https://127.0.0.1:60443 to the OpenSearch domain https://vpc-*****-us-east-1-1-1-yzdjblkpyhbyaytxcnbzrxavpm.us-east-1.es.amazonaws.com.
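A quick way to confirm the tunnel is up before involving the crawler is a plain TCP connect check (an illustrative Python sketch; the port 60443 matches the forwarding setup above):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# While the SecureCRT session is running, port_open("127.0.0.1", 60443)
# should return True; if it returns False, the tunnel is the problem,
# not certificates.
```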
I verified that port forwarding was working by running a curl command to index multiple JSON files into that OpenSearch domain with a --insecure option to ignore certificate checks. The command is like this:
curl -H "Content-Type: application/json" --insecure -XPOST "https://127.0.0.1:60443/_bulk" --data-binary "@TutorialVideoDbRecords.json"
To use the Norconex crawler to index web pages into the OpenSearch domain, I set https://localhost:60443 in the Norconex config file and ran the crawler. It reported "Failure occured on node: "null"" and "Host name 'localhost' does not match the certificate subject provided by the peer (CN=*.us-east-1.es.amazonaws.com)".
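The failure is purely a name check: the domain's wildcard certificate covers exactly one label under us-east-1.es.amazonaws.com, and "localhost" can never match it. A simplified sketch of the leftmost-label wildcard matching that hostname verifiers perform (real verifiers follow RFC 6125 and handle more edge cases):

```python
def wildcard_match(pattern: str, hostname: str) -> bool:
    """Simplified certificate-style hostname matching.

    A leading "*" label matches exactly one hostname label, so
    "*.us-east-1.es.amazonaws.com" matches "vpc-x.us-east-1.es.amazonaws.com"
    but not "localhost" (wrong label count) or "a.b.us-east-1.es.amazonaws.com".
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    for p, h in zip(p_labels, h_labels):
        if p != "*" and p != h:
            return False
    return True
```

This is why the curl test succeeds with --insecure but the committer fails: the tunnel presents the VPC domain's certificate, while the client is asked to verify it against the name "localhost".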
So it looks like the problem is caused by a certificate validation failure. Is there an option in the config file that can ignore certificate checks, like the --insecure option in the curl command? My configuration is as follows; no user credentials are required to access the OpenSearch domain inside the VPC:
The error messages I got after running the crawler are as follows:
10:42:22.020 [tutorial video#1] ERROR ElasticsearchCommitter - Failure occured on node: "null". Check node logs.
10:42:22.022 [tutorial video#1] ERROR COMMITTER_BATCH_ERROR - CommitterEvent[connectionTimeout=1000,credentials=Credentials[username=,password=****,passwordKey=],discoverNodes=false,dotReplacement=,fixBadIds=false,ignoreResponseErrors=false,indexName=tutorials_videos,jsonFieldsPattern=,socketTimeout=30000,sourceIdField=,targetContentField=content,typeName=,queue=FSQueue[batchSize=20,commitLeftoversOnInit=false,ignoreErrors=false,maxPerFolder=500,retrier=Retrier[exceptionFilter=,maxCauses=10,maxRetries=0,retryDelay=0],splitBatch=OFF],committerContext=CommitterContext[eventManager=com.norconex.commons.lang.event.EventManager@6a5e167a,streamFactory=com.norconex.commons.lang.io.CachedStreamFactory@60e06f7d,workDir=.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0],fieldMappings=com.norconex.committer.core3.CommitterException: Could not commit JSON batch to Elasticsearch.,restrictions=[],request=]
10:42:22.127 [tutorial video#2] INFO DOCUMENT_COMMITTED_UPSERT - https://kidshealth.org/en/teens/center/concussions-ctr.html - Committers: ElasticsearchCommitter
10:42:22.129 [tutorial video#1] ERROR COMMITTER_UPSERT_ERROR - CommitterEvent[connectionTimeout=1000,credentials=Credentials[username=,password=****,passwordKey=],discoverNodes=false,dotReplacement=,fixBadIds=false,ignoreResponseErrors=false,indexName=tutorials_videos,jsonFieldsPattern=,socketTimeout=30000,sourceIdField=,targetContentField=content,typeName=,queue=FSQueue[batchSize=20,commitLeftoversOnInit=false,ignoreErrors=false,maxPerFolder=500,retrier=Retrier[exceptionFilter=,maxCauses=10,maxRetries=0,retryDelay=0],splitBatch=OFF],committerContext=CommitterContext[eventManager=com.norconex.commons.lang.event.EventManager@6a5e167a,streamFactory=com.norconex.commons.lang.io.CachedStreamFactory@60e06f7d,workDir=.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0],fieldMappings=com.norconex.committer.core3.batch.queue.CommitterQueueException: Could not process one or more files form committer batch located at C:\Users\pantr\eclipse-workspace\medlineplus-crawler-http.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0\queue\batch-1656600137136000000. Moved them to error directory: C:\Users\pantr\eclipse-workspace\medlineplus-crawler-http.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0\error,restrictions=[],request=UpsertRequest[reference=https://kidshealth.org/en/parents/pilonidal_gips_animation.html]]
10:42:22.129 [tutorial video#1] ERROR CrawlerCommitterService - Could not execute "upsert" on committer: ElasticsearchCommitter[connectionTimeout=1000,credentials=Credentials[username=,password=****,passwordKey=],discoverNodes=false,dotReplacement=,fixBadIds=false,ignoreResponseErrors=false,indexName=tutorials_videos,jsonFieldsPattern=,socketTimeout=30000,sourceIdField=,targetContentField=content,typeName=,queue=FSQueue[batchSize=20,commitLeftoversOnInit=false,ignoreErrors=false,maxPerFolder=500,retrier=Retrier[exceptionFilter=,maxCauses=10,maxRetries=0,retryDelay=0],splitBatch=OFF],committerContext=CommitterContext[eventManager=com.norconex.commons.lang.event.EventManager@6a5e167a,streamFactory=com.norconex.commons.lang.io.CachedStreamFactory@60e06f7d,workDir=.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0],fieldMappings={},restrictions=[]]
com.norconex.committer.core3.batch.queue.CommitterQueueException: Could not process one or more files form committer batch located at C:\Users\pantr\eclipse-workspace\medlineplus-crawler-http.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0\queue\batch-1656600137136000000. Moved them to error directory: C:\Users\pantr\eclipse-workspace\medlineplus-crawler-http.\tutorial_video\MP_32_Collector\tutorial_32_video\committer\0\error
at com.norconex.committer.core3.batch.queue.impl.FSQueue.moveUnrecoverableBatchError(FSQueue.java:429) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeSplitableBatchDirectory(FSQueue.java:364) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeBatchDirectory(FSQueue.java:338) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.queue(FSQueue.java:331) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.AbstractBatchCommitter.doUpsert(AbstractBatchCommitter.java:87) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.AbstractCommitter.upsert(AbstractCommitter.java:215) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.collector.core.crawler.CrawlerCommitterService.lambda$upsert$1(CrawlerCommitterService.java:84) ~[norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.crawler.CrawlerCommitterService.executeAll(CrawlerCommitterService.java:129) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.crawler.CrawlerCommitterService.upsert(CrawlerCommitterService.java:80) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:30) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:24) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91) [norconex-commons-lang-2.0.0.jar:2.0.0]
at com.norconex.collector.http.crawler.HttpCrawler.executeCommitterPipeline(HttpCrawler.java:388) [norconex-collector-http-3.0.0.jar:3.0.0]
at com.norconex.collector.core.crawler.Crawler.processImportResponse(Crawler.java:681) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.crawler.Crawler.processNextQueuedCrawlData(Crawler.java:614) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.crawler.Crawler.processNextReference(Crawler.java:556) [norconex-collector-core-2.0.0.jar:2.0.0]
at com.norconex.collector.core.crawler.Crawler$ProcessReferencesRunnable.run(Crawler.java:923) [norconex-collector-core-2.0.0.jar:2.0.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: com.norconex.committer.core3.batch.queue.CommitterQueueException: Could not consume batch. Number of attempts: 1
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeRetriableBatch(FSQueue.java:407) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeSplitableBatchDirectory(FSQueue.java:356) ~[norconex-committer-core-3.0.0.jar:3.0.0]
... 18 more
Caused by: com.norconex.commons.lang.exec.RetriableException: Execution failed, maximum number of retries reached.
at com.norconex.commons.lang.exec.Retrier.execute(Retrier.java:204) ~[norconex-commons-lang-2.0.0.jar:2.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeRetriableBatch(FSQueue.java:395) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeSplitableBatchDirectory(FSQueue.java:356) ~[norconex-committer-core-3.0.0.jar:3.0.0]
... 18 more
Caused by: com.norconex.committer.core3.CommitterException: Could not commit JSON batch to Elasticsearch.
at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:543) ~[norconex-committer-elasticsearch-5.0.0.jar:5.0.0]
at com.norconex.committer.core3.batch.AbstractBatchCommitter.consume(AbstractBatchCommitter.java:112) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.lambda$consumeRetriableBatch$1(FSQueue.java:398) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.commons.lang.exec.Retrier.execute(Retrier.java:177) ~[norconex-commons-lang-2.0.0.jar:2.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeRetriableBatch(FSQueue.java:395) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeSplitableBatchDirectory(FSQueue.java:356) ~[norconex-committer-core-3.0.0.jar:3.0.0]
... 18 more
Caused by: java.io.IOException: Host name 'localhost' does not match the certificate subject provided by the peer (CN=*.us-east-1.es.amazonaws.com)
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:901) ~[elasticsearch-rest-client-7.16.2.jar:7.16.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288) ~[elasticsearch-rest-client-7.16.2.jar:7.16.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:276) ~[elasticsearch-rest-client-7.16.2.jar:7.16.2]
at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:537) ~[norconex-committer-elasticsearch-5.0.0.jar:5.0.0]
at com.norconex.committer.core3.batch.AbstractBatchCommitter.consume(AbstractBatchCommitter.java:112) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.lambda$consumeRetriableBatch$1(FSQueue.java:398) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.commons.lang.exec.Retrier.execute(Retrier.java:177) ~[norconex-commons-lang-2.0.0.jar:2.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeRetriableBatch(FSQueue.java:395) ~[norconex-committer-core-3.0.0.jar:3.0.0]
at com.norconex.committer.core3.batch.queue.impl.FSQueue.consumeSplitableBatchDirectory(FSQueue.java:356) ~[norconex-committer-core-3.0.0.jar:3.0.0]
... 18 more
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Host name 'localhost' does not match the certificate subject provided by the peer (CN=*.us-east-1.es.amazonaws.com)
at org.apache.http.nio.conn.ssl.SSLIOSessionStrategy.verifySession(SSLIOSessionStrategy.java:209) ~[httpasyncclient-4.1.4.jar:4.1.4]
at org.apache.http.nio.conn.ssl.SSLIOSessionStrategy$1.verify(SSLIOSessionStrategy.java:188) ~[httpasyncclient-4.1.4.jar:4.1.4]
at org.apache.http.nio.reactor.ssl.SSLIOSession.doHandshake(SSLIOSession.java:360) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.nio.reactor.ssl.SSLIOSession.isAppInputReady(SSLIOSession.java:523) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:120) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.12.jar:4.4.12]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.12.jar:4.4.12]
... 1 more
10:42:22.149 [tutorial video#1] INFO Crawler - Could not process document: https://kidshealth.org/en/parents/pilonidal_gips_animation.html (Could not execute "upsert" on 1 committer(s): "ElasticsearchCommitter". Check the logs for more details.)