google-cloudsearch / norconex-committer-plugin

Google Cloud Search Norconex HTTP Collector Indexer Plugin
Apache License 2.0

Error when committing documents to Google Cloud Search datastore; started happening suddenly (error example below) #19

Open sudeshna-majumder opened 3 years ago

sudeshna-majumder commented 3 years ago

INFO [HttpCrawler] 2 start URLs identified.
INFO [CrawlerEventManager] CRAWLER_STARTED
INFO [AbstractCrawler] bayer-default: Crawling references...
INFO [CrawlerEventManager] REJECTED_REDIRECTED: https://www.bayer.com.tw/
INFO [CrawlerEventManager] DOCUMENT_FETCHED: https://www.bayer.com.tw/zh-hant/
INFO [CrawlerEventManager] CREATED_ROBOTS_META: https://www.bayer.com.tw/zh-hant/
INFO [CrawlerEventManager] REJECTED_FILTER: https://www.bayer.com.tw/sites/bayer_com_tw/files/styles/280x160/public/2020-09/hr.jpg?h=341981b4&itok=cjW2HXv9
INFO [CrawlerEventManager] REJECTED_FILTER: https://www.bayer.com.tw/sites/bayer_com_tw/files/styles/280x160/public/2020-09/teaser-nav-commit.jpg?h=fd24c189&itok=_2ttr7tf
INFO [CrawlerEventManager] REJECTED_FILTER: https://www.bayer.com.tw/sites/bayer_com_tw/files/styles/16_9_aspect_ratio/public/2020-11/movingimages05.jpg?h=d19103a9&itok=8abowRlj
INFO [CrawlerEventManager] DOCUMENT_FETCHED: https://www.bayer.com.tw/zh-hant/bayer-innovation
INFO [CrawlerEventManager] REJECTED_FILTER: https://www.bayer.com.tw/sites/bayer_com_tw/files/styles/16_9_aspect_ratio/public/2020-11/movingimages02.jpg?h=bf3ccb75&itok=-iKRgKQx
INFO [CrawlerEventManager] CREATED_ROBOTS_META: https://www.bayer.com.tw/zh-hant/bayer-innovation
INFO [CrawlerEventManager] REJECTED_FILTER: https://www.bayer.com.tw/sites/bayer_com_tw/files/styles/16_9_small/public/2020-11/Receptionist%20talking%20phone_426.jpg?h=656682cd&itok=oVEc4lU6
INFO [CrawlerEventManager] REJECTED_FILTER: https://www.bayer.com.tw/sites/bayer_com_tw/files/styles/16_9_aspect_ratio/public/2020-11/movingimages01.jpg?h=d19103a9&itok=PoKOHP28
INFO [CrawlerEventManager] REJECTED_FILTER: https://www.bayer.com.tw/sites/bayer_com_tw/files/styles/16_9_small/public/2020-08/consumer-health.jpg?h=c397aecc&itok=WrOpXWdU
INFO [CrawlerEventManager] REJECTED_FILTER: https://www.bayer.com.tw/sites/bayer_com_tw/files/styles/280x160/public/2020-08/taiwan.png?h=fd24c189&itok=Wf0Gpu5H
INFO [CrawlerEventManager] URLS_EXTRACTED: https://www.bayer.com.tw/zh-hant/conditions-of-use
INFO [CrawlerEventManager] DOCUMENT_IMPORTED: https://www.bayer.com.tw/zh-hant/conditions-of-use
INFO [CrawlerEventManager] DOCUMENT_IMPORTED: https://www.bayer.com.tw/zh-hant/bayer-innovation
INFO [CrawlerEventManager] DOCUMENT_IMPORTED: https://www.bayer.com.tw/zh-hant/
INFO [CrawlerEventManager] DOCUMENT_COMMITTED_ADD: https://www.bayer.com.tw/zh-hant/bayer-innovation (GoogleCloudSearchCommitter[queueSize=100,docCount=62872,queue=FileSystemCommitter[directory=../workdir/queue],commitBatchSize=10,maxRetries=0,maxRetryWait=0,operations=[],targetReferenceField=,sourceReferenceField=,keepSourceReferenceField=false,targetContentField=,sourceContentField=,keepSourceContentField=false])
INFO [CrawlerEventManager] DOCUMENT_COMMITTED_ADD: https://www.bayer.com.tw/zh-hant/ (GoogleCloudSearchCommitter[queueSize=100,docCount=62872,queue=FileSystemCommitter[directory=../workdir/queue],commitBatchSize=10,maxRetries=0,maxRetryWait=0,operations=[],targetReferenceField=,sourceReferenceField=,keepSourceReferenceField=false,targetContentField=,sourceContentField=,keepSourceContentField=false])
INFO [CrawlerEventManager] DOCUMENT_COMMITTED_ADD: https://www.bayer.com.tw/zh-hant/conditions-of-use (GoogleCloudSearchCommitter[queueSize=100,docCount=62873,queue=FileSystemCommitter[directory=../workdir/queue],commitBatchSize=10,maxRetries=0,maxRetryWait=0,operations=[],targetReferenceField=,sourceReferenceField=,keepSourceReferenceField=false,targetContentField=,sourceContentField=,keepSourceContentField=false])
INFO [CrawlerEventManager] REJECTED_REDIRECTED: https://www.bayer.com.tw/node/
INFO [CrawlerEventManager] REJECTED_REDIRECTED: https://www.bayer.com.tw/rss
INFO [CrawlerEventManager] DOCUMENT_FETCHED: https://www.bayer.com.tw/sites/bayer_com_tw/files/bayer-organizational-structure-2020-08-21.pdf
INFO [CrawlerEventManager] CREATED_ROBOTS_META: https://www.bayer.com.tw/sites/bayer_com_tw/files/bayer-organizational-structure-2020-08-21.pdf
INFO [CrawlerEventManager] DOCUMENT_FETCHED: https://www.bayer.com.tw/themes/custom/bayer_cpa/logo.svg
INFO [CrawlerEventManager] CREATED_ROBOTS_META: https://www.bayer.com.tw/themes/custom/bayer_cpa/logo.svg
INFO [CrawlerEventManager] REJECTED_IMPORT: https://www.bayer.com.tw/themes/custom/bayer_cpa/logo.svg
INFO [CrawlerEventManager] REJECTED_REDIRECTED: https://www.bayer.com.tw/en/node/2
INFO [CrawlerEventManager] DOCUMENT_FETCHED: https://www.bayer.com.tw/zh-hant/advanced-search
INFO [CrawlerEventManager] CREATED_ROBOTS_META: https://www.bayer.com.tw/zh-hant/advanced-search
INFO [CrawlerEventManager] URLS_EXTRACTED: https://www.bayer.com.tw/en/node/56
INFO [CrawlerEventManager] DOCUMENT_IMPORTED: https://www.bayer.com.tw/en/node/556
Dec 09, 2020 10:50:11 PM com.google.enterprise.cloudsearch.sdk.indexing.IndexingServiceImpl getSchema
WARNING: Schema lookup failed. Using empty schema
javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:324)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:267)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:262)
    at sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:654)
    at sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(CertificateMessage.java:473)
    at sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(CertificateMessage.java:369)
    at sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:377)
    at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:444)
    at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:422)
    at sun.security.ssl.TransportContext.dispatch(TransportContext.java:182)
    at sun.security.ssl.SSLTransport.decode(SSLTransport.java:149)
    at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1143)
    at sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1054)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:394)
    at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1340)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1315)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:264)
    at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:77)
    at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
    at com.google.api.client.auth.oauth2.TokenRequest.executeUnparsed(TokenRequest.java:283)
    at com.google.api.client.auth.oauth2.TokenRequest.execute(TokenRequest.java:307)
    at com.google.api.client.googleapis.auth.oauth2.GoogleCredential.executeRefreshToken(GoogleCredential.java:394)
    at com.google.api.client.auth.oauth2.Credential.refreshToken(Credential.java:489)
    at com.google.api.client.auth.oauth2.Credential.intercept(Credential.java:217)
    at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:868)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:499)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
    at com.google.enterprise.cloudsearch.sdk.BaseApiService.executeRequest(BaseApiService.java:429)
    at com.google.enterprise.cloudsearch.sdk.indexing.IndexingServiceImpl.getSchema(IndexingServiceImpl.java:1143)
    at com.google.enterprise.cloudsearch.sdk.indexing.StructuredData.initFromConfiguration(StructuredData.java:199)
    at com.norconex.committer.googlecloudsearch.GoogleCloudSearchCommitter.init(GoogleCloudSearchCommitter.java:204)
    at com.norconex.committer.googlecloudsearch.GoogleCloudSearchCommitter.commitBatch(GoogleCloudSearchCommitter.java:234)
    at com.norconex.committer.core.AbstractBatchCommitter.commitAndCleanBatch(AbstractBatchCommitter.java:179)
    at com.norconex.committer.core.AbstractBatchCommitter.cacheOperationAndCommitIfReady(AbstractBatchCommitter.java:208)
    at com.norconex.committer.core.AbstractBatchCommitter.commitDeletion(AbstractBatchCommitter.java:148)
    at com.norconex.committer.core.AbstractFileQueueCommitter.commit(AbstractFileQueueCommitter.java:225)
    at com.norconex.committer.core.AbstractCommitter.commitIfReady(AbstractCommitter.java:146)
    at com.norconex.committer.core.AbstractCommitter.add(AbstractCommitter.java:97)
    at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:34)
    at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:27)
    at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
    at com.norconex.collector.http.crawler.HttpCrawler.executeCommitterPipeline(HttpCrawler.java:380)
    at com.norconex.collector.core.crawler.AbstractCrawler.processImportResponse(AbstractCrawler.java:600)
    at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:541)
    at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:419)
    at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:829)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:456)
    at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:323)
    at sun.security.validator.Validator.validate(Validator.java:271)
    at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:315)
    at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:223)
    at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:129)
    at sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:638)
    ... 48 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
    at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
    at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
    at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:451)
    ... 54 more

INFO [GoogleCloudSearchCommitter] Indexing Service reference count: 1
INFO [GoogleCloudSearchCommitter] Sending 10 documents to Google Cloud Search for addition/deletion.
INFO [CrawlerEventManager] DOCUMENT_COMMITTED_ADD: https://www.bayer.com.tw/en/node/556 (GoogleCloudSearchCommitter[queueSize=100,docCount=62911,queue=FileSystemCommitter[directory=../workdir/queue],commitBatchSize=10,maxRetries=0,maxRetryWait=0,operations=[],targetReferenceField=,sourceReferenceField=,keepSourceReferenceField=false,targetContentField=,sourceContentField=,keepSourceContentField=false])
INFO [GoogleCloudSearchCommitter] Document deleted (38ms): https://www.cropscience.bayer.ca/en/Products/Fungicides/Prosaro-west/Quality
INFO [GoogleCloudSearchCommitter] Document deleted (0ms): https://www.cropscience.bayer.ca/en/Products/Fungicides/Prosaro-west/Quantity
INFO [GoogleCloudSearchCommitter] Document deleted (0ms): https://www.cropscience.bayer.ca/en/Products/Fungicides/Scala
INFO [GoogleCloudSearchCommitter] Indexing Service release reference count: 1
INFO [GoogleCloudSearchCommitter] Stopping indexingService: 0
INFO [CrawlerEventManager] DOCUMENT_FETCHED: https://www.bayer.com.tw/en/node/571
Dec 09, 2020 10:50:11 PM com.google.enterprise.cloudsearch.sdk.BatchRequestService shutDown
INFO: Shutting down batching service. flush on shutdown: true
INFO [CrawlerEventManager] CREATED_ROBOTS_META: https://www.bayer.com.tw/en/node/571
INFO [CrawlerEventManager] DOCUMENT_IMPORTED: https://www.bayer.com.tw/en/node/56
INFO [CrawlerEventManager] REJECTED_FILTER: https://www.bayer.com.tw/sites/bayer_com_tw/files/styles/280x160/public/2020-09/duty_170x100.jpg?h=88f562ca&itok=8eEavwXI
INFO [CrawlerEventManager] REJECTED_FILTER: https://www.bayer.com.tw/sites/bayer_com_tw/files/2020-08/hospital-science-01.jpg
INFO [CrawlerEventManager] REJECTED_FILTER: https://www.bayer.com.tw/sites/bayer_com_tw/files/styles/280x160/public/2020-08/taiwan.png?h=fd24c189&itok=Wf0Gpu5H
INFO [CrawlerEventManager] REJECTED_FILTER: https://www.bayer.com.tw/sites/bayer_com_tw/files/inline-images/hospital-science-02.png
INFO [CrawlerEventManager] REJECTED_FILTER: https://www.bayer.com.tw/sites/bayer_com_tw/files/styles/280x160/public/2020-11/Newspaper_production.jpg?h=78276bf5&itok=5SH9XPoW
INFO [CrawlerEventManager] REJECTED_FILTER: https://www.bayer.com.tw/sites/bayer_com_tw/files/styles/280x160/public/2020-09/teaser-nav-commit.jpg?h=fd24c189&itok=_2ttr7tf
INFO [CrawlerEventManager] REJECTED_FILTER: https://www.bayer.com.tw/sites/bayer_com_tw/files/styles/280x160/public/2020-09/teaser-nav-news.jpg?h=a0a0c8ec&itok=_QhViWhe
Dec 09, 2020 10:50:12 PM com.google.enterprise.cloudsearch.sdk.BatchRequestService$SnapshotRunnable getGoogleJsonError
WARNING: Retrying request failed with exception:
javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:324)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:267)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:262)
    at sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:654)
    at sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(CertificateMessage.java:473)
    at sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(CertificateMessage.java:369)
    at sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:377)
    at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:444)
    at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:422)
    at sun.security.ssl.TransportContext.dispatch(TransportContext.java:182)
    at sun.security.ssl.SSLTransport.decode(SSLTransport.java:149)
    at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1143)
    at sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1054)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:394)
    at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1340)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1315)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:264)
    at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:77)
    at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
    at com.google.api.client.auth.oauth2.TokenRequest.executeUnparsed(TokenRequest.java:283)
    at com.google.api.client.auth.oauth2.TokenRequest.execute(TokenRequest.java:307)
    at com.google.api.client.googleapis.auth.oauth2.GoogleCredential.executeRefreshToken(GoogleCredential.java:394)
    at com.google.api.client.auth.oauth2.Credential.refreshToken(Credential.java:489)
    at com.google.api.client.auth.oauth2.Credential.intercept(Credential.java:217)
    at com.google.api.client.googleapis.batch.BatchRequest$BatchInterceptor.intercept(BatchRequest.java:300)
    at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:868)
    at com.google.api.client.googleapis.batch.BatchRequest.execute(BatchRequest.java:241)
    at com.google.enterprise.cloudsearch.sdk.BatchRequestService$BatchRequestHelper.executeBatchRequest(BatchRequestService.java:447)
    at com.google.enterprise.cloudsearch.sdk.BatchRequestService$SnapshotRunnable.execute(BatchRequestService.java:308)
    at com.google.enterprise.cloudsearch.sdk.BatchRequestService$SnapshotRunnable.run(BatchRequestService.java:238)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:456)
    at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:323)
    at sun.security.validator.Validator.validate(Validator.java:271)
    at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:315)
    at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:223)
    at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:129)
    at sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:638)
    ... 31 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
    at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
    at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
    at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:451)
INFO [GoogleCloudSearchCommitter] Indexing Service release reference count: 1
INFO [GoogleCloudSearchCommitter] Stopping indexingService: 0
Dec 09, 2020 10:53:58 PM com.google.enterprise.cloudsearch.sdk.BatchRequestService shutDown
INFO: Shutting down batching service. flush on shutdown: true
INFO [GoogleCloudSearchCommitter] Shutting down (took: 2ms)!
INFO [GoogleCloudSearchCommitter] Indexing Service reference count: 0
INFO [AbstractCrawler] bayer-default: Crawler executed in 8 minutes 2 seconds.
INFO [SitemapStore] bayer-default: Closing sitemap store...
ERROR [JobSuite] Execution failed for job: bayer-default
INFO [JobSuite] Running bayer-default: END (Wed Dec 09 22:45:55 UTC 2020

I have checked the JRE keystore, and no certificate has expired recently. I also updated my SDK to the latest version, but nothing worked. I get this error regardless of which domain I try to index.
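For anyone repeating the keystore check described above, here is a minimal sketch of how it could be scripted with standard JDK classes only. It assumes the crawler runs on the JVM's default trust store (or one set through `javax.net.ssl.trustStore`); the class name `TrustStoreCheck` and its methods are hypothetical names made up for this illustration and are not part of the plugin or the Cloud Search SDK. A clean result here does not rule out other causes: a PKIX path failure can also mean the certificate chain presented to the JVM leads to a root that is simply absent from the trust store.

```java
import java.security.KeyStore;
import java.security.cert.CertificateExpiredException;
import java.security.cert.CertificateNotYetValidException;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper (not part of the plugin): inspects the JVM's default trust store.
final class TrustStoreCheck {

    // Returns the CA certificates the default X509 trust manager accepts.
    static X509Certificate[] acceptedIssuers() throws Exception {
        TrustManagerFactory tmf =
            TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        // A null KeyStore selects the default trust store (cacerts, or the one
        // named by the javax.net.ssl.trustStore system property).
        tmf.init((KeyStore) null);
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                return ((X509TrustManager) tm).getAcceptedIssuers();
            }
        }
        return new X509Certificate[0];
    }

    // Returns the accepted issuers whose validity period has ended at the given instant.
    static List<X509Certificate> expiredIssuers(Date at) throws Exception {
        List<X509Certificate> expired = new ArrayList<>();
        for (X509Certificate cert : acceptedIssuers()) {
            try {
                cert.checkValidity(at);
            } catch (CertificateExpiredException e) {
                expired.add(cert);
            } catch (CertificateNotYetValidException e) {
                // Not valid yet is a different condition; it is not counted as expired.
            }
        }
        return expired;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Accepted issuers: " + acceptedIssuers().length);
        List<X509Certificate> expired = expiredIssuers(new Date());
        System.out.println("Expired issuers: " + expired.size());
        for (X509Certificate cert : expired) {
            System.out.println("  " + cert.getSubjectX500Principal().getName()
                + " (notAfter=" + cert.getNotAfter() + ")");
        }
    }
}
```

Running it with the same JRE and the same `-Djavax.net.ssl.trustStore` setting (if any) as the collector matters, since a different Java installation may ship a different `cacerts` file.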