Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem and sending it to various data repositories such as search engines.
https://opensource.norconex.com/crawlers
Apache License 2.0

Getting SSLHandshakeException while committing documents to Google Cloud Search #727

Closed: sudeshna-majumder closed this issue 3 years ago

sudeshna-majumder commented 3 years ago

I am suddenly getting this error in an application that was previously running fine. Is this caused by the Java version?

Nov 18, 2020 8:36:51 AM com.google.enterprise.cloudsearch.sdk.indexing.IndexingServiceImpl getSchema
WARNING: Schema lookup failed. Using empty schema
javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.ssl.Alert.createSSLException(Alert.java:131)
        at sun.security.ssl.TransportContext.fatal(TransportContext.java:324)
        at sun.security.ssl.TransportContext.fatal(TransportContext.java:267)
        at sun.security.ssl.TransportContext.fatal(TransportContext.java:262)
        ...
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
WARN  [GoogleCloudSearchCommitter] Exception caught while indexing: https://www.bayer.com.tw/en/node/251
java.lang.IllegalArgumentException: invalid object type page

essiembre commented 3 years ago

From what I can see, the error comes from the GoogleCloudSearchCommitter while it is indexing the URL mentioned in your logs. I'm not sure why Google Cloud would reject it, but it looks like an SSL certificate issue.
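Whether the Java version is to blame can be checked from the same JVM that runs the crawler. A "PKIX path building failed" error generally means the certificate chain presented by the server (or by a proxy re-signing TLS traffic in between) does not lead to any CA in the JVM's truststore. Below is a minimal diagnostic sketch, not part of Norconex or the Google committer: the class name `TrustStoreCheck` and the default target host `cloudsearch.googleapis.com` are assumptions chosen for illustration. It prints the Java version and truststore in use, counts the trusted CA certificates, and attempts a TLS handshake.

```java
import java.io.IOException;
import java.security.KeyStore;
import java.security.KeyStoreException;
import java.security.NoSuchAlgorithmException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLHandshakeException;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical diagnostic helper; not part of Norconex or the Cloud Search SDK.
public class TrustStoreCheck {

    /** Returns the JVM's default X509TrustManager, backed by the default truststore. */
    public static X509TrustManager defaultTrustManager()
            throws NoSuchAlgorithmException, KeyStoreException {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        // Passing null makes the factory load the default truststore
        // (javax.net.ssl.trustStore if set, otherwise the JRE's cacerts file).
        tmf.init((KeyStore) null);
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                return (X509TrustManager) tm;
            }
        }
        throw new IllegalStateException("No X509TrustManager available");
    }

    /** Attempts a TLS handshake and returns true if the server's chain is trusted. */
    public static boolean handshakeSucceeds(String host, int port) throws IOException {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            socket.startHandshake();
            return true;
        } catch (SSLHandshakeException e) {
            // "PKIX path building failed" ends up here when no trusted path to a CA exists.
            System.err.println("Handshake failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("javax.net.ssl.trustStore = "
                + System.getProperty("javax.net.ssl.trustStore", "(not set, default cacerts)"));
        X509Certificate[] issuers = defaultTrustManager().getAcceptedIssuers();
        System.out.println("Trusted CA certificates: " + issuers.length);
        // Assumed default target; pass another host name as the first argument.
        String host = args.length > 0 ? args[0] : "cloudsearch.googleapis.com";
        System.out.println("Handshake with " + host + ": "
                + (handshakeSucceeds(host, 443) ? "OK" : "FAILED"));
    }
}
```

Compile and run it with the same `java` binary and JVM options as the crawler, for example `javac TrustStoreCheck.java && java TrustStoreCheck cloudsearch.googleapis.com`. If the handshake fails here as well, that points to the JVM's truststore or the network path rather than to the HTTP Collector.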

Please open a ticket with Google at https://github.com/google-cloudsearch/norconex-committer-plugin/issues

If you discover the issue is related to the HTTP Collector itself, feel free to re-open or create a new issue.