Open MRC-westat opened 3 years ago
Which version of the Solr Committer are you using?
To establish whether the problem is specific to the Committer, can you confirm if you are able to successfully make queries and updates to Solr directly from the command line? Can you do so from the same host where the HTTP Collector runs? If not, the problem is likely misconfiguration on the Solr side.
If it works from the command line, it is harder to help out without a way to reproduce. Do you have more information about the error in your Solr logs? Check also for any kind of errors, especially around startup. You can also try upgrading the SolrJ library installed with the Committer to match your version of Solr.
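The command-line check suggested above needs to cover both a query and an update, since the two handlers can carry different permissions. Here is a minimal Python sketch of that check, assuming Basic Authentication. The function names are illustrative only, and the assumption that Solr accepts an empty JSON array on `/update` (so nothing is indexed while `commit=true` still requests a commit) should be verified against your own setup.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_requests(core_url: str, user: str, password: str):
    """Build one query request and one update-with-commit request."""
    auth = basic_auth_header(user, password)
    query = urllib.request.Request(
        f"{core_url}/select?q=*:*&rows=0",
        headers={"Authorization": auth},
    )
    # Assumption: an empty JSON array adds no documents, while commit=true
    # still asks Solr to commit, which exercises the update permission.
    update = urllib.request.Request(
        f"{core_url}/update?commit=true",
        data=b"[]",
        headers={"Authorization": auth, "Content-Type": "application/json"},
        method="POST",
    )
    return query, update


def check_endpoints(core_url: str, user: str, password: str) -> dict:
    """Send both requests and return the HTTP status code seen for each."""
    statuses = {}
    for name, request in zip(("select", "update"),
                             build_requests(core_url, user, password)):
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                statuses[name] = response.status
        except urllib.error.HTTPError as err:
            # A 401 here means the credentials were rejected or not applied.
            statuses[name] = err.code
    return statuses
```

Calling `check_endpoints` with the core URL and credentials from the same host as the HTTP Collector shows whether `select` and `update` are both authorized. If `select` succeeds while `update` returns 401, the issue is more likely in the Solr permissions than in the Committer.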
Hi .. thanks for the assistance!
The Solr Committer version is 2.4.0 and the Solr version is 8.8.2. I am using the latest Java 11 release from AdoptOpenJDK.
I can run a query successfully using Postman (passing in the username and password), and I can also do it successfully on the command line using curl:
curl --user solr_admin:password http://wessolrtest1:8983/solr/sops/select?q=test
The collector/committer is running on the same standalone Solr server.
There is not much information on Solr startup:
D:\Solr\bin>solr start -h wessolrtest1
"java version info is 11.0.10"
"Extracted major version is 11"
OpenJDK 64-Bit Server VM warning: JVM cannot use large page memory because it does not have enough privilege to lock pages in memory.
Waiting up to 30 to see Solr running on port 8983
I am not seeing any errors in the Solr log file, just INFO messages
The log file from the committer is this:
xyz crawler: 2021-05-04 19:03:40 INFO - xyz crawler: Crawler finishing: committing documents.
xyz crawler: 2021-05-04 19:03:40 INFO - Committing 92 files
xyz crawler: 2021-05-04 19:03:40 INFO - Sending 92 documents to Solr for update/deletion.
xyz crawler: 2021-05-04 19:03:41 INFO - xyz crawler: Crawler executed in 57 seconds.
xyz crawler: 2021-05-04 19:03:41 INFO - xyz crawler: Closing sitemap store...
xyz crawler: 2021-05-04 19:03:41 ERROR - Execution failed for job: xyz crawler
com.norconex.committer.core.CommitterException: Cannot index document batch to Solr.
    at com.norconex.committer.solr.SolrCommitter.commitBatch(SolrCommitter.java:400)
    at com.norconex.committer.core.AbstractBatchCommitter.commitAndCleanBatch(AbstractBatchCommitter.java:179)
    at com.norconex.committer.core.AbstractBatchCommitter.commitComplete(AbstractBatchCommitter.java:159)
    at com.norconex.committer.core.AbstractFileQueueCommitter.commit(AbstractFileQueueCommitter.java:233)
    at com.norconex.collector.core.crawler.AbstractCrawler.execute(AbstractCrawler.java:274)
    at com.norconex.collector.core.crawler.AbstractCrawler.doExecute(AbstractCrawler.java:228)
    at com.norconex.collector.core.crawler.AbstractCrawler.startExecution(AbstractCrawler.java:184)
    at com.norconex.jef4.job.AbstractResumableJob.execute(AbstractResumableJob.java:49)
    at com.norconex.jef4.suite.JobSuite.runJob(JobSuite.java:354)
    at com.norconex.jef4.suite.JobSuite.doExecute(JobSuite.java:293)
    at com.norconex.jef4.suite.JobSuite.execute(JobSuite.java:166)
    at com.norconex.collector.core.AbstractCollector.start(AbstractCollector.java:150)
    at com.norconex.collector.core.AbstractCollectorLauncher.launch(AbstractCollectorLauncher.java:95)
    at com.norconex.collector.http.HttpCollector.main(HttpCollector.java:74)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://wessolrtest1:8983/solr/sops: Expected mime type application/octet-stream but got text/html.
| Field | Value |
|---|---|
| URI | /solr/sops/update |
| STATUS | 401 |
| MESSAGE | require authentication |
| SERVLET | default |
Can you share the key elements of your Solr security config in an attempt to reproduce the issue?
Thanks! I will post the security.json file along with the solr.in.cmd file and the Norconex-related files. Solr 8.8.2 is installed as a standalone Solr server on Windows Server 2019. Let me know if you need anything else. SOLR_debugging.zip
I could reproduce. It turns out the credentials are currently not applied when commit is invoked. Until a fix is provided, you can add the following to your Solr Committer configuration block:
<solrCommitDisabled>true</solrCommitDisabled>
Solr will rely on its auto-commit configuration to commit the data.
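Because the workaround depends on auto-commit being enabled on the Solr side, it may be worth confirming what the core is configured to do. The sketch below assumes the JSON returned by Solr's Config API (for example `/solr/sops/config/updateHandler`) nests the `autoCommit` and `autoSoftCommit` settings under `config.updateHandler`. The sample payload is hypothetical and the function name is illustrative only.

```python
import json


def auto_commit_summary(config_response: dict) -> dict:
    """Report whether hard and soft auto-commit look enabled.

    A setting is treated as enabled when maxTime or maxDocs is positive;
    Solr uses -1 to mean "disabled" for both.
    """
    handler = config_response.get("config", {}).get("updateHandler", {})
    summary = {}
    for key in ("autoCommit", "autoSoftCommit"):
        settings = handler.get(key, {})
        max_time = int(settings.get("maxTime", -1))
        max_docs = int(settings.get("maxDocs", -1))
        summary[key] = max_time > 0 or max_docs > 0
    return summary


# Hypothetical payload shaped like a Config API response (assumption).
SAMPLE = json.loads("""
{"config": {"updateHandler": {
    "autoCommit": {"maxDocs": -1, "maxTime": 15000, "openSearcher": false},
    "autoSoftCommit": {"maxDocs": -1, "maxTime": -1}
}}}
""")

summary = auto_commit_summary(SAMPLE)
```

With a configuration like the sample, hard auto-commit is on but `openSearcher` is false and soft auto-commit is off, so documents would be flushed to disk without becoming searchable until a new searcher is opened. In that case, enabling `autoSoftCommit` or setting `openSearcher` to true may be needed for the workaround to produce visible results.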
FYI, a new snapshot release of the Solr Committer was made with a proper fix. You no longer have to apply the workaround (disabling Solr commits).
Please confirm.
I downloaded the Committer 2.4.1 snapshot and ran the install script, but I am still getting the error, and my log file still says 2.4.0. Do I need to do something special to overwrite the old Committer?
thanks Michael
Yes, look in your lib folder and you will likely see duplicate JARs. If you see two files starting with norconex-committer-solr-..., delete or back up the older one(s).
If it still fails, you may want to reinstall the collector files and the Solr Committer files to make sure you have no other duplicates.
The easiest way to install a Committer is to run the install script found once you extracted the Committer Zip file. It takes care of eliminating possible duplicates.
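If you prefer to check for duplicates by hand, a small script can list JARs that share an artifact name but differ in version. This is a rough sketch: the file-name pattern is a heuristic that assumes names of the form `artifact-version.jar`, and the `lib` path in the usage note is a placeholder for your own collector installation.

```python
import re
from collections import defaultdict
from pathlib import Path

# Heuristic: the version starts at the first hyphen followed by a digit.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_duplicate_jars(lib_dir) -> dict:
    """Map each artifact name to its JAR files when more than one exists."""
    by_artifact = defaultdict(list)
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        match = JAR_NAME.match(jar.name)
        if match:
            by_artifact[match.group("artifact")].append(jar.name)
    return {
        artifact: names
        for artifact, names in by_artifact.items()
        if len(names) > 1
    }
```

Running `find_duplicate_jars("lib")` from the collector folder returns an entry for every artifact present in more than one version. The script only reports; deciding which file to delete or back up is left to you.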
I am using Basic Authentication on a standalone (non-cloud) Solr 8.8.2 installation on Windows. The crawl is successful, and the error is thrown in the committer. SSL is turned off (for the moment). The username and password are in clear text, the same as the Basic Auth login to the Solr admin. The core is called sops.
my committer code is:
from the norconex log file
[non-job]: 2021-04-27 19:52:54 INFO - Starting execution.
[non-job]: 2021-04-27 19:52:54 INFO - Version: Norconex HTTP Collector 2.9.0 (Norconex Inc.)
[non-job]: 2021-04-27 19:52:54 INFO - Version: Norconex Collector Core 1.10.0 (Norconex Inc.)
[non-job]: 2021-04-27 19:52:54 INFO - Version: Norconex Importer 2.10.0 (Norconex Inc.)
[non-job]: 2021-04-27 19:52:54 INFO - Version: Norconex JEF 4.1.2 (Norconex Inc.)
[non-job]: 2021-04-27 19:52:54 INFO - Version: Norconex Committer Core 2.1.3 (Norconex Inc.)
[non-job]: 2021-04-27 19:52:54 INFO - Version: Norconex Committer Solr 2.4.0 (Norconex Inc.)
...
xyz crawler: 2021-04-27 19:53:49 INFO - Committing 56 files
xyz crawler: 2021-04-27 19:53:50 INFO - Sending 56 documents to Solr for update/deletion.
xyz crawler: 2021-04-27 19:53:50 INFO - xyz crawler: Crawler executed in 56 seconds.
xyz crawler: 2021-04-27 19:53:50 INFO - xyz crawler: Closing sitemap store...
xyz crawler: 2021-04-27 19:53:50 ERROR - Execution failed for job: xyz crawler
com.norconex.committer.core.CommitterException: Cannot index document batch to Solr.
    at com.norconex.committer.solr.SolrCommitter.commitBatch(SolrCommitter.java:400)
    at com.norconex.committer.core.AbstractBatchCommitter.commitAndCleanBatch(AbstractBatchCommitter.java:179)
    at com.norconex.committer.core.AbstractBatchCommitter.commitComplete(AbstractBatchCommitter.java:159)
    at com.norconex.committer.core.AbstractFileQueueCommitter.commit(AbstractFileQueueCommitter.java:233)
    at com.norconex.collector.core.crawler.AbstractCrawler.execute(AbstractCrawler.java:274)
    at com.norconex.collector.core.crawler.AbstractCrawler.doExecute(AbstractCrawler.java:228)
    at com.norconex.collector.core.crawler.AbstractCrawler.startExecution(AbstractCrawler.java:184)
    at com.norconex.jef4.job.AbstractResumableJob.execute(AbstractResumableJob.java:49)
    at com.norconex.jef4.suite.JobSuite.runJob(JobSuite.java:354)
    at com.norconex.jef4.suite.JobSuite.doExecute(JobSuite.java:293)
    at com.norconex.jef4.suite.JobSuite.execute(JobSuite.java:166)
    at com.norconex.collector.core.AbstractCollector.start(AbstractCollector.java:150)
    at com.norconex.collector.core.AbstractCollectorLauncher.launch(AbstractCollectorLauncher.java:95)
    at com.norconex.collector.http.HttpCollector.main(HttpCollector.java:74)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://mysolr:8983/solr/sops: Expected mime type application/octet-stream but got text/html.
HTTP ERROR 401 require authentication