collective / collective.recipe.solrinstance

Buildout recipe to configure a Solr instance
https://pypi.python.org/pypi/collective.recipe.solrinstance

building out with Solr4 #32

Closed mgrbyte closed 9 years ago

mgrbyte commented 9 years ago

We've encountered the following issue:

The first time you run buildout, parts/solr-instance and parts/solr-download are created. Running bin/solr-instance fg then fails to start properly, with an error similar to:

1937 [coreLoadExecutor-5-thread-1] ERROR org.apache.solr.core.CoreContainer – Unable to create core: collection1

The solution is to rm -rf parts/solr-instance and re-buildout.

Here's what I think is happening:

When buildout executes its parts, solr-download does not yet exist, so the recipe's is_solr_4 method returns False.

This can be verified by removing the solr-download directory and re-running buildout, which causes the same error as a first-time buildout of a project configured with this recipe.
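A minimal sketch of the suspected failure mode, assuming the check is a plain directory test on example/solr/collection1 (as described later in this thread). The choose_templates helper and the "templates" name for the older template set are hypothetical; only templates4 is named in this issue.

```python
import os
import tempfile


def is_solr_4(solr_location):
    """Guess Solr 4 from the presence of example/solr/collection1."""
    marker = os.path.join(solr_location, "example", "solr", "collection1")
    return os.path.isdir(marker)


def choose_templates(solr_location):
    # Hypothetical helper: 'templates4' is named in this thread,
    # 'templates' for the older set is an assumption.
    return "templates4" if is_solr_4(solr_location) else "templates"


def demo():
    with tempfile.TemporaryDirectory() as parts:
        download = os.path.join(parts, "solr-download")
        # First buildout run: the download has not been unpacked yet.
        before = choose_templates(download)
        # Simulate the download recipe unpacking a Solr 4 tarball.
        os.makedirs(os.path.join(download, "example", "solr", "collection1"))
        after = choose_templates(download)
        return before, after
```

If the check runs before the download has been unpacked, the older templates are selected even though Solr 4 is what eventually lands on disk, which would match the symptom above.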

Currently we work around this by running buildout twice, removing parts/solr-instance after the first buildout invocation.

Would a more permanent solution be to parse the URL/buildout recipe section for the version of Solr, rather than relying upon the existence of a directory (is_solr_4)?

davidjb commented 9 years ago

Could you provide your buildout.cfg or a cut-down version? It might be that your solr-download is being included or happening after the solr-instance part.

In any case, I agree having a more robust method of checking the version is wise; if the example/solr/collection1 directory (what is_solr_4() solely relies upon) happens to be removed/renamed/etc from a later Solr version for any reason, then the recipe is going to break. Likewise, if changes happen for Solr 5 and future versions, then we need a better way of tracking versions.

URL parsing isn't an option since downloading Solr isn't part of this recipe, so the best/easiest option is adding an option for version = x.x and parsing that (e.g. with pkg_resources.parse_version) to compare versions. Since this option doesn't yet exist, backwards compatibility is an issue here, so the existing directory check can be used as a fallback if version hasn't (yet) been specified.
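A rough sketch of that idea, under stated assumptions: the version option does not exist in the recipe yet, and the small parse_version below is a simplified stdlib-only stand-in for pkg_resources.parse_version so the example stays self-contained. It is not the recipe's actual code.

```python
import os
import re


def parse_version(text):
    """Simplified stand-in for pkg_resources.parse_version.

    Turns '4.10.1' into (4, 10, 1) so versions compare numerically.
    """
    return tuple(int(part) for part in re.findall(r"\d+", text))


def is_solr_4(options, solr_location):
    version = options.get("version")  # hypothetical new recipe option
    if version:
        return parse_version(version) >= (4,)
    # Fallback for existing configurations without a version option:
    # keep the current directory-based guess.
    marker = os.path.join(solr_location, "example", "solr", "collection1")
    return os.path.isdir(marker)
```

With an explicit version the result no longer depends on whether the download has been unpacked; without one, behaviour is unchanged from the directory check.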

mgrbyte commented 9 years ago

virtualenv:
buildout.bootstrap (1.4.6)
zc.buildout (2.2.1)

eggs generated by buildout:
collective.recipe.solrinstance-5.3.2-py2.7.egg
Genshi-0.7-py2.7-linux-x86_64.egg
hexagonit.recipe.download-1.7-py2.7.egg

The following is enough to demonstrate the problem. It is taken from the examples on https://pypi.python.org/pypi/collective.recipe.solrinstance/5.3.2, with the addition of default-search-field to make it build.

Inspecting the template generated in parts/solr/solr/collection1/conf/schema.xml, the version in the outer config is 1.4, whereas it would be 1.5 had schema.xml.tmpl from templates4 in the recipe been used during the build.

As stated above, removing parts/solr, then re-running buildout works around this. I also put a pdb in is_solr_4 which helps to see what's going on.

Will attempt to find time to write a patch later. Thanks!

Example config follows:

[buildout]
parts = solr-download solr

[solr-download]
recipe = hexagonit.recipe.download
strip-top-level-dir = true
url = http://mirror.ox.ac.uk/sites/rsync.apache.org/lucene/solr/4.10.1/solr-4.10.1.tgz

[solr]
recipe = collective.recipe.solrinstance
solr-location = ${solr-download:location}
host = 127.0.0.1
port = 1234
max-num-results = 500
section-name = SOLR
unique-key = uniqueID
default-search-field = uniqueID
index =
    name:uniqueID type:string indexed:true stored:true required:true
    name:Foo type:text copyfield:Baz
    name:Bar type:date indexed:false stored:false required:true multivalued:true omitnorms:true copyfield:Baz
    name:Foo bar type:text
    name:Baz type:text
    name:Everything type:text
filter =
    text solr.LowerCaseFilterFactory
char-filter-index =
    text solr.HTMLStripCharFilterFactory
tokenizer-query =
    text solr.WhitespaceTokenizerFactory
additional-schema-config =

mgrbyte commented 9 years ago

Whilst I think being backwards compatible is good, I'm not sure that preserving the current behaviour of checking for the existence of the Solr 4 example directory is a good idea. Wouldn't (for example): self.solr_version = parse_version(self.instanceopts.get('version', '1.3.6')) be preferable?

The current code always breaks on first invocation, because it selects the wrong template due to the non-existence of the solr4 directory at the time the recipe's install() method executes.

davidjb commented 9 years ago

I agree having a version number is the best solution. The code you've suggested above appears to default to version 1.3.6, which would cause existing Solr 4 installs without a version specified to start using the earlier Solr 3 templates. Hence my suggestion of keeping the current behaviour as a fallback, since current installs could be either Solr <= 3.x or Solr 4+.

All that said, maybe it's worth thinking about dropping support for earlier versions outright, since Solr 4 has been out since October 2012. Anyone wanting to still use this recipe with earlier versions can pin accordingly. How many people are actively using Solr <= 3.x with this recipe these days?

mgrbyte commented 9 years ago

@davidjb Good question re Solr <= 3.x. I think a new version line of this recipe that supports 4.x only is a good approach. It will no doubt simplify the recipe code too, not having to deal with multiple versions of Solr templates.

tisto commented 9 years ago

+1 for a Solr 4.x only version. I plan to do the same for collective.solr at some point.

lukasgraf commented 9 years ago

+1 for dropping Solr 3 support from me as well. Just make sure to include a major version bump (6.0.0) in order not to catch people off guard.

saily commented 9 years ago

See: #36 and #39.

The update method does not remove the solr-instance directory. I see no disadvantage to always removing and rebuilding it when running buildout, since the index is stored elsewhere. Do you agree?
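As a rough illustration of that proposal (not the recipe's real class): InstanceRecipe, its constructor arguments, and the schema.xml content below are all hypothetical. Only the install()/update() pairing follows the usual zc.buildout recipe shape, and 1.4/1.5 are the schema versions mentioned earlier in this issue.

```python
import os
import shutil
import tempfile


class InstanceRecipe:
    """Hypothetical, stripped-down recipe; not the real solrinstance class."""

    def __init__(self, location, schema_version):
        self.location = location
        self.schema_version = schema_version

    def install(self):
        # Generate the instance files from scratch.
        conf = os.path.join(self.location, "conf")
        os.makedirs(conf)
        with open(os.path.join(conf, "schema.xml"), "w") as handle:
            handle.write('<schema version="%s"/>' % self.schema_version)
        return [self.location]

    def update(self):
        # Drop whatever a previous run generated, then rebuild,
        # so stale files from a wrong template choice cannot survive.
        shutil.rmtree(self.location, ignore_errors=True)
        return self.install()
```

This is only safe on the assumption stated above, namely that the index data lives outside the instance directory.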

mgrbyte commented 9 years ago

FWIW, I agree

saily commented 9 years ago

Ah, you're talking about the is_solr_4 method, which just tried to detect the collection1 folder to decide whether to use the Solr 4 or Solr 3 template folders. This has been refactored in #40, so this is obsolete.

saily commented 9 years ago

Should have been fixed during refactoring in #40.