collective / collective.solr

Solr search engine integration for Plone
https://pypi.org/project/collective.solr/

Language-specific Solr configurations. #53

Open tisto opened 9 years ago

tisto commented 9 years ago

c.solr or c.r.solrinstance could be shipped with language-specific Solr configurations or examples. English stemming algorithms, for instance, do not work well for other languages (e.g. German).

Those configurations should be tested (on Travis) and included directly in the docs, so that the docs always contain working, up-to-date configurations.
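To make the stemming point concrete, here is a minimal sketch in Python. The function `naive_english_stem` is a made-up toy suffix stripper for illustration only; it is neither the Porter stemmer nor anything Solr ships. It only shows that English-style suffix rules conflate English word forms but fail to group German inflections.

```python
def naive_english_stem(word: str) -> str:
    """Toy English-style suffix stripper (hypothetical, for illustration only)."""
    lowered = word.lower()
    # Try longer suffixes first; keep at least three characters of stem.
    for suffix in ("ing", "ed", "es", "s"):
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


english_forms = ["search", "searches", "searched", "searching"]
german_forms = ["Haus", "Häuser", "Häusern"]  # singular, plural, dative plural

english_stems = {naive_english_stem(w) for w in english_forms}
german_stems = {naive_english_stem(w) for w in german_forms}

# All English forms collapse to one stem, so a query for one form finds the others.
print(english_stems)
# The German forms stay apart, and "Haus" is even truncated incorrectly.
print(german_stems)
```

A German-aware analyzer would additionally have to handle umlauts and suffixes such as "-er" and "-ern", which is what Solr's language-specific filters are for.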

loechel commented 9 years ago

Yes, this is a necessary feature. It needs to be part of the documentation, and we should have a sample buildout to show it. I guess this would probably fit better into collective.recipe.solrinstance. @saily and you, @tisto, told me that you both have a working setup for German. If we could combine those existing resources and write them up in the docs, that would be perfect.

This is interconnected with #49

pilz commented 9 years ago

Proposal: We create a meaningful config (what that means is still to be agreed) based on the Language Analysis documentation at https://cwiki.apache.org/confluence/display/solr/Language+Analysis for a language other than English (German?), to illustrate what the average integrator needs to think about. This config gets a test, and the documentation points to the link above for further tuning if desired.

At the same time, we check whether all of that config can already be done in collective.recipe.solr, and if not, we extend it accordingly.
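As a rough starting point for such a config, the sketch below uses Python to generate a German `fieldType` for a Solr schema. The analyzer chain mirrors the `text_de` example from Solr's stock example schema as I understand it; the stopword path `lang/stopwords_de.txt` and the helper name `build_german_field_type` are assumptions, and nothing here is wired into collective.solr or the buildout recipe.

```python
import xml.etree.ElementTree as ET


def build_german_field_type(name: str = "text_de") -> ET.Element:
    """Build a <fieldType> element with a German analysis chain (a sketch)."""
    field_type = ET.Element(
        "fieldType",
        {"name": name, "class": "solr.TextField", "positionIncrementGap": "100"},
    )
    analyzer = ET.SubElement(field_type, "analyzer")
    ET.SubElement(analyzer, "tokenizer", {"class": "solr.StandardTokenizerFactory"})

    # Filter order matters: lowercase, drop stopwords, normalize, then stem.
    filters = [
        {"class": "solr.LowerCaseFilterFactory"},
        {
            "class": "solr.StopFilterFactory",
            "ignoreCase": "true",
            "words": "lang/stopwords_de.txt",  # assumed path, as in Solr's example configs
            "format": "snowball",
        },
        {"class": "solr.GermanNormalizationFilterFactory"},
        {"class": "solr.GermanLightStemFilterFactory"},
    ]
    for attributes in filters:
        ET.SubElement(analyzer, "filter", attributes)
    return field_type


xml_snippet = ET.tostring(build_german_field_type(), encoding="unicode")
print(xml_snippet)
```

Generating the snippet from code means a test can parse it and check the filter chain, which fits the "this config gets a test" part of the proposal. Whether this chain is a good default still depends on the use case.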

Optional: Phonetic Matching - https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching

saily commented 9 years ago

Solr ships different stemmers and language analysis tools. I have a very deep understanding of its configuration, and we will not be able to ship a generally working solution for multilingual setups. This is one of the most complex and time-consuming tasks in information retrieval, and it always depends on your use case.

I strongly disagree with the approach of creating additional redundancy by including this in our documentation. Just add a bunch of links to the Solr documentation, because we will never be able to keep in sync with their heavy development activity and short release cycles. Or do all of you already know the new features of Solr 5.1?