collective / collective.solr

Solr search engine integration for Plone
https://pypi.org/project/collective.solr/

Allow multiple collections #376

Open gforcada opened 5 months ago

gforcada commented 5 months ago

Solr is great, but it has a few downsides:

... and that hurts especially if reindexing the complete website takes a sizeable amount of time (for us, around 24 hours).

💡 One mitigation strategy we have been using is to make the changes on a non-production environment and, as soon as the critical amount of content has been reindexed, move the Solr data from non-production to production and finish the reindexing there.

Another strategy that I read about somewhere (probably in the Solr docs) is to configure a second, parallel collection, do the full reindex there (while the existing collection is still being used), and once the reindexing has caught up, switch them over ✨

Would that be something that could be done within collective.solr ? 🤔
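To make the switch-over step concrete, here is a minimal sketch. It assumes Solr runs in SolrCloud mode, where the Collections API `CREATEALIAS` action re-points an existing alias at another collection. The helper name `alias_switch_url`, the alias `plone`, and the collection `plone_v2` are made up for illustration; none of this is part of collective.solr. The function only builds the request URL and does not send it.

```python
from urllib.parse import urlencode


def alias_switch_url(base_url, alias, target_collection):
    """Build a Collections API CREATEALIAS URL (SolrCloud only).

    Hypothetical helper: issuing this request against an existing alias
    re-points it at target_collection.
    """
    params = urlencode(
        {
            "action": "CREATEALIAS",
            "name": alias,
            "collections": target_collection,
        }
    )
    return f"{base_url.rstrip('/')}/admin/collections?{params}"


# Assumed names: the site queries the alias "plone", and the freshly
# reindexed collection is called "plone_v2".
print(alias_switch_url("http://localhost:8983/solr", "plone", "plone_v2"))
```

With this approach the site would always query the alias rather than a collection name, so the switch needs no change on the Plone side.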

davisagli commented 5 months ago

@gforcada I was thinking about the same thing, but haven't had a chance to work on it. I think a key thing to solve is making sure that the indexing of the new collection has a way to catch up with changes to any documents that are modified during the reindex process.
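One way the catch-up could work, as a rough sketch: while the full reindex runs, record the UID of every object that gets modified or deleted, then replay those changes against the new collection before switching. Everything here (`CatchUpLog`, `record`, `replay`) is hypothetical and not an existing collective.solr API. It also keeps the log in memory only, which a real implementation could not rely on.

```python
class CatchUpLog:
    """Records UIDs changed while a full reindex of the new collection runs.

    Hypothetical sketch: in-memory only, no persistence or locking.
    """

    def __init__(self):
        # uid -> "index" or "delete"; the most recent operation wins
        self._pending = {}

    def record(self, uid, op):
        if op not in ("index", "delete"):
            raise ValueError(f"unknown operation: {op!r}")
        self._pending[uid] = op

    def replay(self, index_fn, delete_fn):
        """Apply recorded changes to the new collection and clear the log.

        Returns the number of changes applied.
        """
        count = 0
        while self._pending:
            uid, op = self._pending.popitem()
            if op == "index":
                index_fn(uid)
            else:
                delete_fn(uid)
            count += 1
        return count


# Usage with stand-in callables instead of real Solr calls
log = CatchUpLog()
log.record("uid-a", "index")
log.record("uid-b", "index")
log.record("uid-a", "delete")  # supersedes the earlier "index" for uid-a

indexed, deleted = [], []
applied = log.replay(indexed.append, deleted.append)
```

Replaying would have to be repeated until the log stays empty, since more changes can arrive while a replay is in progress.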

gforcada commented 1 month ago

😖 sorry, way too many things on my plate as of late 🙃

On second thought, the two-collection solution does not fix the first problem, upgrading to a new version, as you cannot have two different Solr versions on the same server...

So, would allowing multiple Solr instances to be configured be the solution here? 🤔

We are probably approaching it the wrong way; Solr itself must have some tooling for that...