Closed alastair closed 1 year ago
I've been checking how feasible it would be to use pysolr. I managed to reproduce search queries quite easily. The only thing that I did not try, and don't really understand for now is the highlighthing.
Hi, highlighting is used only in forum search, not in sounds search. It is used to highlight words that match the query terms, eg: https://freesound.org/forum/forums-search/?forum=&advanced_search=0&q=test&dt_from=&dt_to=-
I think the steps to follow to adopt pysolr (and later upgrade solr) should be the following:
SearchEngine
. This SearchEngine
should be capable of using different "backends" and will call methods on these. A "backend" could be a subclass of a SearchEngineBackend
class which implements some abstract methods from the parent. These methods should mainly be "search", "add_to_index", "remove_from_index" (but there are other things that also use solr like the tagcloud I think so we should think about this well).SearchEngine
which uses our current custom solr library (Solr45CustomBackend
?). At this point, we can test that results obtained using the new wrapper are exactly the same as before making any code changes.SearchEngine
which uses pysolr, but still with solr 4.5 (Solr45PysolrBackend
). Test that results with Solr45CustomBackend
and Solr45PysolrBackend
are exactly the same.Solr5Backend
). Test that results are exactly the same.Solr8Backend
).These steps will take time, but that is fine. I'd suggest do 1-3 and then merge and deploy. Then 4. should be also relatively easy and I guess 5 will be more complex because we'll need to decide how to address incompatible changes if any.
I wonder if before starting any of this we should try to merge some of the changes in the clustering branch because I think there are significant changes in the search code...
Another radical idea I just had.... re-implement forum search using Postgres instead of Solr so in solr we only have the sound index and don't need to worry about other stuff. I think it should be doable because Postgres supports text search but I have no idea how complicated it should be. If we did that we could get rid of some code we have to synchronise solr forum index with actual forum contents.
Thanks for detailing the steps. It makes sense!
For replacing the solr with postgres for the forum, I don't know right now, since I have not looked much into the forum functionalities. Also, if the highlighting is something that is used for the forum, we will have to re-implement somehow I guess.
Also, if the highlighting is something that is used for the forum, we will have to re-implement somehow I guess.
Yes, I'm pretty sure it is doable and easy with postgres. However if highlighting is a problem we could consider not doing it anymore.
There are things in the search views that are related to how we process the query that is related to Solr, and it is currently defined in different classes implemented in the utils/search/solr.py
file. I am not sure we can really abstract that part, and we will have to keep some things in the view file related to the query object manipulation.
Shall I in any case have these methods in the wrapper instead of the solr.py file? Since things related to the query will be used for both our first "backends" (without and with pysolr). Actually, it is hard to think it as different backend, cause it is actually the same thing, pysolr just gives us some "easier" ways to send requests to Solr.
Maybe it is really a better idea to merge things from the clustering branch before doing these changes, as you already suggested...
We can maybe continue this dicussion somewhere else.
Sometimes the level of abstraction will be difficult. I guess things could be discussed on a case-by-case basis so you might want to mention an example here. But in general I'd try to think of it as that solr-related things are only in the backends. No problem if two solr backends are very similar because these are there only to test. We'll eventually have one single backend in use. If in the future for example we consider moving away from solr, then we'd need to implement a new backend, and that would certianly be very different than solr's one.
About merging clustering code yes I think it should be done. I think the steps would be:
Alright. It just feels that I will be doing a wrapper inside a wrapper for the search engine and backends. Anyway, I will do something basic and then we can discuss on it.
For the clustering, I will work on it now, and will repport to you at the Uni tomorrow what is the state.
I'm closing this issue as it was addressed with recent changes on search engine code and solr verisons.
We currently have a custom solr library for search. In order for future maintainability and support in newer versions of solr we should replace this with pysolr: https://github.com/django-haystack/pysolr
We need to check if there are different ways of doing the queries that we have - e.g. dismax, facets, highlighting, pagination. Perhaps we can have a light wrapper around pysolr to make our common functionality easier.