cbleek closed this 8 years ago
@fedys Can you take over?
I would like to take over this issue. Unfortunately, I have no experience with Solr yet, so I will have to learn the Solr basics before I can start working on it. Of course, I would study it at my own expense; I am keen on learning new things. Do you want to wait until I have familiarized myself with Solr? After that, we can go over the details of the issue.
Finally, I have gone over the basics of Solr. I am not sure I understand the description of this issue well, so I will try to reproduce it myself.
There are actually two tasks: completely refactoring the fetching/hydration of jobs, and exposing Solr facets in the view layer.
I would start with the refactoring part. I went over the current implementation and found that job entities are hydrated entirely from the Solr query result. Do you want me to change this as follows? I get only the entity IDs from the Solr query result and use them to fetch the job entities from MongoDB. This way, Solr would only need to store the job fields we want to search by.
Hi Miroslav,
The schema of a collection (roughly the equivalent of a database structure) defines whether a field is searchable or only used in the search result. Solr gives you everything you need to run a complex search and render the result, so I think we should avoid querying Mongo.
If we fetched the job entities from Mongo just to render the search result, we would lose all the scalability features Solr offers.
Regards,
Carsten
@cbleek I am not sure you understood the optimization I described. Here are the current and the optimized steps used when searching on https://yawik.org/demo/en/jobboard via Solr.
Current steps:
Optimized steps:
The current third step will become very hard to maintain, especially as new features are added to job entities. It also makes implementing Solr search for other entities (Application, ...) very inflexible.
The optimized third step would completely remove the duplicated entity hydration and make implementing Solr search for other entities much simpler. There is only one drawback I know of: an additional Mongo select query with an IN (id1, id2, ...) condition. I think this overhead is minimal compared to hydrating job entities manually.
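A minimal Python sketch of the optimized third step, assuming the Solr documents carry the entity ID in an `id` field and the Mongo documents use `_id`. The in-memory `store` dict stands in for the Mongo collection, and all function names are hypothetical. One detail worth handling: MongoDB's `$in` does not guarantee that results come back in the order of the ID list, so the Solr relevance order has to be restored afterwards, and IDs deleted from Mongo since the last indexing should be skipped.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical stand-in for a Mongo collection: maps _id -> stored job document.
JobStore = Dict[str, Dict[str, Any]]


def find_by_ids(store: JobStore, ids: Iterable[str]) -> List[Dict[str, Any]]:
    """Simulate ONE Mongo query with an `$in` condition.

    Like `$in`, this promises no particular result order and silently
    skips IDs that do not exist.
    """
    wanted = set(ids)
    return [doc for _id, doc in store.items() if _id in wanted]


def hydrate_from_solr_ids(solr_docs: List[Dict[str, Any]],
                          store: JobStore) -> List[Dict[str, Any]]:
    """Take only the IDs from the Solr result, fetch all entities in a
    single query, then restore Solr's relevance order."""
    ids = [doc["id"] for doc in solr_docs]
    by_id = {doc["_id"]: doc for doc in find_by_ids(store, ids)}
    # Solr defines the ranking; drop IDs that were removed from Mongo
    # after the Solr index was last updated.
    return [by_id[i] for i in ids if i in by_id]


store: JobStore = {
    "a": {"_id": "a", "title": "PHP developer"},
    "b": {"_id": "b", "title": "Ops engineer"},
}
solr_docs = [{"id": "b"}, {"id": "x"}, {"id": "a"}]  # "x" no longer exists in Mongo
jobs = hydrate_from_solr_ids(solr_docs, store)
```

Here `jobs` holds the "Ops engineer" entity first because Solr ranked it first, even though the store lists it second.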
Do we lose any features that Solr offers if we use the optimized way?
I created a gist demonstrating my PROXY entity idea
(please ignore the facets interfaces, which originate from my misunderstanding of what facets are)
We should consider splitting the entity interfaces into READ ONLY and WRITE interfaces, which would ease the creation of READ ONLY proxy entities (fewer methods to implement), but that's for another issue.
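To illustrate why the READ ONLY / WRITE split reduces the work for proxy entities, here is a small Python sketch using `typing.Protocol`. The interface and method names (`JobReader`, `JobWriter`, `get_title`, ...) are hypothetical and not taken from the project. The full entity satisfies both protocols, while the read-only proxy only has to provide the getters, because view code depends on the reader interface alone.

```python
from typing import Optional, Protocol


class JobReader(Protocol):
    """Hypothetical read-only part of a job entity interface."""

    def get_title(self) -> str: ...
    def get_company(self) -> Optional[str]: ...


class JobWriter(Protocol):
    """Hypothetical write part; only persisted entities need it."""

    def set_title(self, title: str) -> None: ...
    def set_company(self, company: Optional[str]) -> None: ...


class Job:
    """Full entity: satisfies both JobReader and JobWriter."""

    def __init__(self, title: str, company: Optional[str] = None) -> None:
        self._title = title
        self._company = company

    def get_title(self) -> str:
        return self._title

    def get_company(self) -> Optional[str]:
        return self._company

    def set_title(self, title: str) -> None:
        self._title = title

    def set_company(self, company: Optional[str]) -> None:
        self._company = company


class ReadOnlyJobProxy:
    """Read-only proxy: implements JobReader only, so no setters are needed."""

    def __init__(self, title: str, company: Optional[str]) -> None:
        self._title = title
        self._company = company

    def get_title(self) -> str:
        return self._title

    def get_company(self) -> Optional[str]:
        return self._company


def render_title(job: JobReader) -> str:
    # View code only depends on the read-only interface.
    return job.get_title().upper()


titles = [render_title(Job("php developer")),
          render_title(ReadOnlyJobProxy("ops engineer", None))]
```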
@TiSiE Okay, I understand you want a mixture of both approaches (Solr + Mongo hydration). The demonstration gist is a great idea.
I don't understand the part about facets. Aren't facets related to the whole result? You suggest relating facets to each returned entity. I assume the paginator should implement an extra interface (FacetsProviderInterface), which would allow rendering the facets in a view (next to the job result list).
@fedys Yes, your understanding of facets is correct, and your idea of using the paginator as a FacetsProvider is good.
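The agreed design, where the paginator doubles as the facets provider, might look roughly like this Python sketch. `FacetsProvider` mirrors the `FacetsProviderInterface` name from the discussion, but the `get_facets` method, the `SolrPaginator` class and the facet data shape are assumptions for illustration. For simplicity the paginator slices an in-memory sequence; a real one would fetch each page from Solr. The key point is that facets hang off the whole result (the paginator), not off individual entities.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, List, Sequence, TypeVar

T = TypeVar("T")

# field name -> {facet value -> number of matching documents}
Facets = Dict[str, Dict[str, int]]


class FacetsProvider(ABC):
    """Python analogue of the proposed FacetsProviderInterface
    (the method name is an assumption)."""

    @abstractmethod
    def get_facets(self) -> Facets: ...


class SolrPaginator(FacetsProvider, Generic[T]):
    """Hypothetical paginator that pages over result items and exposes the
    facets of the whole result."""

    def __init__(self, items: Sequence[T], facets: Facets, per_page: int = 10) -> None:
        if per_page < 1:
            raise ValueError("per_page must be positive")
        self._items = items
        self._facets = facets
        self._per_page = per_page

    def page(self, number: int) -> List[T]:
        # Pages are 1-based.
        start = (number - 1) * self._per_page
        return list(self._items[start:start + self._per_page])

    def get_facets(self) -> Facets:
        return self._facets


def render_facets(provider: FacetsProvider) -> List[str]:
    # The view only needs the interface, not the concrete paginator.
    return [
        f"{field}: {value} ({count})"
        for field, counts in provider.get_facets().items()
        for value, count in counts.items()
    ]


paginator = SolrPaginator(["job-1", "job-2", "job-3"],
                          {"location": {"Berlin": 2, "Hamburg": 1}},
                          per_page=2)
```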
I edited my comment above, but I guess edits do not get emailed again.
@TiSiE Okay, I am starting the refactoring. Please send me a list of all job fields that should be hydrated from the Solr result. I will assume that any field you do not list is hydrated from Mongo.
@TiSiE FYI, I did not receive any notification about the edit to your comment. :smile:
It doesn't really matter which fields come from where, as the proxy entity will have both sources. So, for the sake of flexibility and ease of use, check the Solr result first and fall back to the original entity for EVERY field.
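The "Solr first, original entity as fallback for every field" rule could be sketched in Python as below. `JobProxy`, `MongoJob` and `load_job` are hypothetical names, and the sketch assumes Solr field names match the entity's attribute names. The original entity is loaded lazily and at most once, so a proxy whose fields are all answered by Solr never touches Mongo. Python's `__getattr__` is used because it is only called for attributes not found normally; in a PHP code base, `__call` could play a similar role.

```python
from typing import Any, Callable, Dict, List, Optional


class JobProxy:
    """Hypothetical proxy entity: every field is read from the Solr document
    first and falls back to the original (Mongo) entity otherwise."""

    def __init__(self, solr_doc: Dict[str, Any], load_entity: Callable[[], Any]) -> None:
        self._solr_doc = solr_doc
        self._load_entity = load_entity
        self._entity: Optional[Any] = None  # loaded lazily, at most once

    def _original(self) -> Any:
        if self._entity is None:
            self._entity = self._load_entity()
        return self._entity

    def __getattr__(self, name: str) -> Any:
        # Only reached for names that are not regular attributes of the proxy.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._solr_doc:
            return self._solr_doc[name]
        return getattr(self._original(), name)


class MongoJob:
    """Stand-in for the full entity loaded from Mongo."""

    def __init__(self) -> None:
        self.title = "Title from Mongo"
        self.description = "Full description from Mongo"


loads: List[int] = []


def load_job() -> MongoJob:
    loads.append(1)  # records how often the original entity is loaded
    return MongoJob()


job = JobProxy({"title": "Title from Solr"}, load_job)
```

Reading `job.title` is answered by Solr without loading anything; the first access to `job.description` triggers the single fallback load.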
The Solr result contains not only a list of matching documents, but also useful things like facets, highlighting, debug information, and more.
This issue addresses the task of bringing those things into the view.
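As a starting point for bringing these parts into the view, this Python sketch extracts documents, facets, highlighting and debug data from a Solr JSON response. It assumes the default JSON response layout (`response.docs`, `facet_counts.facet_fields`, `highlighting`, and `debug` when debugging is enabled) with the default `json.nl=flat` setting, under which facet values and counts alternate in a flat list. `extract_view_data` and the `sample` response are illustrative only, not real data.

```python
import json
from typing import Any, Dict, List


def pairs_to_counts(flat: List[Any]) -> Dict[str, int]:
    """Convert Solr's flat facet list ["Berlin", 2, "Hamburg", 1]
    into {"Berlin": 2, "Hamburg": 1}."""
    return {str(flat[i]): int(flat[i + 1]) for i in range(0, len(flat), 2)}


def extract_view_data(raw: str) -> Dict[str, Any]:
    """Collect everything the view layer needs from a Solr JSON response."""
    data = json.loads(raw)
    facet_fields = data.get("facet_counts", {}).get("facet_fields", {})
    return {
        "total": data["response"]["numFound"],
        "docs": data["response"]["docs"],
        "facets": {name: pairs_to_counts(flat) for name, flat in facet_fields.items()},
        # highlighting: document id -> field -> list of snippets
        "highlighting": data.get("highlighting", {}),
        # only present when debugging was requested
        "debug": data.get("debug"),
    }


sample = """{
  "response": {"numFound": 3, "start": 0,
               "docs": [{"id": "a"}, {"id": "b"}, {"id": "c"}]},
  "facet_counts": {"facet_queries": {},
                   "facet_fields": {"location": ["Berlin", 2, "Hamburg", 1]}},
  "highlighting": {"a": {"title": ["<em>PHP</em> developer"]}}
}"""

view = extract_view_data(sample)
```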