cross-solution / YAWIK

YAWIK is a web application. It can be used as an applicant tracking system (ATS) or as a job board.
https://yawik.org
MIT License

display solr facets in the search result #256

Closed: cbleek closed this issue 8 years ago

cbleek commented 8 years ago

The Solr result contains not only a list of matching documents but also useful extras such as facets, highlighting, debug information and more.

This issue addresses the task of bringing those things into the view.

Currently the ResultConverter creates new Job objects using information and methods from the Query-Filter. As you found out, this is a bit complicated when dealing with OrganizationNames and OrganizationLogos, and it will get more complicated when more information must be processed (like facets).

The code to convert results is spread across two different classes, which might become a pain in the ass to maintain.

If we had a kind of proxy entity (let's call it SolrJob) which implements JobInterface and holds the original job entity as a reference (e.g. for getting image URLs) AND additional information (like facets), we would have all "converting" code in one place. Information from the original job entity could be proxied to the entity reference, and the additional information could be exposed through an API using a dedicated interface.
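A minimal sketch of the proxy idea, written in Python for brevity although YAWIK itself is PHP. `JobInterface` is reduced to two getters, and `HighlightsProviderInterface`, `get_highlights` and the sample data are hypothetical names chosen for illustration. Highlights are used as the extra data because Solr returns them per document, which makes them a natural fit for a per-entity proxy.

```python
from abc import ABC, abstractmethod


class JobInterface(ABC):
    """Hypothetical stand-in for YAWIK's JobInterface (only two getters shown)."""

    @abstractmethod
    def get_title(self) -> str: ...

    @abstractmethod
    def get_logo_url(self) -> str: ...


class HighlightsProviderInterface(ABC):
    """Hypothetical dedicated interface for Solr-only extras."""

    @abstractmethod
    def get_highlights(self) -> dict: ...


class Job(JobInterface):
    """Plain entity, as it would be loaded from the database."""

    def __init__(self, title: str, logo_url: str) -> None:
        self._title = title
        self._logo_url = logo_url

    def get_title(self) -> str:
        return self._title

    def get_logo_url(self) -> str:
        return self._logo_url


class SolrJob(JobInterface, HighlightsProviderInterface):
    """Proxy: delegates JobInterface calls to the wrapped entity, adds Solr data."""

    def __init__(self, job: JobInterface, highlights: dict) -> None:
        self._job = job                # reference to the original entity
        self._highlights = highlights  # extra data taken from the Solr result

    def get_title(self) -> str:
        return self._job.get_title()

    def get_logo_url(self) -> str:
        return self._job.get_logo_url()

    def get_highlights(self) -> dict:
        return self._highlights


job = SolrJob(Job("PHP Developer", "/logos/acme.png"),
              {"title": ["<em>PHP</em> Developer"]})
print(job.get_logo_url())              # /logos/acme.png
print(job.get_highlights()["title"][0])  # <em>PHP</em> Developer
```

Because the view only sees `JobInterface`, it does not need to know whether it received a plain entity or a proxy.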

Using a factory to create such proxy entities also eliminates the need for "new", which is hard to test.
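A rough sketch of such a factory, assuming the original entity is looked up through an injected callable (a stand-in for a repository method; `SolrJobFactory`, `find_job` and `create` are hypothetical names). In Python, calling the constructor plays the role of PHP's "new": only the factory does it, so code that consumes proxies can be tested with a fake lookup or a mocked factory.

```python
from typing import Callable, Dict, Optional


class Job:
    def __init__(self, job_id: str, title: str) -> None:
        self.id = job_id
        self.title = title


class SolrJob:
    def __init__(self, job: Job, solr_doc: dict) -> None:
        self.job = job
        self.solr_doc = solr_doc


class SolrJobFactory:
    """Hypothetical factory: the only place that calls the SolrJob constructor."""

    def __init__(self, find_job: Callable[[str], Optional[Job]]) -> None:
        # The lookup is injected, so tests can pass a fake instead of a database.
        self._find_job = find_job

    def create(self, solr_doc: dict) -> SolrJob:
        job = self._find_job(solr_doc["id"])
        if job is None:
            raise LookupError(f"job {solr_doc['id']} not found")
        return SolrJob(job, solr_doc)


# In a test, the repository is replaced by a plain dict lookup.
fake_repo: Dict[str, Job] = {"42": Job("42", "PHP Developer")}
factory = SolrJobFactory(fake_repo.get)
proxy = factory.create({"id": "42", "title": "PHP Developer"})
print(proxy.job.title)  # PHP Developer
```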

It would mean an additional query for each result (fetching the original entity from the database), but the benefits (cleaner, easily testable and encapsulated code) outweigh this. Maybe lazy loading will help if performance becomes an issue.
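One way lazy loading could look, sketched in Python under the assumption that the entity can be loaded by ID through an injected `loader` callable (a hypothetical name; the entity is modeled as a dict to keep it short). Fields that Solr already returned are served without touching the database; the original entity is fetched once, on first access, and then cached.

```python
from typing import Callable, Optional


class LazySolrJob:
    """Proxy that loads the original entity only when a field needs it."""

    def __init__(self, job_id: str, solr_doc: dict,
                 loader: Callable[[str], dict]) -> None:
        self._job_id = job_id
        self._solr_doc = solr_doc
        self._loader = loader
        self._job: Optional[dict] = None  # not fetched yet

    def _original(self) -> dict:
        if self._job is None:
            self._job = self._loader(self._job_id)  # one query, on first use only
        return self._job

    def get_title(self) -> str:
        return self._solr_doc["title"]  # served from Solr, no database access

    def get_logo_url(self) -> str:
        return self._original()["logo_url"]  # triggers the lazy load


calls = []


def load_from_db(job_id: str) -> dict:
    calls.append(job_id)  # record how often the "database" is hit
    return {"logo_url": f"/logos/{job_id}.png"}


job = LazySolrJob("42", {"title": "PHP Developer"}, load_from_db)
job.get_title()
print(len(calls))  # 0
job.get_logo_url()
job.get_logo_url()
print(len(calls))  # 1
```

With this shape, a result page that only renders Solr fields never triggers the extra queries at all.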

TiSiE commented 8 years ago

@fedys Can you take over?

fedys commented 8 years ago

I would like to take over this issue. Unfortunately, I have no experience with Solr yet, so I will have to learn the Solr basics before I can start working on it. Of course, I would study it at my own expense; I am keen on learning new things. Do you want to wait until I have familiarized myself with Solr? After that we can go over the details of the issue.

fedys commented 8 years ago

Finally, I went over the basics of Solr. I am not sure I understand the description of this issue well, so let me restate it in my own words.

There are actually two tasks: completely refactoring the fetching/hydrating of jobs, and exposing Solr facets in the view layer.

I would start with the refactoring part. I went over the current implementation and found out that job entities are hydrated entirely from the Solr query result. Do you want me to change this to the following approach? I get only entity IDs from the Solr query result and use them to fetch job entities from MongoDB. This way Solr would only need to store the job fields we want to search by.

cbleek commented 8 years ago

Hi Miroslav,

The schema of a collection (roughly the structure of a database) defines whether a field is searchable or only used in the search result. Solr gives you everything you need to do a complex search and to render the result. I think we should avoid querying Mongo.

If we fetch the job entities from Mongo only to render the search result, we'll lose all the scalability features which Solr offers.

Regards,

Carsten

fedys commented 8 years ago

@cbleek I am not sure you understood the optimization I wrote about. Here are the current steps and the optimized steps used for searching on https://yawik.org/demo/en/jobboard via Solr.

Current steps:

  1. send query to Solr
  2. retrieve Solr result
  3. manually hydrate job Entities using the Solr result (this step is a PITA; I will explain why later)
  4. display result in view using job Entities plus facets etc.

Optimized steps:

  1. send query to Solr
  2. retrieve Solr result
  3. return an array of job Doctrine proxies (via DocumentManager::getReference()) using the ten IDs from the Solr result, or use a single Mongo select query with an IN (id1, id2, ... id10) condition.
  4. display result in view using job Entities plus facets etc.
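The "single query with an IN condition" variant of step 3 can be sketched as below. `find_by_ids` is a hypothetical stand-in for one query such as `{"_id": {"$in": ids}}`; since MongoDB does not guarantee that `$in` matches come back in the order of the list, the sketch re-sorts the entities into Solr's relevance order and skips IDs that no longer exist in the database.

```python
from typing import Callable, Dict, Iterable, List


def fetch_jobs_in_solr_order(
    solr_ids: List[str],
    find_by_ids: Callable[[List[str]], Iterable[dict]],
) -> List[dict]:
    """Load all jobs with one query, then restore the order Solr returned."""
    by_id: Dict[str, dict] = {doc["_id"]: doc for doc in find_by_ids(solr_ids)}
    # IDs missing from the database (e.g. deleted since indexing) are skipped.
    return [by_id[i] for i in solr_ids if i in by_id]


# Fake "collection": returns matches in storage order, not in request order.
storage = [{"_id": "a", "title": "A"}, {"_id": "b", "title": "B"},
           {"_id": "c", "title": "C"}]


def fake_find(ids: List[str]) -> List[dict]:
    wanted = set(ids)
    return [doc for doc in storage if doc["_id"] in wanted]


jobs = fetch_jobs_in_solr_order(["c", "x", "a"], fake_find)
print([j["_id"] for j in jobs])  # ['c', 'a']
```

Re-sorting in memory keeps the database round trips at one per page regardless of page size.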

The current third step will become very hard to maintain, especially when new features of job entities are implemented. It also makes implementing Solr search for other entities (Application, ...) very inflexible.

The optimized third step would completely remove the duplicated entity hydration and make implementing Solr search for other entities much simpler. There is only one drawback I know of: an additional Mongo select query with an IN (id1, id2, ...) condition. I think this overhead is minimal compared to manually hydrating job entities.

Do we lose any features which Solr offers if we use the optimized way?

TiSiE commented 8 years ago

  1. send query to Solr
  2. retrieve Solr result, which can contain optimized data such as job titles, etc.
  3. Instantiate PROXY entities, each of which gets the Solr result for one document (not the whole result) and the original entity injected.
  4. Display result in view using PROXY entities (which implement JobInterface) plus facets etc.
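Steps 2 and 3 can be sketched roughly as follows, assuming Solr's standard JSON response layout (`response.docs`) and a hypothetical `find_by_ids` lookup that returns the original entities keyed by ID. The point is that each proxy receives exactly one Solr document plus its own entity, while result-level data stays out of the proxies.

```python
from typing import Callable, Dict, List


class SolrJob:
    """Proxy holding ONE Solr document plus the matching original entity."""

    def __init__(self, solr_doc: dict, original: dict) -> None:
        self.solr_doc = solr_doc
        self.original = original


def build_proxies(solr_response: dict,
                  find_by_ids: Callable[[List[str]], Dict[str, dict]]) -> List[SolrJob]:
    docs = solr_response["response"]["docs"]          # matching documents
    originals = find_by_ids([d["id"] for d in docs])  # original entities by ID
    # One proxy per document; documents without an entity are skipped.
    return [SolrJob(d, originals[d["id"]]) for d in docs if d["id"] in originals]


response = {
    "response": {"numFound": 2, "start": 0, "docs": [
        {"id": "1", "title": "PHP Developer"},
        {"id": "2", "title": "Java Developer"},
    ]},
}
db = {"1": {"_id": "1", "logo": "/l/1.png"}, "2": {"_id": "2", "logo": "/l/2.png"}}
proxies = build_proxies(response, lambda ids: {i: db[i] for i in ids if i in db})
print([(p.solr_doc["title"], p.original["logo"]) for p in proxies])
# [('PHP Developer', '/l/1.png'), ('Java Developer', '/l/2.png')]
```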

I created a gist demonstrating my PROXY entity idea

(please ignore the facets interfaces, which originate from my misunderstanding of what facets are)

We should consider splitting the entity interfaces into READ ONLY and WRITE interfaces, which would ease the creation of READ ONLY proxy entities (fewer methods to implement), but that's for another issue.
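A small sketch of what such a split might look like, in Python and with hypothetical names (`ReadOnlyJobInterface`, `set_title`). The full interface extends the read-only one, so view code typed against the read side accepts both real entities and read-only proxies, and the proxy has no setters to stub out.

```python
from abc import ABC, abstractmethod


class ReadOnlyJobInterface(ABC):
    """Hypothetical read side: all a search-result proxy has to implement."""

    @abstractmethod
    def get_title(self) -> str: ...


class JobInterface(ReadOnlyJobInterface):
    """Hypothetical full interface: the read side plus setters."""

    @abstractmethod
    def set_title(self, title: str) -> None: ...


class Job(JobInterface):
    def __init__(self, title: str) -> None:
        self._title = title

    def get_title(self) -> str:
        return self._title

    def set_title(self, title: str) -> None:
        self._title = title


class SolrJob(ReadOnlyJobInterface):
    """Read-only proxy: implements the getters only."""

    def __init__(self, solr_doc: dict) -> None:
        self._doc = solr_doc

    def get_title(self) -> str:
        return self._doc["title"]


def render_title(job: ReadOnlyJobInterface) -> str:
    # View code depends only on the read side, so both classes are accepted.
    return job.get_title().upper()


print(render_title(Job("php developer")), render_title(SolrJob({"title": "java developer"})))
# PHP DEVELOPER JAVA DEVELOPER
```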

fedys commented 8 years ago

@TiSiE Okay, I understand you want a mixture of both approaches (Solr + Mongo hydration). Great idea with the demonstration gist.

I don't understand the part about facets. Aren't facets related to the whole result? You suggest relating facets to each returned entity. I assume a paginator should implement an extra interface (FacetsProviderInterface). This would allow rendering facets in a view (next to the job result list).
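A sketch of a paginator that also acts as a facets provider, assuming Solr's default JSON layout where each entry in `facet_counts.facet_fields` is a flat list of alternating values and counts. `FacetsProviderInterface` is the name proposed above; `SolrPaginator`, `get_items`, `get_total` and `get_facets` are hypothetical. Facets live on the paginator, i.e. on the whole result, not on the individual job entities.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class FacetsProviderInterface(ABC):
    @abstractmethod
    def get_facets(self) -> Dict[str, Dict[str, int]]: ...


class SolrPaginator(FacetsProviderInterface):
    """One page of items plus facets that belong to the whole result."""

    def __init__(self, solr_response: dict, page_size: int) -> None:
        self._response = solr_response
        self.page_size = page_size

    def get_items(self) -> List[dict]:
        return self._response["response"]["docs"]

    def get_total(self) -> int:
        return self._response["response"]["numFound"]

    def get_facets(self) -> Dict[str, Dict[str, int]]:
        # Flat layout: ["value1", count1, "value2", count2, ...]
        fields = self._response.get("facet_counts", {}).get("facet_fields", {})
        return {name: dict(zip(flat[0::2], flat[1::2]))
                for name, flat in fields.items()}


response = {
    "response": {"numFound": 37, "start": 0, "docs": [{"id": "1"}, {"id": "2"}]},
    "facet_counts": {"facet_fields": {"city": ["Berlin", 20, "Prague", 17]}},
}
paginator = SolrPaginator(response, page_size=10)
print(paginator.get_total(), paginator.get_facets())
# 37 {'city': {'Berlin': 20, 'Prague': 17}}
```

A view helper can then check whether the paginator implements the facets interface and render the facet list next to the job result list.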

TiSiE commented 8 years ago

@fedys Yes, your understanding of facets is correct, and your idea to use the paginator as a FacetsProvider is good.

I edited my comment above, but I guess edits do not get emailed again.

fedys commented 8 years ago

@TiSiE Okay, I am starting the refactoring. Please send me a list of all job fields which should be hydrated from the Solr result. I will assume that fields not listed by you are hydrated from Mongo.

fedys commented 8 years ago

@TiSiE FYI, I have not received any notification about the editing of your comment. :smile:

TiSiE commented 8 years ago

It doesn't really matter which fields come from where, as the proxy entity will have both sources. So, for the sake of flexibility and ease of use, check the Solr result first and fall back to the original entity for EVERY field.
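The "Solr first, then the original entity" rule could be sketched as one generic lookup, shown here in Python with a hypothetical `get(field)` method; in the PHP proxy the same check would sit in each getter. A sentinel distinguishes "field missing from the Solr document" from legitimately falsy stored values such as `0` or an empty string.

```python
from typing import Any


class SolrJob:
    """Sketch of the 'Solr first, entity second' rule for every field."""

    _MISSING = object()  # marks "not in the Solr document"

    def __init__(self, solr_doc: dict, original: Any) -> None:
        self._doc = solr_doc
        self._original = original

    def get(self, field: str) -> Any:
        value = self._doc.get(field, self._MISSING)
        if value is not self._MISSING:
            return value                       # Solr stored this field
        return getattr(self._original, field)  # fall back to the entity


class Job:
    def __init__(self) -> None:
        self.title = "PHP Developer (from Mongo)"
        self.logo_url = "/logos/acme.png"


job = SolrJob({"title": "PHP Developer"}, Job())
print(job.get("title"))     # PHP Developer
print(job.get("logo_url"))  # /logos/acme.png
```

Since the choice is made per field at call time, adding or removing stored fields in the Solr schema does not require changes to the proxy.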