FAForever / faf-java-api

The FAForever REST api
https://api.faforever.com
MIT License

Add fulltext search support #394

Open Brutus5000 opened 4 years ago

Brutus5000 commented 4 years ago

Originates from https://github.com/FAForever/downlords-faf-client/issues/1681

In general, Elide supports fulltext search using Hibernate Search through a SearchStore. This would require setting up either Solr or Elastic and annotating the required entities.

l3002 commented 7 months ago

Hi @Brutus5000, I would like to work on this issue if it isn't assigned to anyone else. Kindly assign it to me.

l3002 commented 6 months ago

Hey @Brutus5000, I just wanted to let you know that, as this is a huge change, this issue might take 2-3 months, depending on the configuration, development and testing time I have estimated over the past week. I just wanted to run this by you, and I hope it won't be a problem if it takes this long to implement.

bukajsytlos commented 6 months ago

It has waited for 4 years 😅. Don't worry.

Brutus5000 commented 6 months ago

Makes me question if it is worth it though. Can you outline the changes you think need to be done?

l3002 commented 6 months ago

Well, first we'll have to configure the Solr server wherever we deploy it (I suppose we would be deploying this in Docker, right?). Then we'll have to create collections for the separate entities (I'm sure there would be an API to configure this internally too, I still have to look into that). Then we'll have to populate the collections by indexing documents into them. Field analysis would be the hardest part, as we'll have to define what is supposed to be done with the fields indexed in the documents and how they should be processed. I'm still unsure how we will query data through this API, because we won't be picking up data directly from the db; we'll have to query it from our Solr server, where the data would be indexed.

Also, we'll have to configure this API in such a way that every time a new entry is added to the db, it gets indexed in the Solr server too (again, I think there's an API we can use to do so).

Now, this is a very high-level analysis. I still have a lot to understand about how we can use Solr effectively here.
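
To make the terms above concrete (field analysis, indexing on write, querying the index instead of the db), here is a minimal sketch in plain Java. It is a toy in-memory inverted index, not Solr, Lucene or Hibernate Search; the class name `ToySearchIndex`, its methods and the sample values are all hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a toy in-memory inverted index, not Solr or Hibernate Search.
class ToySearchIndex {
    // term -> ids of the rows whose name contains that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // stands in for the database table (id -> name)
    private final Map<Integer, String> rows = new HashMap<>();

    // "Field analysis": lowercase the text and split it on anything
    // that is not a letter or a digit.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Write path: store the new row and index it in the same step,
    // so the index does not fall behind the stored data.
    void save(int id, String name) {
        rows.put(id, name);
        for (String term : analyze(name)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
    }

    // Read path: analyze the query the same way as the indexed text,
    // then keep only the ids that contain every query term.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : analyze(query)) {
            Set<Integer> ids = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        ToySearchIndex index = new ToySearchIndex();
        index.save(1, "Seton's Clutch");
        index.save(2, "Open Palms");
        System.out.println(index.search("SETON clutch")); // prints [1]
    }
}
```

The sketch only handles new entries; updates and deletes would also have to remove stale terms, which is part of the synchronization work a real search library would be expected to cover.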

l3002 commented 6 months ago

@Brutus5000, as I said, a lot of work needs to be done. 😅

Brutus5000 commented 6 months ago

Did you look at the linked documents? There is some sort of support in our main library Elide (see Elide Datastore Search) and the underlying Hibernate Search implementation.

So we shouldn't require a low-level implementation of all the features. Just figure out how to configure it with an external Solr instance (yes, it can run in Docker).

l3002 commented 6 months ago

So, I went through the linked documents for the Elide library and its SearchStore implementation on top of Hibernate Search. I think Elide would work with Hibernate Search alone, but I don't think we can integrate Solr with Elide's SearchStore. Maybe we can use Hibernate Search directly in integration with Solr, but I wasn't able to find anything on that on the internet either. I did find this example of an integration between HS and Solr: https://github.com/avner-levy/hibernate_search_solr_integration but I'm not sure how effective it would be. There isn't any official documentation on integrating Solr with HS.

Also, is there a specific reason why we are looking for an integration between Solr and HS, or Elasticsearch and HS? Both Solr and HS use Lucene internally, and as per the documentation and other information on the internet, both would have almost equal retrieval times on a single-node setup.

Also, I found the PR raised for this issue by @ahsanbagwan, and I believe there were some issues with indexing and with full-text search not working properly for wildcard searches. Regarding indexing, @bukajsytlos mentioned that Lucene was re-indexing the whole database after a container restart and that it took 1 minute to index the data into Lucene. Wouldn't it still be a problem if we tried to implement it and it didn't work?

Maybe we should use only Hibernate Search, as it will take care of the synchronization on its own anyway. Even though we can technically integrate Solr with Hibernate Search, that would still require configuring Solr to work with the local Lucene used by Hibernate Search. With HS we would just have to configure the analyzer to our requirements to get the desired results.

Brutus5000 commented 6 months ago

Oh, I thought Solr is just Apache Lucene as a standalone app and is therefore supported. It doesn't have to be Solr, but it shouldn't be entirely in-memory either.
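
For illustration, a disk-backed Lucene index could be requested through configuration roughly as sketched below. This assumes Hibernate Search 6 with its embedded Lucene backend; the thread does not say which Hibernate Search version Elide's SearchStore uses, and other versions use different property keys. The class name `SearchIndexSettings` and the path `/data/search-index` are hypothetical.

```java
import java.util.Properties;

// Assumption: Hibernate Search 6 property keys for the embedded Lucene backend.
// Other Hibernate Search versions use different keys.
class SearchIndexSettings {

    // Builds settings that ask for an index stored on disk instead of on the heap.
    static Properties filesystemIndex(String indexRoot) {
        Properties props = new Properties();
        // Keep the Lucene index in a local directory rather than in memory.
        props.setProperty("hibernate.search.backend.directory.type", "local-filesystem");
        // Directory that holds the index files; in Docker this would need to be
        // on a volume that survives container restarts (hypothetical path below).
        props.setProperty("hibernate.search.backend.directory.root", indexRoot);
        return props;
    }

    public static void main(String[] args) {
        Properties props = filesystemIndex("/data/search-index");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

How these properties would be passed on (for example through the JPA or Spring configuration of the API) is not covered here and would depend on the actual setup.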

l3002 commented 6 months ago

Yeah, no issues, I'll analyze the architectural requirements before implementing anything. I just need some time, and sorry for the delay in responding. I was a bit busy for the past few days.