geocollections / geokogud

Geocollections of Estonia
http://geocollections.arendus.geokogud.info/
GNU Affero General Public License v3.0

Global search - server side #104

Closed: Kotodevochka closed this issue 7 years ago

Kotodevochka commented 7 years ago

As a user, I want to be able to search globally (across all tables) for information.

AC:

  1. Global search performs search in all essential tables.
  2. Entries are searched by artificial ID, number fields, names and keywords.

Kotodevochka commented 7 years ago

Okay, so I decided to go with Apache Lucene rather than ordinary API requests that search across multiple endpoints for some value. Apache Lucene is much better performance-wise, and at the same time the search results will be much more precise than those retrieved using API requests. The index will be built and updated from database entries by asynchronous tasks, executed every day (?), for the following tables:

  1. specimen
  2. specimen_image
  3. sample
  4. locality
  5. image
  6. taxon
  7. reference
  8. stratigraphy

The index is updated only for entries whose date_changed has been modified since the last update (or for new entries). Most probably we will keep the index on disk rather than in RAM (the development environment has only 800 MB of free RAM).
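
A minimal sketch of that selection rule, in plain Python rather than Lucene. The function name `select_for_reindex`, the `indexed_ids` set, the `last_update` timestamp and the sample rows are all assumptions made for illustration; only `date_changed` and the table name come from this issue.

```python
from datetime import datetime


def select_for_reindex(entries, indexed_ids, last_update):
    """Return the entries that are new or changed since the last index update."""
    selected = []
    for entry in entries:
        is_new = entry["id"] not in indexed_ids            # never indexed before
        is_changed = entry["date_changed"] > last_update   # modified since last run
        if is_new or is_changed:
            selected.append(entry)
    return selected


# Made-up sample rows standing in for database entries.
entries = [
    {"id": 1, "table": "specimen", "date_changed": datetime(2017, 3, 1)},
    {"id": 2, "table": "specimen", "date_changed": datetime(2017, 3, 9)},
    {"id": 3, "table": "specimen", "date_changed": datetime(2017, 2, 20)},
]

pending = select_for_reindex(
    entries,
    indexed_ids={1, 2},                  # ids already present in the index
    last_update=datetime(2017, 3, 5),    # time of the previous index update
)
print([e["id"] for e in pending])  # [2, 3]
```

Here entry 2 is picked up because it changed after the last update, and entry 3 because it is not in the index yet; entry 1 is skipped.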

Kotodevochka commented 7 years ago

https://wiki.apache.org/lucene-java/ImproveSearchingSpeed

Kotodevochka commented 7 years ago

<3 this task

Kotodevochka commented 7 years ago

Now implementing index updating process. Updating logic:

  1. Get entries from the API sorted by date_changed in descending order; update only those entries that do not exist in the index or that have a different date_changed value.
  2. Get entries from the API sorted by id in descending order; add the entries that do not exist in the index.

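The two passes above could be sketched roughly as follows. This is an illustration under assumptions, not the project's code: a plain dict mapping id to date_changed stands in for the Lucene index, the two lists stand in for the API responses, and `sync_index` is a hypothetical name.

```python
def sync_index(index, by_date_changed, by_id):
    """Apply both update passes to `index` (a dict: id -> date_changed).

    Returns the ids touched in pass 1 and the ids added in pass 2.
    """
    updated, added = [], []

    # Pass 1: entries sorted by date_changed (descending).
    # Write an entry if it is missing or its stored date_changed differs.
    for entry in by_date_changed:
        entry_id = entry["id"]
        if entry_id not in index or index[entry_id] != entry["date_changed"]:
            index[entry_id] = entry["date_changed"]
            updated.append(entry_id)

    # Pass 2: entries sorted by id (descending).
    # Add only the entries that are still missing from the index.
    for entry in by_id:
        entry_id = entry["id"]
        if entry_id not in index:
            index[entry_id] = entry["date_changed"]
            added.append(entry_id)

    return updated, added


# Made-up sample data: the index knows entries 1 and 2.
index = {1: "2017-03-01", 2: "2017-03-02"}
by_date_changed = [
    {"id": 2, "date_changed": "2017-03-09"},  # changed since it was indexed
    {"id": 1, "date_changed": "2017-03-01"},  # unchanged
]
by_id = [
    {"id": 4, "date_changed": "2017-02-11"},  # not indexed yet
    {"id": 3, "date_changed": "2017-02-10"},  # not indexed yet
    {"id": 2, "date_changed": "2017-03-09"},  # already handled in pass 1
]

updated, added = sync_index(index, by_date_changed, by_id)
print(updated, added)  # [2] [4, 3]
```

In a real run the writes to the dict would be replaced by index document updates, and each list would be one or more pages fetched from the API.
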
Kotodevochka commented 7 years ago

Documented global search: https://github.com/geocollections/geokogud/wiki/Indexing-for-global-search