FHNW-IVGI / Geoharvester

NDGI Project Geoharvester

integration of NLTK / TFIDF #11

fionatiefenbacher closed this issue 1 year ago

fionatiefenbacher commented 1 year ago

Ranking the search output requires natural language processing. Python libraries such as NLTK and algorithms such as TF-IDF can, for example, build a relevance matrix from text.
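To make the idea of a relevance matrix concrete, here is a minimal sketch of TF-IDF ranking using only the Python standard library. It is not the project's implementation: the function names (`tokenize`, `build_tfidf`, `rank`), the sample titles, and the smoothed IDF formula are assumptions chosen for illustration, and a plain regex tokenizer stands in for what NLTK would provide (stop-word removal, stemming, language-aware tokenization).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (simple stand-in for an NLTK tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_tfidf(docs):
    """Return one {term: tf-idf weight} dict per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed idf (an assumption; other variants exist).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        matrix.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return matrix, idf


def rank(query, docs):
    """Score each document by summing the tf-idf weights of the query terms."""
    matrix, _ = build_tfidf(docs)
    terms = tokenize(query)
    scores = [sum(row.get(t, 0.0) for t in terms) for row in matrix]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


if __name__ == "__main__":
    # Hypothetical dataset titles, not taken from the Geoharvester data.
    titles = [
        "Flood hazard map canton Bern",
        "Cadastral survey of public roads",
        "Hazard zones avalanche and flood flood",
    ]
    for index, score in rank("flood hazard", titles):
        print(f"{score:.3f}  {titles[index]}")
```

Each row of `matrix` is one sparse row of the relevance matrix. Summing the weights of the query terms is the simplest possible scoring; cosine similarity between query and document vectors would be a common alternative.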

FStriewski commented 1 year ago

For the Redis evaluation (https://github.com/FHNW-IVGI/Geoharvester/issues/9) I'll follow this guide: https://redis.com/blog/redismart-real-time-json-product-catalog-service/

Note how they build the fuzzy search / suggestion feature around predefined categories. While we probably also want to index on the TF-IDF score (among other obvious fields such as canton), we might want to think along these lines as well:

eliaferrari commented 1 year ago

@FStriewski I suggest merging this branch into main for the preprocessing part. All the implemented functions are contained in a separate folder under utils.py and should not conflict with the main branch.

eliaferrari commented 1 year ago

Branch merged into main so that its functions can be used.