As a user/developer, I want to get a response to my query in a reasonable time.
Acceptance Criteria:
[ ] temporary solution to speed up the development cycle (likely), or
[x] permanent solution that reduces preprocessing effort (less likely)
Branch/Changes
Infrastructure:
Redis modules require Docker (as the most feasible solution), so the backend and Redis specifications were added as a Docker Compose file
To sync the local development environment, a venv is advised (see README). As the conda version of redis-py is faulty, we unfortunately have to move back to pip / venv once more.
Backend:
Expand the startup process to spin up Redis and ingest data via a pandas DataFrame (CSV to JSON to Redis)
Add a local copy of the scraper CSV output to reduce traffic on GitHub. For prod, change back to the actual URL and ingest data from GitHub, so that it is in sync with the daily scraper output.
Add a default index to enable advanced features like fuzzy / stemmed search across selected fields (check schema)
Add a Redis-specific route to display search results. Note the speed difference between Redis and pandas!
Add a logger for better logging of the startup process. Endpoint calls are logged when the debug flag is set, but the code itself does no printing / logging yet. See extra ticket.
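A minimal sketch of the ingest and index steps described above, to make the startup flow easier to review. It is not the code in this branch: it uses the standard-library csv module as a dependency-free stand-in for the pandas DataFrame, and it assumes the rows are stored as RedisJSON documents and indexed with RediSearch. The key prefix `doc:`, the index name `idx`, the columns `title` / `description` and the `RecordingClient` class are hypothetical and only serve the illustration.

```python
import csv
import io
import json


def rows_to_documents(csv_text, key_prefix="doc:"):
    """Turn CSV text into (key, json_string) pairs, one per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_number, row in enumerate(reader):
        # All values stay strings; no numbers are parsed out of the CSV.
        documents.append((f"{key_prefix}{row_number}", json.dumps(row)))
    return documents


def build_create_index_args(index_name, key_prefix, text_fields):
    """Build the argument list for an FT.CREATE call over JSON documents."""
    args = ["FT.CREATE", index_name, "ON", "JSON",
            "PREFIX", "1", key_prefix, "SCHEMA"]
    for field in text_fields:
        # TEXT fields are what make stemmed / fuzzy matching possible.
        args += [f"$.{field}", "AS", field, "TEXT"]
    return args


def ingest(client, documents):
    """Store each document via JSON.SET.

    `client` is anything with an execute_command method,
    for example a redis.Redis instance.
    """
    for key, payload in documents:
        client.execute_command("JSON.SET", key, "$", payload)


class RecordingClient:
    """Hypothetical stand-in for a Redis client; it only records commands."""

    def __init__(self):
        self.commands = []

    def execute_command(self, *args):
        self.commands.append(args)


if __name__ == "__main__":
    sample_csv = "title,description\nFoo,Bar\n"  # hypothetical columns
    client = RecordingClient()
    ingest(client, rows_to_documents(sample_csv))
    client.execute_command(
        *build_create_index_args("idx", "doc:", ["title", "description"])
    )
    for command in client.commands:
        print(command)
```

Building the commands as plain argument lists keeps the sketch runnable without a Redis server. Against a real instance, the same lists could be passed to `execute_command` on a redis-py client, provided the JSON and search modules are loaded.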
Frontend:
Rename endpoint(s)
Switch to the Redis endpoint and adjust the data model to fit the Redis response
Add interfaces (TS) to match the Redis schemas. Note that all types are string for now, as we don't parse numbers out of the JSON (string)
Learnings / Out of Scope:
Most Python-specific packages have been merged into the main redis package, so a lot of documentation is out of date
Redis documentation is lacking and requires some searching around
RediSearch needs checking/experimentation on features like stemming: how can we enhance the search with our NLP efforts?
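As a starting point for that experimentation, here is a small sketch that builds the query strings and FT.SEARCH arguments needed to compare fuzzy, stemmed and verbatim matching. It relies on the RediSearch query syntax, where a term wrapped in one to three `%` pairs is matched fuzzily at that Levenshtein distance, and on the `VERBATIM` option, which switches off stemming-based expansion. The index name `idx` and the field `title` are hypothetical, and terms are assumed to be single words without special characters.

```python
def fuzzy_query(term, field=None, distance=1):
    """Wrap a single term in % signs to request fuzzy matching."""
    if not 1 <= distance <= 3:
        raise ValueError("distance must be 1, 2 or 3")
    marks = "%" * distance
    token = f"{marks}{term}{marks}"
    # An optional @field: prefix restricts the match to one indexed field.
    return f"@{field}:{token}" if field else token


def search_args(index_name, query, verbatim=False, limit=10):
    """Build the argument list for an FT.SEARCH call."""
    args = ["FT.SEARCH", index_name, query]
    if verbatim:
        # VERBATIM disables stemming, which allows a side-by-side comparison.
        args.append("VERBATIM")
    args += ["LIMIT", "0", str(limit)]
    return args


if __name__ == "__main__":
    print(search_args("idx", fuzzy_query("redis", field="title")))
    print(search_args("idx", "running"))                 # stemmed (default)
    print(search_args("idx", "running", verbatim=True))  # exact term only
```

Running the stemmed and verbatim variants against the same index should show how much the default stemmer already contributes before any of our own NLP preprocessing is added.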