aguschin / art-guide

Use cache to skip generating description and audio #90

Open · aguschin opened this issue 10 months ago

To speed up answers in the Telegram bot, we could generate descriptions and audio in advance (in some CI/CD job or Airflow, as an option). But this has two drawbacks: (1) you need to implement this, which takes time, and (2) once you want to re-generate all the descriptions, it will take a while (days on our server), which makes the feedback loop too long - if you want to see how a new generation approach works, you'll either need to wait or implement partial re-generation for a subset of the data, which again takes additional effort.

The next option to speed up answers is to use a cache. We already have a cache/ dir where generated audio files are stored, so now we need to make use of it. One option is to split the request to the REST API server with the ML model in https://github.com/aguschin/art-guide/blob/main/bot.py: search for the result, check if the audio already exists in the cache, and generate it only if it doesn't. This gives us more flexibility to update the generation method and get feedback faster.
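
A minimal sketch of what this split could look like in bot.py (function names like `search_artwork` and `generate_audio` are placeholders for illustration, not the actual code in the repo):

```python
import os

CACHE_DIR = "cache"


def search_artwork(image_bytes: bytes) -> str:
    """Placeholder: call the REST API server with the ML model to identify the artwork."""
    raise NotImplementedError


def generate_audio(artwork_id: str) -> bytes:
    """Placeholder: generate the description and synthesize audio for it."""
    raise NotImplementedError


def audio_cache_path(artwork_id: str) -> str:
    # Map an artwork identifier to a file inside the cache/ dir.
    return os.path.join(CACHE_DIR, f"{artwork_id}.wav")


def get_audio(image_bytes: bytes) -> str:
    # 1. Identify the artwork via the ML model on the REST API server.
    artwork_id = search_artwork(image_bytes)

    # 2. Reuse cached audio if it already exists.
    path = audio_cache_path(artwork_id)
    if os.path.exists(path):
        return path

    # 3. Otherwise generate it now and store it in the cache for next time.
    os.makedirs(CACHE_DIR, exist_ok=True)
    audio = generate_audio(artwork_id)
    with open(path, "wb") as f:
        f.write(audio)
    return path
```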

One subproblem here: we need to version the generated audio, so that after updating the generation mechanism we know which audio to re-generate. For this, we can add a VERSION variable to the description-generation part and save audio in a cache/$VERSION/ folder, e.g. cache/1/Da_Vinci__Mona_Lisa.wav.
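
For example (`GENERATION_VERSION` here is an assumed constant name, just to illustrate the layout):

```python
import os

# Bump this whenever the description/audio generation mechanism changes,
# so old cached files are ignored and audio gets re-generated.
GENERATION_VERSION = "1"
CACHE_DIR = "cache"


def versioned_audio_path(artwork_name: str) -> str:
    # e.g. cache/1/Da_Vinci__Mona_Lisa.wav
    return os.path.join(CACHE_DIR, GENERATION_VERSION, f"{artwork_name}.wav")


def is_cached(artwork_name: str) -> bool:
    # Audio produced under an older version lives in a different folder,
    # so this lookup misses and triggers re-generation under the new version.
    return os.path.exists(versioned_audio_path(artwork_name))
```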