Closed SoerenHenning closed 1 year ago
Looking at this, it seems like this error would occur if the database is signaling ready, but then the recommender does not receive any data during the training phase.
I've added a retry mechanism here and added the NullPointerException to the exception block (it seems to think that there will never be a ClientException in that block). Could you check out the version from development and see whether this fixes the issue?
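For illustration, here is a minimal sketch of what such a retry with a widened catch block could look like. It is not the actual TrainingSynchronizer code: the class RetrySketch, the method retry, and its parameters are hypothetical, and IllegalStateException only stands in for whatever client-side exception the real call may throw.

```java
import java.util.function.Supplier;

/** Hypothetical sketch only; not the actual TeaStore implementation. */
class RetrySketch {

    /**
     * Calls the action up to maxAttempts times and returns the first result.
     * NullPointerException and IllegalStateException are treated as transient
     * failures here (an assumption made for this sketch).
     */
    static <T> T retry(Supplier<T> action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (NullPointerException | IllegalStateException e) {
                last = e; // remember the failure and try again
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);
                    } catch (InterruptedException ie) {
                        // Restore the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        break;
                    }
                }
            }
        }
        // All attempts failed (or we were interrupted): rethrow the last failure.
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated data fetch that fails twice before succeeding.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new NullPointerException("no data received yet");
            }
            return "trained";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "trained after 3 attempts"
    }
}
```

Bounding the number of attempts means a permanently unreachable persistence still surfaces as an error rather than as an endless loop.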
Thanks! Is there a Docker image to test your fix?
It seems the GitHub Actions run for the commit was stuck in the queue. I've retriggered it and it seems to be working again, so the new images should be available via the :development tag.
Let me know if this already resolves your issue. If it's about temporary connection issues, this should resolve it. If the recommender permanently cannot reach the persistence despite it being labeled as online, then we will need to take an in-depth look at what is going on.
After running a bunch of experiments, I did not see that issue again so I think it's solved.
Great to hear! Feel free to reopen the issue in case it does pop back up :)
Occasionally, I'm getting the following error in the Recommender service:
I cannot really say in which scenarios the exception is thrown, but when the problem occurs, the /train/isready endpoint always returns false. It can then usually be solved by 1-2 restarts. Maybe it is sufficient to add the ClientException to the catch block: https://github.com/DescartesResearch/TeaStore/blob/c1554c7fbfa9eb55c2bfc952d839f1cabe6e3c66/services/tools.descartes.teastore.recommender/src/main/java/tools/descartes/teastore/recommender/servlet/TrainingSynchronizer.java#L183-L192
My setup is basically the same as in #230 and my original problem, described there, might have also been partially related to this issue.