Closed sgalpha01 closed 2 years ago
The changes in that PR are to upgrade the library version from the ancient v1.0 to v11.0. It was motivated by a security threat. Reverting the PR is not an option, due to the security vulnerability.
Can you share the details on how you got to the conclusion that the fault was that PR?
I am not doubting that the PR is the cause, but I would appreciate some details.
Sorry for not providing details earlier. I am using a local TESK setup using Minikube. The deployment was done using the helm chart as provided in the TESK repo. It is using eu.gcr.io/tes-wes/tesk-api:v1.1 image, which I believe is based on this commit: 27a1323. The cancel operation was working fine with the default image. But when I pulled the latest code and build an image using that, the cancel operation threw the error which I mentioned in this issue. To pinpoint which commit caused the issue, I tried to switch between the commits (I used Binary Search approach) . Finally after experimenting, I found that this commit: 74a5bc0 caused the issue. I've not checked the exact issue, but I believe it can be done by looking at the changes made in that commit.
I run a test in Rahti (https://csc-tesk-1.rahtiapp.fi) with a compiled version from the master branch, and indeed there is a 500 error:
uk.ac.ebi.tsc.tesk.k8s.exception.KubernetesException: Internal Server Error
--
2022-08-16
09:03:03.101 TRACE 1 --- [nio-8080-exec-8]
u.a.e.t.t.k.s.KubernetesClientWrapper : ApiException ResponseBody:
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Invalid
JSON Patch","code":500}
and it suggest the error comes from the commit mentioned above. In file src/main/java/uk/ac/ebi/tsc/tesk/k8s/constant/Constants.java
, arround line 145.
Yes, you are right, #41 resolves the issue. Thank you.
For testing purposes, I was running this task: https://github.com/elixir-cloud-aai/TESK/blob/master/examples/resources/cpu.json
The task is running fine, but the issue comes while cancelling the task. On cancelling, the API is returning status code 500. It will be more clear by looking at the response:
On inspecting, I found that #27 is responsible for the breaking change.