elixir-cloud-aai / tesk-api

GA4GH TES API Service that translates tasks into Kubernetes Batch API calls
Apache License 2.0
7 stars 18 forks source link

Cancelling task returns status code 500 #40

Closed sgalpha01 closed 2 years ago

sgalpha01 commented 2 years ago

For testing purposes, I was running this task: https://github.com/elixir-cloud-aai/TESK/blob/master/examples/resources/cpu.json

The task is running fine, but the issue comes while cancelling the task. On cancelling, the API is returning status code 500. It will be more clear by looking at the response:

❯ curl -X POST "http://192.168.49.2:31567/v1/tasks/task-265be324:cancel" -H "accept: application/json"
{
  "timestamp" : "2022-08-12T10:33:13.987+0000",
  "status" : 500,
  "error" : "Internal Server Error",
  "message" : "Bad Request",
  "path" : "/v1/tasks/task-265be324:cancel"
}

On inspecting, I found that #27 is responsible for the breaking change.

lvarin commented 2 years ago

The changes in that PR are to upgrade the library version from the ancient v1.0 to v11.0. It was motivated by a security threat. Reverting the PR is not an option, due to the security vulnerability.

Can you share the details on how you got to the conclusion that the fault was that PR?

lvarin commented 2 years ago

I am not doubting that the PR is the cause, but I would appreciate some details.

sgalpha01 commented 2 years ago

Sorry for not providing details earlier. I am using a local TESK setup using Minikube. The deployment was done using the helm chart as provided in the TESK repo. It is using eu.gcr.io/tes-wes/tesk-api:v1.1 image, which I believe is based on this commit: 27a1323. The cancel operation was working fine with the default image. But when I pulled the latest code and build an image using that, the cancel operation threw the error which I mentioned in this issue. To pinpoint which commit caused the issue, I tried to switch between the commits (I used Binary Search approach) . Finally after experimenting, I found that this commit: 74a5bc0 caused the issue. I've not checked the exact issue, but I believe it can be done by looking at the changes made in that commit.

lvarin commented 2 years ago

I run a test in Rahti (https://csc-tesk-1.rahtiapp.fi) with a compiled version from the master branch, and indeed there is a 500 error:

uk.ac.ebi.tsc.tesk.k8s.exception.KubernetesException: Internal Server Error
--
2022-08-16 
09:03:03.101 TRACE 1 --- [nio-8080-exec-8] 
u.a.e.t.t.k.s.KubernetesClientWrapper    : ApiException ResponseBody: 
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Invalid
 JSON Patch","code":500}

and it suggest the error comes from the commit mentioned above. In file src/main/java/uk/ac/ebi/tsc/tesk/k8s/constant/Constants.java, arround line 145.

sgalpha01 commented 2 years ago

Yes, you are right, #41 resolves the issue. Thank you.