elixir-cloud-aai / tesk-api

GA4GH TES API Service that translates tasks into Kubernetes Batch API calls
Apache License 2.0

500 Server Error while creating task from cwl-tes #42

Closed: sgalpha01 closed this issue 1 year ago

sgalpha01 commented 2 years ago

Python Version: 3.8.13

I was using cwl-tes to submit a task to TESK. The task I used can be found here. The TESK API was deployed locally in Minikube and built from the current source code in the GitHub repository.

The error thrown on the cwl-tes command line is:

ERROR Workflow error:
500 Server Error:  for url: http://192.168.49.2:31567/v1/tasks

The URL refers to my local TESK deployment.

I'm sharing the exact error from the API side:

ERROR 1 --- [nio-8080-exec-1] u.a.e.t.tesk.tes.service.TesServiceImpl  : ERROR: In createTask
java.lang.IllegalArgumentException: No suffix for exponent-18

The full logs can be found here.

lvarin commented 2 years ago

The error seems to be here:

https://github.com/elixir-cloud-aai/tesk-api/blob/master/src/main/java/uk/ac/ebi/tsc/tesk/tes/service/TesServiceImpl.java#L51

lvarin commented 2 years ago

I will try to reproduce the error. Can you paste here the exact command line you used and the YAML files you are using in the call? That would save me a lot of time. :)

sgalpha01 commented 2 years ago

Yeah sure :)

cwl-tes --debug --tes http://192.168.49.2:31567  --remote-storage-url ftp://ftp/home/tesk ../cwl-example-workflows/hashsplitter-workflow.cwl ../cwl-example-workflows/hashsplitter-test.yml

Check this repo for YAML files.

lvarin commented 2 years ago

I managed to reproduce the problem. I am still looking for the cause; for the moment I only know that the problem occurs while calling ".gson.toJson".

lvarin commented 2 years ago

The text of the exception comes from here:

https://github.com/kubernetes-client/java/blob/master/kubernetes/src/main/java/io/kubernetes/client/custom/SuffixFormatter.java#L111
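
For reference, the failure can be reproduced in isolation with just the Kubernetes Java client. The snippet below is only a sketch: it assumes the client's Quantity/QuantityFormatter API, and the memory string is a made-up stand-in with a long decimal tail rather than the exact value TESK produced (that one is in the logs above).

import io.kubernetes.client.custom.Quantity;
import io.kubernetes.client.custom.QuantityFormatter;

public class QuantityRepro {
    public static void main(String[] args) {
        // Same shape of call as in TesKubernetesConverter: ramGb.toString() + "Gi",
        // here with a hypothetical gigabyte value carrying 18 decimal places.
        Quantity mem = new QuantityFormatter().parse("0.123456789012345678Gi");

        // Turning the Quantity back into a string (which is effectively what
        // gson.toJson does when it serialises the Quantity) is where
        // SuffixFormatter gives up, because the resulting decimal exponent
        // (-18 here) has no SI suffix:
        // java.lang.IllegalArgumentException: No suffix for exponent-18
        System.out.println(mem.toSuffixedString());
    }
}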

sgalpha01 commented 2 years ago

I'm not sure whether anything I can add now will be of much help, but I have attached a screenshot; please check it. It shows 6 commits. Except for the first one, I was not able to package the API with mvn package after any of these commits; they were throwing errors. The top one built successfully, but it is only after this commit that the 500 error was introduced. So maybe somewhere in these 6 commits you can revert or change something to get it working. I know that would be a dirty fix, but it is all I can think of right now.

(screenshot attached showing the 6 commits)

lvarin commented 2 years ago

This change was made because of a very, very old Kubernetes client library. We were running v1, when the oldest supported version is v11. We are talking about years of difference.

So reverting is not an option. I have not been able to work on this since Wednesday; I hope to be able to fix it today.

sgalpha01 commented 2 years ago

No, reverting is not an option. I just mentioned those commits so that it would be easier for you to target the source of the error. For testing the changes I made in cwl-wes, I just replaced the line

Optional.ofNullable(resources).map(TesResources::getRamGb).ifPresent(ramGb -> container.getResources().putRequestsItem(RESOURCE_MEM_KEY, new QuantityFormatter().parse(ramGb.toString() + RESOURCE_MEM_UNIT)));

which is present here: https://github.com/elixir-cloud-aai/tesk-api/blob/master/src/main/java/uk/ac/ebi/tsc/tesk/k8s/convert/TesKubernetesConverter.java#L154 with

Optional.ofNullable(resources).map(TesResources::getRamGb).ifPresent(ramGb -> container.getResources().putRequestsItem(RESOURCE_MEM_KEY, new QuantityFormatter().parse((ramGb>0.004 ? String.format("%.3f", ramGb) : "0.004") + RESOURCE_MEM_UNIT)));

which is part of the legacy code. I've made this change locally, and it bypasses the error. I do realise that this is not a solution.

lvarin commented 2 years ago

This is very useful. The problem might be that the value generated now has too many decimals?

I will try a hybrid of both versions and see.

lvarin commented 2 years ago

Yes! Reducing the number of decimals solved the problem. I will clean up and commit to a new branch.
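
Roughly, the idea is to cap the number of decimals in the gigabyte value before handing it to the QuantityFormatter, the way the legacy code did. A minimal sketch of that idea (the class and method names are made up for illustration, and the code that actually lands in the new branch may differ):

import java.util.Locale;

import io.kubernetes.client.custom.Quantity;
import io.kubernetes.client.custom.QuantityFormatter;

class MemoryRequest {
    private static final String RESOURCE_MEM_UNIT = "Gi";

    // Format the GB value with three decimals (clamping very small requests,
    // as the legacy code did with 0.004), so the parsed Quantity never ends
    // up with a decimal exponent that SuffixFormatter has no suffix for.
    static Quantity fromGb(double ramGb) {
        String gb = ramGb > 0.004
                ? String.format(Locale.ROOT, "%.3f", ramGb)
                : "0.004";
        return new QuantityFormatter().parse(gb + RESOURCE_MEM_UNIT);
    }
}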

sgalpha01 commented 1 year ago

Addressed in #43