ablack3 opened this issue 5 months ago
There are three cases to handle:
1. The DataNode cannot reach the Execution Engine to submit the analysis. In this case the analysis status should be set to "Failed".
2. The Execution Engine sends no status update for the analysis for a long time, for example because the callback configuration is incorrect and the Execution Engine cannot send the callback. In this case the job should be invalidated after a timeout: if there is no response from the Execution Engine within 1 hour, the job is considered failed. The 1-hour timeout should be configurable.
3. All jobs that are in the Executing or Aborting state when the DataNode restarts should be invalidated and marked as "FAILED" (Executing/Aborting => Failed).
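The three cases above could be handled by a single status-tracking component. The following is a minimal Python sketch under stated assumptions: `AnalysisStatus`, `Job`, `JobTracker` and its methods `on_submit_error`, `invalidate_stale` and `on_restart` are hypothetical names for illustration, not the DataNode's actual API. The timeout is a constructor parameter (defaulting to 1 hour), as the issue asks for it to be configurable.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum


class AnalysisStatus(Enum):
    # Hypothetical status values; the real DataNode enum may differ.
    EXECUTING = "EXECUTING"
    ABORTING = "ABORTING"
    EXECUTED = "EXECUTED"
    FAILED = "FAILED"


# States in which a job is still waiting on the Execution Engine.
ACTIVE_STATES = (AnalysisStatus.EXECUTING, AnalysisStatus.ABORTING)


@dataclass
class Job:
    job_id: str
    status: AnalysisStatus
    # Time of the last message received from the Execution Engine.
    last_update: datetime


@dataclass
class JobTracker:
    # Case 2: the timeout is a parameter rather than a hard-coded 1 hour.
    timeout: timedelta = timedelta(hours=1)
    jobs: dict[str, Job] = field(default_factory=dict)

    def on_submit_error(self, job: Job) -> None:
        # Case 1: the Execution Engine could not be reached during submission.
        job.status = AnalysisStatus.FAILED
        self.jobs[job.job_id] = job

    def invalidate_stale(self, now: datetime) -> list[str]:
        # Case 2: fail active jobs with no update from the engine within the timeout.
        failed = []
        for job in self.jobs.values():
            if job.status in ACTIVE_STATES and now - job.last_update > self.timeout:
                job.status = AnalysisStatus.FAILED
                failed.append(job.job_id)
        return failed

    def on_restart(self) -> list[str]:
        # Case 3: Executing/Aborting => Failed when the DataNode restarts.
        failed = []
        for job in self.jobs.values():
            if job.status in ACTIVE_STATES:
                job.status = AnalysisStatus.FAILED
                failed.append(job.job_id)
        return failed


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    tracker = JobTracker(timeout=timedelta(hours=1))
    tracker.jobs["a"] = Job("a", AnalysisStatus.EXECUTING, now - timedelta(hours=2))
    tracker.jobs["b"] = Job("b", AnalysisStatus.EXECUTING, now - timedelta(minutes=10))
    print(tracker.invalidate_stale(now))  # ['a']
```

In a real service, `invalidate_stale` would be called periodically (for example by a scheduler) and `on_restart` once at startup; how the DataNode persists and schedules these is not specified in this issue.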
As soon as it is clear that the R code has failed to run, the DataNode UI should update to reflect this (i.e. show the execution as failed). Currently, analyses fail, which is clear from the docker logs of the DataNode container, but the UI still shows "executing" for quite a long time.
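One way to make the UI reflect a failure immediately is to push every status change to subscribers at the moment it is recorded, instead of relying on a later timeout. The sketch below assumes a hypothetical `StatusStore` with `subscribe`/`set`/`get` and a hypothetical `on_execution_result` handler, and it assumes a non-zero exit code from the R process means failure; none of these names come from the DataNode codebase.

```python
from enum import Enum
from typing import Callable


class AnalysisStatus(Enum):
    # Hypothetical status values; the real DataNode enum may differ.
    EXECUTING = "EXECUTING"
    EXECUTED = "EXECUTED"
    FAILED = "FAILED"


Listener = Callable[[str, AnalysisStatus], None]


class StatusStore:
    """Hypothetical store that notifies subscribers (e.g. the UI) of every change."""

    def __init__(self) -> None:
        self._statuses: dict[str, AnalysisStatus] = {}
        self._listeners: list[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def set(self, job_id: str, status: AnalysisStatus) -> None:
        # Record the status, then notify every subscriber synchronously.
        self._statuses[job_id] = status
        for listener in self._listeners:
            listener(job_id, status)

    def get(self, job_id: str) -> AnalysisStatus:
        return self._statuses[job_id]


def on_execution_result(store: StatusStore, job_id: str, exit_code: int) -> None:
    # Assumption of this sketch: a non-zero exit code from the R process is a failure.
    status = AnalysisStatus.EXECUTED if exit_code == 0 else AnalysisStatus.FAILED
    store.set(job_id, status)
```

With this design the UI learns about the failure in the same step that records it, so it never shows "executing" for a job whose R process has already exited with an error.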