DEIB-GECO / GMQL

GMQL - GenoMetric Query Language
http://www.bioinformatics.deib.polimi.it/geco/
Apache License 2.0

Report error (not SUCCESS) in case of runtime error #12

Closed by marcomass 6 years ago

marcomass commented 7 years ago

Fix output report in case of runtime error

If any issue occurs, do NOT report SUCCESS in the output log. As an example, see job_test11_marco_20170109_162315 (on cineca).

andreagulino commented 7 years ago

@marcomass Inspecting the code and the logs produced by some executions, I cannot find evidence of this behaviour. A successful execution has [GMQLSparkExecutor] Total Spark Job Execution Time ... as the last line of the log (without explicitly reporting the final status of the application). An execution with errors prints the stack trace associated with the error in the log.
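The heuristic described above could be sketched roughly as follows. This is illustrative pseudocode, not GMQL's actual implementation; the function name `classify_log` and the returned labels are assumptions, and only the `[GMQLSparkExecutor]` marker comes from the log format quoted here.

```python
# Hypothetical sketch of the log-inspection heuristic described above:
# a run is considered successful if the last log line is the executor's
# timing message; otherwise the log ends in a stack trace.
# (classify_log and the returned labels are illustrative names.)

def classify_log(log_lines):
    if not log_lines:
        return "UNKNOWN"
    last = log_lines[-1]
    if last.startswith("[GMQLSparkExecutor] Total Spark Job Execution Time"):
        return "SUCCESS"
    return "ERROR"

# A successful run ends with the timing line:
ok_log = [
    "17/01/09 16:23:15 INFO DAGScheduler: Job 0 finished",
    "[GMQLSparkExecutor] Total Spark Job Execution Time: 42s",
]
# A failed run ends with a stack trace instead:
bad_log = [
    "java.lang.RuntimeException: something went wrong",
    "    at it.polimi.genomics...",
]
print(classify_log(ok_log))   # SUCCESS
print(classify_log(bad_log))  # ERROR
```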

Since the query you refer to was run in January, I think this problem has disappeared as a consequence of the changes to the code made by @akaitoua since then.

andreagulino commented 6 years ago

Tested; the problem did not occur. Please re-open if you find a case in which it still happens.

marcomass commented 6 years ago

@andreagulino I found a case where this issue occurs: see the log of job_masseroli_order6_20171228_222106 in genomic.

marcomass commented 6 years ago

Hi @andreagulino, do you have any news about the above issue (see the log of job_masseroli_order6_20171228_222106 in genomic)?

andreagulino commented 6 years ago

@marcomass I cannot replicate that error; it was due either to the old Spark configuration in genomic or to Spark being stuck. As you told me by email, the log does not show the string SUCCESS. I will try to understand the problem, but I think it should not appear if everything is set up correctly.

marcomass commented 6 years ago

Yes, that is what I meant. The log contains the error, but the final label reported in the web interface is SUCCESS. Try rerunning the query from the web interface and you will see it. So in some cases the label is probably not synchronized with the log content.

andreagulino commented 6 years ago

@marcomass From what I see, the job completed successfully (it produced a dataset at the end). Our job status is exactly the job status reported by Spark. The error that you see is something shown by Spark for "debugging" purposes that did not affect the successful completion of the job (this happens because Spark is fault tolerant towards several kinds of errors). In general, if an error happens but Spark manages to recover and complete the job, it will still report SUCCESS (and we do the same, since a result was produced at the end).
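The design choice above could be sketched as follows: the final label mirrors the job's outcome (whether a result dataset was produced), not the presence of stack traces in the log, because Spark may recover from transient task failures and still complete. All names here (`job_status`, `log_mentions_error`, the `EXEC_FAILED` label) are illustrative assumptions, not GMQL's actual API.

```python
# Illustrative sketch (not GMQL code) of the status logic described above:
# the reported label follows the job outcome, not the log contents.

def job_status(result_produced: bool) -> str:
    """SUCCESS iff Spark produced a result dataset (label names are made up)."""
    return "SUCCESS" if result_produced else "EXEC_FAILED"

def log_mentions_error(log_lines) -> bool:
    """An ERROR line or stack trace in the log does not by itself imply failure."""
    return any("Exception" in line or "ERROR" in line for line in log_lines)

# A recovered job: the log contains an error line, yet the job completed,
# so the reported status is still SUCCESS.
log = [
    "17/12/28 22:21:06 ERROR TaskSetManager: Lost task 3.0 (will retry)",
    "[GMQLSparkExecutor] Total Spark Job Execution Time: 153s",
]
print(job_status(result_produced=True))  # SUCCESS despite the ERROR line
print(log_mentions_error(log))           # True
```

This mirrors the tradeoff discussed in the thread: trusting Spark's final outcome keeps the label accurate for recovered jobs, at the cost of logs that can look alarming even when the run succeeded.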

marcomass commented 6 years ago

ok