Azure-Samples / Synapse

Samples for Azure Synapse Analytics
MIT License

Spark Job in Synapse cannot be viewed in monitoring portal - Error Message is Fetching Failed #160

Open GovindulaRakesh opened 1 year ago

GovindulaRakesh commented 1 year ago

It is rare and intermittent, but at times the monitoring portal in Azure Synapse fails to show the details of a completed Spark job. Instead, it displays an error message that says "Fetching Failed". Screenshot:

[Screenshot: the "Fetching Failed" tooltip in the upper right of the monitoring portal]

I have not yet found a pattern or explanation. I reported the problem to CSS support, but they are not yet familiar with the error. I suspect it is a timeout on an internal resource, such as a Spark history server.

I realize that some parts of the Synapse platform are proprietary, but it borrows significantly from OSS Spark. Does anyone have an idea what might take so long when retrieving the UI for a completed Livy batch? Is it Azure storage accounts that are performing badly? Or is it a Spark history server? Is there any reason why the portal wouldn't wait longer for a response (e.g. ten minutes)? Whenever this happens, the UI seems to fail after a short period of time (only about 60 seconds). I haven't found any other patterns. As you can see above, the error message is nothing more than a small tooltip shown in the upper right of the screen; when I shared it with CSS, they weren't able to provide any additional guidance or explanation. So I'm hoping there are Synapse users on Stack Overflow who have encountered this.
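One way to test the timeout theory would be to time repeated requests against whatever endpoint serves the job details, using a client-side timeout much longer than 60 seconds. The sketch below is only an illustration under assumptions: OSS Spark's history server exposes a REST API under `/api/v1/applications`, but whether Synapse exposes that API at a URL you can reach is not something I know, so the URL is left as a caller-supplied placeholder and the helper names (`timed_fetch`, `make_http_fetch`, `probe`) are hypothetical.

```python
import time
import urllib.request
from typing import Callable, List, Optional, Tuple


def timed_fetch(fetch: Callable[[], bytes]) -> Tuple[Optional[bytes], float, Optional[Exception]]:
    """Call fetch() once and return (body, elapsed_seconds, error)."""
    start = time.monotonic()
    try:
        body = fetch()
        return body, time.monotonic() - start, None
    except Exception as exc:  # timeouts, HTTP errors, connection resets
        return None, time.monotonic() - start, exc


def make_http_fetch(url: str, timeout_s: float) -> Callable[[], bytes]:
    """Build a fetcher for a caller-supplied URL (the URL itself is an assumption)."""
    def fetch() -> bytes:
        # The timeout applies to blocking socket operations, not the whole request.
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.read()
    return fetch


def probe(fetch: Callable[[], bytes], attempts: int = 3,
          pause_s: float = 0.0) -> List[Tuple[bool, float, Optional[str]]]:
    """Run several timed attempts; return (ok, elapsed_seconds, error_name) per attempt."""
    results = []
    for i in range(attempts):
        _, elapsed, err = timed_fetch(fetch)
        results.append((err is None, elapsed, type(err).__name__ if err else None))
        if i < attempts - 1:
            time.sleep(pause_s)
    return results
```

For example, `probe(make_http_fetch(url, timeout_s=600), attempts=5, pause_s=10)` with `url` pointing at a reachable history-server endpoint would show whether the backend eventually answers when given more time. If the elapsed times regularly exceed a minute, that would suggest the portal is giving up early. Failures that return quickly would point elsewhere.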

Side note: When things are working properly, the Spark job is presented with its related jobs, stages, tasks, and logs, like so:

[Screenshot: a completed Spark job in the monitoring portal, showing its jobs, stages, tasks, and logs]