Azure / doAzureParallel

An R package that allows users to submit parallel workloads in Azure
MIT License

Can't access task logs with getJobFile() #332

Closed angusrtaylor closed 5 years ago

angusrtaylor commented 5 years ago

Description

When a job fails, log files are generated under //logs in blob storage. I cannot find a way to access these files from the R session using getJobFile(). It seems to be because the naming convention for these files differs from that of stdout, stderr and results files. E.g. stdout files are output as

//stdout/1-stdout.txt
//stdout/2-stdout.txt
//stdout/3-stdout.txt
...

which can be retrieved with getJobFile("", "", "stdout.txt"). However, log files are output as:

//logs/1.txt
//logs/2.txt
//logs/3.txt
...

There doesn't seem to be a way of extracting these files. E.g. the following do not work:

getJobFile("", "", "logs")
getJobFile("", "", ".txt")
getJobFile("", "", "logs.txt")
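
To make the naming difference described above concrete, here is a minimal sketch that builds the blob names for a set of task ids. The helper taskBlobNames is hypothetical and not part of doAzureParallel. It only reproduces the two patterns quoted in this issue (stdout and logs), relative to the job's container, and makes no calls to Azure.

```r
# Hypothetical helper (not part of doAzureParallel): build the blob names
# quoted in this issue for a vector of task ids.
taskBlobNames <- function(taskIds, kind = c("stdout", "logs")) {
  kind <- match.arg(kind)
  if (kind == "logs") {
    # Log files carry only the task id: logs/<task id>.txt
    file.path("logs", paste0(taskIds, ".txt"))
  } else {
    # stdout files carry a suffix: stdout/<task id>-stdout.txt
    file.path("stdout", paste0(taskIds, "-stdout.txt"))
  }
}

taskBlobNames(1:3, "stdout")
# "stdout/1-stdout.txt" "stdout/2-stdout.txt" "stdout/3-stdout.txt"
taskBlobNames(1:3, "logs")
# "logs/1.txt" "logs/2.txt" "logs/3.txt"
```

The stdout names end in a fixed suffix after the task id, while the log names consist of the task id alone, which may be why no single file name passed to getJobFile matches them.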

brnleehng commented 5 years ago

I'll take a look at this and will get back to you.

Thanks! Brian

brnleehng commented 5 years ago

Are you using the SetAutoDeleteJob flag? If SetAutoDeleteJob is set to TRUE, the files under the job may already have been deleted.

You can use doAzureParallel's Storage APIs to get the task logs from storage (https://github.com/Azure/doAzureParallel/blob/6d14d4522b1ff4218f19549b89dea6419a230a53/docs/73-managing-storage.md):

files <- listStorageFiles("job20181129064540", prefix = "logs")
View(files)

cat(getStorageFile("job20181129064540", "logs/3.txt"))
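
If you want every task log at once, the two calls above could be wrapped along these lines. This is only a sketch under assumptions: fetchTaskLogs is a hypothetical helper, not part of doAzureParallel, and the "FilePath" column name for the listing is an assumption that should be checked against what listStorageFiles actually returns. The listing and download functions are passed in as arguments so the sketch can be run offline with stand-ins.

```r
# Hypothetical helper (not part of doAzureParallel): download every blob
# under the "logs" prefix of a job's container.
# listFiles / getFile are passed in; in a live session they would be
# listStorageFiles and getStorageFile.
# ASSUMPTION: the listing has a column holding blob paths, named "FilePath".
fetchTaskLogs <- function(container, listFiles, getFile,
                          pathColumn = "FilePath") {
  files <- listFiles(container, prefix = "logs")
  if (is.null(files[[pathColumn]])) {
    stop("Column '", pathColumn, "' not found in the file listing")
  }
  paths <- as.character(files[[pathColumn]])
  logs <- lapply(paths, function(path) getFile(container, path))
  names(logs) <- paths
  logs
}

# Stand-ins for the real storage calls, so the sketch runs without Azure.
fakeList <- function(container, prefix = "") {
  data.frame(FilePath = paste0(prefix, "/", 1:2, ".txt"),
             stringsAsFactors = FALSE)
}
fakeGet <- function(container, path) paste("contents of", path)

logs <- fetchTaskLogs("job20181129064540", fakeList, fakeGet)
names(logs)
# "logs/1.txt" "logs/2.txt"
```

In a real session the call would look like fetchTaskLogs("job20181129064540", listStorageFiles, getStorageFile), after which cat(logs[["logs/3.txt"]]) would print a single log.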

The getJobFile function does not seem to be extracting the Batch response correctly. I'll get a fix in for this.

angusrtaylor commented 5 years ago

The log files are being generated correctly and I can view them in Storage Explorer/Portal. The problem is that it seems impossible to retrieve them using getJobFile. Thanks for the listStorageFiles workaround.

brnleehng commented 5 years ago

Hi @angusrtaylor

Can you verify that the following calls work?

getJobFile("job20181205211122", "1", "wd/1.txt")
getJobFile("job20181205211122", "1", "stderr.txt")
getJobFile("job20181205211122", "1", "stdout.txt")

There are a couple of things we need to fix.

First, I'll add an API to doAzureParallel for listing the files in a node's directories, so users can see how Batch structures its files. Second, I'll update the docs to explain the differences between getJobFile and getStorageFile. The directory mappings are different, which makes this pretty confusing.

Thanks for the feedback, Brian

angusrtaylor commented 5 years ago

Thanks, yes all those functions work for me.