speschl closed this issue 5 years ago
Would you happen to have upgraded your local installation of Batch Shipyard, but the pool you're using was created with an older version?
Yes, most of the scripts were created at the end of 2018 or in the first months of 2019. I recently had to get a new machine, so I now have version 3.7.0. However, we spin up a new pool instance each time the schedule runs.
I am unable to repro this failure.
Can you perform an upgrade to 3.8.1? Instructions: https://github.com/Azure/batch-shipyard/blob/master/docs/01-batch-shipyard-installation.md#upgrading-to-new-releases
Please re-submit your job (sorry I missed your auto pool spec) and see if you can repro.
I updated my instance to 3.8.1 and re-submitted my job. I still get the error. Here is the shipyard-jmtask configuration.
{
"id": "shipyard-jmtask",
"jobId": "***_Manual_NewVersion:job-1",
"odata.metadata": "https://***.eastus2.batch.azure.com/$metadata#tasks/@Element",
"url": "https://***.eastus2.batch.azure.com/jobs/***_Manual_NewVersion:job-1/tasks/shipyard-jmtask",
"eTag": "0x701CE1722770000",
"creationTime": "2019-08-29T20:57:37.1582165Z",
"lastModified": "1601-01-01T00:00:00Z",
"state": "completed",
"stateTransitionTime": "2019-08-29T21:03:16.412917Z",
"previousState": "running",
"previousStateTransitionTime": "2019-08-29T21:02:31.463327Z",
"commandLine": "/opt/batch-shipyard/recurrent_job_manager.sh",
"resourceFiles": [
{
"httpUrl": "https://***.blob.core.windows.net/shipyardprp20qarf-***-qa-pool/jobschedules/oip-prp20-qa-***_Manual_NewVersion/taskmap.pickle?se=2049-08-21T20%3A57%3A36Z&sp=r&sv=2018-11-09&sr=b&sig=redacted",
"filePath": "taskmap.pickle",
"fileMode": "0640"
}
],
"containerSettings": {
"containerRunOptions": "--rm",
"imageName": "mcr.microsoft.com/azure-batch/shipyard:3.8.1-cargo"
},
"environmentSettings": [
{
"name": "version",
"value": "version 2019.08.21.12:00-2.0"
},
{
"name": "logging_level",
"value": "INFO"
},
{
"name": "jobName",
"value": "oip-prp20-qa-job-"
},
{
"name": "store_name",
"value": "redacted"
},
{
"name": "vault_uri",
"value": "redacted"
},
{
"name": "app_id",
"value": "redacted"
},
{
"name": "app_secret",
"value": "redacted"
},
{
"name": "tenant_ID",
"value": "redacted"
},
{
"name": "parquet_filename",
"value": "Release1_"
}
],
"userIdentity": {
"autoUser": {
"scope": "pool",
"elevationLevel": "admin"
}
},
"authenticationTokenSettings": {
"access": [
"job"
]
},
"constraints": {
"maxWallClockTime": "P10675199DT2H48M5.4775807S",
"retentionTime": "P7D",
"maxTaskRetryCount": 1
},
"executionInfo": {
"startTime": "2019-08-29T21:03:15.272126Z",
"endTime": "2019-08-29T21:03:16.412917Z",
"exitCode": 1,
"containerInfo": {
"containerId": "6b3113502f561f060325a7978474449547dbd424e50c1f43efa4d2487a6726f3",
"state": "created"
},
"failureInfo": {
"category": "UserError",
"code": "FailureExitCode",
"message": "The task exited with an exit code representing a failure",
"details": [
{
"name": "Message",
"value": "The task exited with an exit code representing a failure"
}
]
},
"result": "failure",
"retryCount": 1,
"lastRetryTime": "2019-08-29T21:03:15.215316Z",
"requeueCount": 0
},
"nodeInfo": {
"affinityId": "TVM:tvmps_e2b58961451c0919b4a698e1b20ddebcb27e195d7814af4b6cfe7e90a262319b_d",
"nodeUrl": "https://redacted.eastus2.batch.azure.com/pools/prp2-qa-pool_f463abc6-5cce-4182-be7c-23f73262ace4/nodes/tvmps_e2b58961451c0919b4a698e1b20ddebcb27e195d7814af4b6cfe7e90a262319b_d",
"poolId": "prp2-qa-pool_f463abc6-5cce-4182-be7c-23f73262ace4",
"nodeId": "tvmps_e2b58961451c0919b4a698e1b20ddebcb27e195d7814af4b6cfe7e90a262319b_d",
"taskRootDirectory": "workitems/***_Manual_NewVersion/job-1/shipyard-jmtask",
"taskRootDirectoryUrl": "https://***.eastus2.batch.azure.com/pools/prp2-qa-pool_f463abc6-5cce-4182-be7c-23f73262ace4/nodes/tvmps_e2b58961451c0919b4a698e1b20ddebcb27e195d7814af4b6cfe7e90a262319b_d/files/workitems/***_Manual_NewVersion/job-1/shipyard-jmtask"
}
}
@speschl As an aside, I redacted some additional sensitive information above - you may want to consider rotating your keyvault app secret.
Ok, I was able to repro this.
To mitigate this before a hotfix, please ensure all of your environment variable values are strings (in yaml). For example, the threshold env var:
environment_variables:
  input_path: wsi/Release1/Input/Release1_
  output_path: wsi/Release1/Output
  file_name: RiskModelResult_
  threshold: '0.4' # <-- wrap in quotes to explicitly make a string
  pickle_path: wsi/*
  client_secret: Kw+****************************
  client_ID: b65*****************
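For anyone wanting to check a larger job spec for the same problem, here is a minimal sketch in Python. It assumes the environment_variables mapping has already been loaded into a plain dict by whatever YAML loader you use; `find_non_string_env_vars` and the sample data are hypothetical and not part of Batch Shipyard.

```python
from typing import Any, Dict, List


def find_non_string_env_vars(env: Dict[str, Any]) -> List[str]:
    """Return the names of environment variables whose values are not strings."""
    return [name for name, value in env.items() if not isinstance(value, str)]


# Hypothetical data: roughly what a YAML loader hands back when
# threshold is left unquoted in the job spec.
parsed_env = {
    "file_name": "RiskModelResult_",
    "threshold": 0.4,  # unquoted in YAML, so it arrives as a float
}

# Any name printed here is a candidate for quoting in the yaml file.
print(find_non_string_env_vars(parsed_env))
```

Quoting each reported value in the yaml file, as with threshold above, is the workaround suggested in this thread.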
Thanks @alfpark for the heads up about the sensitive info, I have rotated the secret. I tried quoting the environment variable values as strings and that seemed to work! The shipyard-jmtask completed and the other two tasks have begun. Thank you so much for your help!
I'm looking to create a scheduled job and have had no problem with the configuration in the past. Recently, I've had a problem with 'shipyard-jmtask' using nearly the same script (the only changes are in the environment variables). The error associated with the task is:
Here are my configuration scripts for the scheduled job. Actual names are changed for security reasons.
config.yaml:
credentials.yaml
job.yaml
pool.yaml
Any help with this is greatly appreciated!