MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Cannot debug locally - GPU #66764

Closed michalmar closed 3 years ago

michalmar commented 3 years ago

I followed the step-by-step instructions for debugging an AML run locally. I am not able to start the Docker container and get this error:

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

I am not trying to run with GPU, I am OK with CPU.
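
A note for anyone reproducing this on a CPU-only machine: Docker raises this error when a container is started with the `--gpus` option and no GPU device driver is available to the daemon. The run detail further down shows the generated command passing `'--gpus', 'all'`. The following is a minimal sketch, assuming you copy the logged argument list and want to rerun it by hand without the GPU request. `strip_gpu_flags` is a hypothetical helper written for illustration; it is not part of the Azure ML SDK or the VS Code extension, and the sample list is abbreviated from the log.

```python
def strip_gpu_flags(args):
    """Return a copy of a docker argument list with any --gpus option removed.

    Handles both the two-token form ('--gpus', 'all') seen in the log
    and the single-token form ('--gpus=all').
    """
    cleaned = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This token is the value that followed a bare '--gpus'.
            skip_next = False
            continue
        if arg == "--gpus":
            skip_next = True
            continue
        if arg.startswith("--gpus="):
            continue
        cleaned.append(arg)
    return cleaned


# Abbreviated stand-in for the argument list printed in the run detail.
logged = [
    "docker", "run", "--rm",
    "--shm-size", "2g",
    "--gpus", "all",
    "-e", "EXAMPLE_ENV_VAR=EXAMPLE_VALUE",
]
print(strip_gpu_flags(logged))
```

Running the cleaned command by hand only helps confirm that the GPU request is what the daemon rejects; it does not change the command the tooling generates.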

My env: Windows 10, VS Code, Docker using Linux images in WSL2

Error:
info: Successfully created run configuration local-rc3
error: Error while running local experiment Run: 4. SystemExit: Failed to launch docker container.

the detail from run: [2020-11-26T14:17:29.342997] Logging experiment running status in history service. Running: ['docker', 'run', '--name', 'local-debug_1606399855892', '--rm', '-v', 'C:\Users\mimarusa\AppData\Local\Temp\aitools_WFZ0R3\temp_aml_local_run_project:/azureml-run', '--shm-size', '2g', '--gpus', 'all', '-e', 'EXAMPLE_ENV_VAR=EXAMPLE_VALUE', '-e', 'AZUREML_CONTEXT_MANAGER_TRACKUSERERROR=eyJTa2lwSGlzdG9yeUltcG9ydENoZWNrIjoiRmFsc2UifQ==', '-e', 'AZUREML_CONTEXT_MANAGER_RUNHISTORY=eyJPdXRwdXRDb2xsZWN0aW9uIjp0cnVlLCJEaXJlY3Rvcmllc1RvV2F0Y2giOlsibG9ncyJdLCJFbmFibGVNTGZsb3dUcmFja2luZyI6dHJ1ZSwic25hcHNob3RQcm9qZWN0IjpmYWxzZX0=', '-e', 'AZUREML_CONTEXT_MANAGER_PROJECTPYTHONPATH=bnVsbA==', '-e', 'AZUREML_RUN_TOKEN_EXPIRY=1608214256', '-e', 'AZUREML_RUN_TOKEN=eyJhbGciOiJSUzI1NiIsImtpZCI6IkU3Mzg0MTE5QjM3RUMzNTc1MjQzM0UyQ0RGMjI0RUJGODIzOEQ5RTgiLCJ0eXAiOiJKV1QifQ.eyJyb2xlIjoiQ29udHJpYnV0b3IiLCJzY29wZSI6Ii9zdWJzY3JpcHRpb25zLzZlZTk0N2ZhLTBkNzctNDkxNS1iZjY4LTRhODNhOGJlYzJhNC9yZXNvdXJjZUdyb3Vwcy9tbG9wcy1yZy9wcm92aWRlcnMvTWljcm9zb2Z0Lk1hY2hpbmVMZWFybmluZ1NlcnZpY2VzL3dvcmtzcGFjZXMvbWxvcHMtZGVtbyIsImFjY291bnRpZCI6IjAwMDAwMDAwLTAwMDAtMDAwMC0wMDAwLTAwMDAwMDAwMDAwMCIsIndvcmtzcGFjZUlkIjoiZjZlODdhODYtYzc2OS00ZGUxLWI2NjEtOTY4YjJjMzMxMjNjIiwicHJvamVjdGlkIjoiMDAwMDAwMDAtMDAwMC0wMDAwLTAwMDAtMDAwMDAwMDAwMDAwIiwiZGlzY292ZXJ5IjoidXJpOi8vZGlzY292ZXJ5dXJpLyIsInRpZCI6IjcyZjk4OGJmLTg2ZjEtNDFhZi05MWFiLTJkN2NkMDExZGI0NyIsIm9pZCI6IjExNzgyOWViLWVlNjQtNDUxOC1iYzA2LTY5NWI4YzhjZjg5NiIsInB1aWQiOiIxMDAzM0ZGRjk2REU3Q0VEIiwiaXNzIjoiYXp1cmVtbCIsImFwcGlkIjoiTWljaGFsIE1hcnVzYW4iLCJleHAiOjE2MDgyMTQyNTYsImF1ZCI6ImF6dXJlbWwifQ.Km4dC5JPZbTZE4-Bf_1K52oRairddiYh_sxix-z-NsbO0ARo6iIKW2cAnJvFVwETm1a1893pFHxc9eVeX49CZZ5YRqX6fpfEAhQWMbpOsjrhLqPUtAE-gxxuNvyWm-_qHf7obCLGRoAOx_m2It5Uq7Nyu9n39ce65rB8bsUGS79Zyhdcd4n95bvF3DpdXjCEuA5N63GqjMVA4Bz0Ssf6as_8wIhXbQ6SQzn2d2rgxlELux8ymOS3nLcge7cEtXwtX7anfM07X5BMLRxFwVe2BCxUWylaT1Tq3PtjPgS2RAfePORDDGg2I0W6fOZN10OhIibs-kGhNIenKm6jpWsKRQ', '-e', 
'MLFLOW_TRACKING_TOKEN=eyJhbGciOiJSUzI1NiIsImtpZCI6IkU3Mzg0MTE5QjM3RUMzNTc1MjQzM0UyQ0RGMjI0RUJGODIzOEQ5RTgiLCJ0eXAiOiJKV1QifQ.eyJyb2xlIjoiQ29udHJpYnV0b3IiLCJzY29wZSI6Ii9zdWJzY3JpcHRpb25zLzZlZTk0N2ZhLTBkNzctNDkxNS1iZjY4LTRhODNhOGJlYzJhNC9yZXNvdXJjZUdyb3Vwcy9tbG9wcy1yZy9wcm92aWRlcnMvTWljcm9zb2Z0Lk1hY2hpbmVMZWFybmluZ1NlcnZpY2VzL3dvcmtzcGFjZXMvbWxvcHMtZGVtbyIsImFjY291bnRpZCI6IjAwMDAwMDAwLTAwMDAtMDAwMC0wMDAwLTAwMDAwMDAwMDAwMCIsIndvcmtzcGFjZUlkIjoiZjZlODdhODYtYzc2OS00ZGUxLWI2NjEtOTY4YjJjMzMxMjNjIiwicHJvamVjdGlkIjoiMDAwMDAwMDAtMDAwMC0wMDAwLTAwMDAtMDAwMDAwMDAwMDAwIiwiZGlzY292ZXJ5IjoidXJpOi8vZGlzY292ZXJ5dXJpLyIsInRpZCI6IjcyZjk4OGJmLTg2ZjEtNDFhZi05MWFiLTJkN2NkMDExZGI0NyIsIm9pZCI6IjExNzgyOWViLWVlNjQtNDUxOC1iYzA2LTY5NWI4YzhjZjg5NiIsInB1aWQiOiIxMDAzM0ZGRjk2REU3Q0VEIiwiaXNzIjoiYXp1cmVtbCIsImFwcGlkIjoiTWljaGFsIE1hcnVzYW4iLCJleHAiOjE2MDgyMTQyNTYsImF1ZCI6ImF6dXJlbWwifQ.Km4dC5JPZbTZE4-Bf_1K52oRairddiYh_sxix-z-NsbO0ARo6iIKW2cAnJvFVwETm1a1893pFHxc9eVeX49CZZ5YRqX6fpfEAhQWMbpOsjrhLqPUtAE-gxxuNvyWm-_qHf7obCLGRoAOx_m2It5Uq7Nyu9n39ce65rB8bsUGS79Zyhdcd4n95bvF3DpdXjCEuA5N63GqjMVA4Bz0Ssf6as_8wIhXbQ6SQzn2d2rgxlELux8ymOS3nLcge7cEtXwtX7anfM07X5BMLRxFwVe2BCxUWylaT1Tq3PtjPgS2RAfePORDDGg2I0W6fOZN10OhIibs-kGhNIenKm6jpWsKRQ', '-e', 'MLFLOW_TRACKING_URI=azureml://westeurope.experiments.azureml.net/mlflow/v1.0/subscriptions/6ee947fa-0d77-4915-bf68-4a83a8bec2a4/resourceGroups/mlops-rg/providers/Microsoft.MachineLearningServices/workspaces/mlops-demo', '-e', 'MLFLOW_RUN_ID=local-debug_1606399855892', '-e', 'MLFLOW_EXPERIMENT_ID=0708265e-6496-44c0-8661-dd8228ee6ae0', '-e', 'HBI_WORKSPACE_JOB=false', '-e', 'AZUREML_RUN_TOKEN_RAND=9bd05484-71ff-4d5e-a17b-2c04904d6316', '-e', 'AZUREML_RUN_TOKEN_PASS=26503101-c78a-4f9d-a76d-975370c05a9b', '-e', 'PYTHONUNBUFFERED=True', '-e', 'AZUREML_COMMUNICATOR=None', '-e', 'AZUREML_FRAMEWORK=Python', '-e', 'AZUREML_EXPERIMENT_ID=0708265e-6496-44c0-8661-dd8228ee6ae0', '-e', 'AZUREML_ARM_PROJECT_NAME=local-debug', '-e', 'AZUREML_ARM_WORKSPACE_NAME=mlops-demo', '-e', 
'AZUREML_ARM_SUBSCRIPTION=6ee947fa-0d77-4915-bf68-4a83a8bec2a4', '-e', 'AZUREML_ARM_RESOURCEGROUP=mlops-rg', '-e', 'AZUREML_EXPERIMENT_SCOPE=/subscriptions/6ee947fa-0d77-4915-bf68-4a83a8bec2a4/resourceGroups/mlops-rg/providers/Microsoft.MachineLearningServices/workspaces/mlops-demo/experiments/local-debug', '-e', 'AZUREML_WORKSPACE_ID=f6e87a86-c769-4de1-b661-968b2c33123c', '-e', 'AZUREML_WORKSPACE_SCOPE=/subscriptions/6ee947fa-0d77-4915-bf68-4a83a8bec2a4/resourceGroups/mlops-rg/providers/Microsoft.MachineLearningServices/workspaces/mlops-demo', '-e', 'AZUREML_DATA_CONTAINER_ID=dcid.local-debug_1606399855892', '-e', 'AZUREML_DISCOVERY_SERVICE_ENDPOINT=https://westeurope.experiments.azureml.net/discovery', '-e', 'AZUREML_RUN_HISTORY_SERVICE_ENDPOINT=https://westeurope.experiments.azureml.net', '-e', 'AZUREML_SERVICE_ENDPOINT=https://westeurope.experiments.azureml.net', '-e', 'AZUREML_RUN_CONFIGURATION=azureml-setup/mutated_run_configuration.json', '-e', 'AZUREML_INSTRUMENTATION_KEY=fb7e27a4-f865-4147-83ee-ffbf79d1a9f5', '-e', 'AZUREML_DRIVERLOG_PATH=azureml-logs/driver_log.txt', '-e', 'TELEMETRY_LOGS=azureml-logs/telemetry_logs/', '-e', 'FAIRLEARN_LOGS=azureml-logs/telemetry_logs/fairlearn_log.txt', '-e', 'INTERPRET_TEXT_LOGS=azureml-logs/telemetry_logs/interpret_text_log.txt', '-e', 'INTERPRET_C_LOGS=azureml-logs/telemetry_logs/interpret_community_log.txt', '-e', 'AZUREML_JOBRELEASELOG_PATH=azureml-logs/job_release_log.txt', '-e', 'AZUREML_JOBPREPLOG_PATH=azureml-logs/job_prep_log.txt', '-e', 'AZUREML_CONTROLLOG_PATH=azureml-logs/control_log.txt', '-e', 'AZUREML_LOGDIRECTORY_PATH=azureml-logs/', '-e', 'AZUREML_PIDFILE_PATH=azureml-setup/pid.txt', '-e', 'AZUREML_RUN_ID=local-debug_1606399855892', '-p=62118:62118', 'azureml/azureml_3be1690457abbdd94ede26f3f6067ec7', '/bin/bash', '-c', 'cd /azureml-run && "/azureml-envs/azureml_c234ac35ebb3e0a412f5b5405977d406/bin/python" "azureml-setup/run_script.py" "/azureml-envs/azureml_c234ac35ebb3e0a412f5b5405977d406/bin/python" 
"azureml-setup/context_manager_injector.py" "-i" "ProjectPythonPath:context_managers.ProjectPythonPath" "-i" "RunHistory:context_managers.RunHistory" "-i" "TrackUserError:context_managers.TrackUserError" "-i" "UserExceptions:context_managers.UserExceptions" "azureml_ext_debug_wrapper.py"'] docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].


MonikaReddy-MSFT commented 3 years ago

@michalmar - Thanks for bringing this to our attention. I'm going to assign this to the document author so they can update the document accordingly.

luqmana commented 3 years ago

I think you assigned me by mistake. I think you wanted @luisquintanilla.

luisquintanilla commented 3 years ago

Thanks for bringing this to our attention, @michalmar, and for sharing your logs. Thoughts, @sevillal @SiddhanthUnnithan?

luisquintanilla commented 3 years ago

@michalmar would you be able to share your "local-rc3" run configuration? Did you use a curated environment or create your own?

PeterCLu commented 3 years ago

Hi @michalmar, since we haven't heard back from you in a week, I'll proceed to #please-close this issue. If you feel that your documentation issue hasn't been resolved, feel free to open a new issue with additional details.

For product help, your best bet is to create a support ticket:

  1. Navigate to the Azure portal.
  2. Select the support question mark (?).
  3. Select Help + support.
  4. Select Create a support

Thanks so much!