Azure Shows Successful Badge on Deployment Failure

lmondigo commented 2 weeks ago

Summary

On August 20, we noticed that our pipelines were showing as successful. When we checked the deployment status, however, it reported an error, so the pipeline should have failed.

Steps To Reproduce

We tried to replicate the issue with similar variables: the latest SF CLI version at the time (which appears to be 2.56.4), the same components being deployed, and the same tests being run. The issue no longer seems to occur and cannot be reproduced.

Command used

sf project deploy start -d delta -o TargetOrg -l RunSpecifiedTests -t $(test_classes) -w 120 -c
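While this is being investigated, a pipeline step could avoid relying on the exit code alone. The sketch below is an illustration under stated assumptions, not an official recommendation. It runs the same command with the global --json flag through Node's child_process.spawnSync and reads result.status from the JSON output. It fails if the exit code is non-zero or if result.status is anything other than "Succeeded". The TEST_CLASSES environment variable and the names shouldFailPipeline and runGuardedDeploy are hypothetical.

```typescript
import { spawnSync } from "node:child_process";

// Decide whether the pipeline step should fail, using two signals:
// the process exit code and result.status from the --json payload.
function shouldFailPipeline(exitCode: number | null, jsonText: string): boolean {
  if (exitCode !== 0) return true; // non-zero, or null (killed / failed to start)
  try {
    const parsed = JSON.parse(jsonText) as { result?: { status?: unknown } };
    // Anything other than an explicit "Succeeded" counts as a failure.
    return parsed.result?.status !== "Succeeded";
  } catch {
    return true; // unparseable output: fail closed
  }
}

// Hypothetical wrapper around the command from this issue.
// TEST_CLASSES is an assumed env var holding space-separated test class names.
function runGuardedDeploy(): void {
  const tests = (process.env.TEST_CLASSES ?? "").split(/\s+/).filter(Boolean);
  const args = [
    "project", "deploy", "start",
    "-d", "delta", "-o", "TargetOrg",
    "-l", "RunSpecifiedTests",
    ...tests.flatMap((t) => ["-t", t]), // one -t per test class
    "-w", "120", "-c", "--json",
  ];
  const run = spawnSync("sf", args, { encoding: "utf8" });
  // Assumption: JSON normally lands on stdout; fall back to stderr if stdout is empty.
  const jsonText = run.stdout?.trim() ? run.stdout : run.stderr ?? "";
  if (shouldFailPipeline(run.status, jsonText)) {
    console.error("Deploy treated as failed (exit code or result.status).");
    process.exit(1);
  }
}
```

A script step would call runGuardedDeploy() after compiling. Failing closed on unparseable output is a deliberate choice here: a blocked pipeline is cheaper to investigate than a false positive.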

Expected result

Failed deployments on pipelines should appear failed.

Actual result

Failed deployments showing up as succeeded.

Additional information

False positive deployment:

After replicating the setup and getting the expected behavior:

System Information

Operating System: Ubuntu 22.04.4 LTS

JSON

{
  "architecture": "linux-x64",
  "cliVersion": "@salesforce/cli/2.56.4",
  "nodeVersion": "node-v18.20.4",
  "osVersion": "Linux 6.5.0-1025-azure",
  "rootPath": "/usr/local/lib/node_modules/@salesforce/cli",
  "shell": "bash",
  "pluginVersions": [
    "@oclif/plugin-autocomplete 3.2.0 (core)",
    "@oclif/plugin-commands 4.0.11 (core)",
    "@oclif/plugin-help 6.2.8 (core)",
    "@oclif/plugin-not-found 3.2.16 (core)",
    "@oclif/plugin-plugins 5.4.4 (core)",
    "@oclif/plugin-search 1.2.5 (core)",
    "@oclif/plugin-update 4.5.3 (core)",
    "@oclif/plugin-version 2.2.10 (core)",
    "@oclif/plugin-warn-if-update-available 3.1.11 (core)",
    "@oclif/plugin-which 3.2.10 (core)",
    "@salesforce/cli 2.56.4 (core)",
    "apex 3.4.2 (core)",
    "auth 3.6.48 (core)",
    "data 3.6.1 (core)",
    "deploy-retrieve 3.10.0 (core)",
    "info 3.3.29 (core)",
    "limits 3.3.25 (core)",
    "marketplace 1.2.22 (core)",
    "org 4.4.8 (core)",
    "packaging 2.8.0 (core)",
    "schema 3.3.24 (core)",
    "settings 2.3.13 (core)",
    "sobject 1.4.29 (core)",
    "source 3.5.14 (core)",
    "telemetry 3.6.7 (core)",
    "templates 56.3.12 (core)",
    "trust 3.7.23 (core)",
    "user 3.5.25 (core)",
    "sfdx-git-delta 5.42.1 (user) published 2 days ago (Mon Aug 26 2024)"
  ]
}

github-actions[bot] commented 2 weeks ago

Hello @lmondigo :wave: It looks like you didn't include the full Salesforce CLI version information in your issue. Please provide the output of version --verbose --json for the CLI you're using (sf or sfdx).

A few more things to check:

- Make sure you've provided detailed steps to reproduce your issue.
  - A repository that clearly demonstrates the bug is ideal.
- Make sure you've installed the latest version of Salesforce CLI. (docs)
  - Better yet, try the rc or nightly versions. (docs)
- Try running the doctor command to diagnose common issues.
- Search GitHub for existing related issues.

Thank you!

github-actions[bot] commented 2 weeks ago

Thank you for filing this issue. We appreciate your feedback and will review the issue as soon as possible. Remember, however, that GitHub isn't a mechanism for receiving support under any agreement or SLA. If you require immediate assistance, contact Salesforce Customer Support.

github-actions[bot] commented 2 weeks ago

Hello @lmondigo :wave: None of the versions of sf you shared match the latest release.

Shared: 2.56.4
Latest: 2.56.7

Update to the latest version of Salesforce CLI (docs) and confirm that you're still seeing your issue. You can also try the rc and nightly releases! (docs)

After updating, share the full output of sf version --verbose --json

cristiand391 commented 2 weeks ago

We set the exit code for project deploy start based on the deployment result status here:

https://github.com/salesforcecli/plugin-deploy-retrieve/blob/167dc693d2f06bf2a5e0c2a63e384e14d5dc6c65/src/commands/project/deploy/start.ts#L263

https://github.com/salesforcecli/plugin-deploy-retrieve/blob/167dc693d2f06bf2a5e0c2a63e384e14d5dc6c65/src/utils/deploy.ts#L225

https://github.com/salesforcecli/plugin-deploy-retrieve/blob/167dc693d2f06bf2a5e0c2a63e384e14d5dc6c65/src/utils/errorCodes.ts#L14

That happens after the deployment has finished processing in your org, and the Status: Failed line in your output suggests that was the deploy status.

This is where Status: Failed is printed; the Failed string comes from the deploy result payload: https://github.com/salesforcecli/plugin-deploy-retrieve/blob/167dc693d2f06bf2a5e0c2a63e384e14d5dc6c65/src/utils/progressBar.ts#L52

I tried a few possible scenarios but couldn't get the determineExitCode function to return 0 on a non-successful deploy (using sf v2.57.7).
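The behavior described above can be restated as a small function. This is a hypothetical sketch for illustration, not the code in plugin-deploy-retrieve; the linked start.ts, deploy.ts, and errorCodes.ts hold the real logic. It encodes only the two facts stated in this thread: a "Succeeded" status yields exit code 0, and "Failed" yields 1. Every other status maps to a placeholder non-zero value. That placeholder is an assumption, and the plugin's actual codes for those statuses are not reproduced here.

```typescript
// Hypothetical restatement of "exit code follows the deploy result status".
// Only "Succeeded" -> 0 and "Failed" -> 1 come from the discussion; the
// fallback for other statuses is a placeholder, not the plugin's real value.
const PLACEHOLDER_NONZERO = 1;

function exitCodeForDeployStatus(status: string): number {
  switch (status) {
    case "Succeeded":
      return 0;
    case "Failed":
      return 1;
    default:
      return PLACEHOLDER_NONZERO;
  }
}

// The property being probed: no status other than "Succeeded" yields 0.
function onlySucceededExitsZero(statuses: readonly string[]): boolean {
  return statuses.every((s) => (exitCodeForDeployStatus(s) === 0) === (s === "Succeeded"));
}

const sampleStatuses = ["Succeeded", "Failed", "Canceled", "SucceededPartial", "InProgress"];
console.log(onlySucceededExitsZero(sampleStatuses)); // true
```

Under this model, an exit code of 0 requires the CLI to have seen "Succeeded" in the deploy result payload.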

Does Azure log the exit code of the command anywhere?

lmondigo commented 2 weeks ago

Thanks for checking @cristiand391

I checked the JSON responses for both jobs via sf project deploy report (see below), and they appear identical in terms of status, result.status, and result.success.

Is it right to assume that status is 0 because the job completed in the org? I'm also curious which field is returned to the terminal for the script to parse.

Failed job

{
  "status": 0,
  "result": {
    "checkOnly": false,
    "completedDate": "2024-08-28T04:06:20.000Z",
    "createdBy": "0052P000000K5Mw",
    "createdByName": "Script User",
    "createdDate": "2024-08-28T04:06:15.000Z",
    "details": {
      "componentFailures": [...],
      "componentSuccesses": [...],
      "runTestResult": {...}
    },
    "done": true,
    "id": "0Af9p00000xxx",
    "ignoreWarnings": false,
    "lastModifiedDate": "2024-08-28T04:06:20.000Z",
    "numberComponentErrors": 1,
    "numberComponentsDeployed": 0,
    "numberComponentsTotal": 1,
    "numberTestErrors": 0,
    "numberTestsCompleted": 0,
    "numberTestsTotal": 0,
    "rollbackOnError": true,
    "runTestsEnabled": false,
    "startDate": "2024-08-28T04:06:16.000Z",
    "status": "Failed",
    "success": false,
    "files": [...]
  },
  "warnings": []
}

Failed job that showed success

{
  "status": 0,
  "result": {
    "checkOnly": false,
    "completedDate": "2024-08-20T00:52:42.000Z",
    "createdBy": "0052P000000K5Mw",
    "createdByName": "Script User",
    "createdDate": "2024-08-20T00:52:34.000Z",
    "details": {
      "componentFailures": [...],
      "componentSuccesses": [...],
      "runTestResult": {...}
    },
    "done": true,
    "id": "0Af9p00000xxx",
    "ignoreWarnings": false,
    "lastModifiedDate": "2024-08-20T00:52:42.000Z",
    "numberComponentErrors": 25,
    "numberComponentsDeployed": 0,
    "numberComponentsTotal": 25,
    "numberTestErrors": 0,
    "numberTestsCompleted": 0,
    "numberTestsTotal": 0,
    "rollbackOnError": true,
    "runTestsEnabled": true,
    "startDate": "2024-08-20T00:52:34.000Z",
    "status": "Failed",
    "success": false,
    "files": [...]
  },
  "warnings": []
}
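The comparison of status, result.status, and result.success above can be scripted. The sketch below is illustrative only: DeployReportJson is a hypothetical subset of the payload covering just the three fields discussed in this thread, and describeInconsistency is an invented helper. It flags the pattern both payloads show: a top-level status of 0 (which the CLI sets from the process exit code) alongside result.status "Failed" and result.success false.

```typescript
// Hypothetical subset of the deploy report JSON payload,
// limited to the three fields compared in this thread.
interface DeployReportJson {
  status: number;
  result: { status: string; success: boolean };
}

// Returns a description of any disagreement between the top-level status
// (the exit code) and the org-reported result, or null if they agree.
function describeInconsistency(payload: DeployReportJson): string | null {
  const exitSaysSuccess = payload.status === 0;
  const orgSaysSuccess = payload.result.status === "Succeeded" && payload.result.success;
  if (exitSaysSuccess === orgSaysSuccess) return null;
  return `top-level status=${payload.status} but result.status=${payload.result.status}, result.success=${payload.result.success}`;
}

// Trimmed copies of the two payloads above (elided fields omitted).
const failedJob: DeployReportJson = { status: 0, result: { status: "Failed", success: false } };
const falsePositiveJob: DeployReportJson = { status: 0, result: { status: "Failed", success: false } };

console.log(describeInconsistency(failedJob));
// top-level status=0 but result.status=Failed, result.success=false
console.log(describeInconsistency(falsePositiveJob) === describeInconsistency(failedJob)); // true
```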

cristiand391 commented 1 week ago

> Would it be right to assume that status is 0 because the job is completed from the org? I'm also curious about which field is being returned to the terminal to be parsed by the script.

No. For any sf command, we set the status key in the JSON output to the process's exit code:

Here we set it to process.exitCode if it's a number, falling back to 0: https://github.com/salesforcecli/sf-plugins-core/blob/c61adc2093e035acac06b71322bac9fac13325bb/src/sfCommand.ts#L364

The same applies to error results, but with a fallback of 1: https://github.com/salesforcecli/sf-plugins-core/blob/c61adc2093e035acac06b71322bac9fac13325bb/src/errorHandling.ts#L44
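The two fallbacks described above can be sketched as follows. This is a hypothetical simplification for illustration, not the sf-plugins-core source. The function names are invented, and the sketch models only one rule: use process.exitCode if it is a number, otherwise 0 on the success path or 1 on the error path.

```typescript
// Hypothetical simplification of how the JSON "status" key is chosen.
// exitCode stands in for process.exitCode, which may be unset (undefined).
function jsonStatusOnSuccess(exitCode: unknown): number {
  return typeof exitCode === "number" ? exitCode : 0;
}

function jsonStatusOnError(exitCode: unknown): number {
  return typeof exitCode === "number" ? exitCode : 1;
}

console.log(jsonStatusOnSuccess(undefined)); // 0
console.log(jsonStatusOnSuccess(1)); // 1
console.log(jsonStatusOnError(undefined)); // 1
```

Under this model, a JSON status of 0 on the success path means process.exitCode was either 0 or not a number when the output was written.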

In the two JSON results you shared above, I see the org is returning "status": "Failed" (which should make the CLI set exit code 1).

I'll check our telemetry and see if I can find similar scenarios.

lmondigo commented 1 week ago

@cristiand391, it happened again on one of our pipelines running in an older container that still uses sfdx commands (sfdx-cli v7.194.1). This leads me to believe the issue is not related to sf vs. sfdx or to Azure, but to how the API behaves. Are you aware of any Salesforce API changes that could cause this?

cristiand391 commented 4 days ago

> it happened again on one of our pipelines running in an older container which is still using sfdx (sfdx-cli version v7.194.1) commands which leads me to believe that the issue is not related to whether using sf or sfdx and Azure but as to how the API behaves. Are you aware of any API changes from Salesforce that may cause this issue?

Nope, but for the CLI to exit with 0, the API would have to be returning "status": "Succeeded" for a bad deploy...

If you can share the deploy ID of a deploy that failed but made sf exit with code 0, we can look into what the API returned on our side (CLI telemetry isn't enough to link a deploy to a command execution).

lmondigo commented 1 day ago

Here are several deployment IDs that exited with 0 but failed in the org:

- Deployment in Org A - 0Af9p00000KNB7OCAX
- Deployment in Org A - 0Af9p00000KP7ScCAL
- Prod validation in Org B - 0AfOZ000000ZPun0AG (after the false positive was thrown, the quick deployment failed with an error INVALID_ID_FIELD: Source validate did not run tests in the org even though we triggered the tests)

forcedotcom / cli