Azure / open-service-broker-azure

The Open Service Broker API Server for Azure Services
https://osba.sh
MIT License

One lifecycle test case for cosmosdb failed abnormally #617

Open norshtein opened 6 years ago

norshtein commented 6 years ago

In the latest merged PR, an enhancement to the storage module, all CI checks passed: https://github.com/Azure/open-service-broker-azure/pull/612. But when the PR was merged into master, the pipeline was triggered again and one lifecycle test case for cosmosdb failed, even though the PR has nothing to do with cosmosdb. See https://circleci.com/gh/Azure/open-service-broker-azure/5892?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

norshtein commented 6 years ago

Another case: the pipeline for #616 passed, but the pipeline on the master branch failed: https://circleci.com/gh/Azure/open-service-broker-azure/5923?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

norshtein commented 6 years ago

I reproduced this error successfully. It is caused by an exclusive lock conflict. Below is the error message:

 {"code":"PreconditionFailed","message":"There is already an operation in progress which requires exlusive lock on this service 8cd9c40b-02c3-47d0-86d2-141835c04066. Please retry the operation after sometime.\r\nActivityId: 72665989-0961-4815-8f7c-8b7607d131d6, Microsoft.Azure.Documents.Common/2.0.0.0"}

And the call stack of the error:

runtime/debug.Stack(0xc4207b06e8, 0x2, 0x2)
        /usr/local/go/src/runtime/debug/stack.go:24 +0xa7
github.com/Azure/open-service-broker-azure/vendor/github.com/Azure/go-autorest/autorest/azure.(*Future).Done(0xc4207b0ad8, 0xc42380, 0xc4207b34d0, 0x0, 0x0, 0x0)
        /go/src/github.com/Azure/open-service-broker-azure/vendor/github.com/Azure/go-autorest/autorest/azure/async.go:119 +0x675
github.com/Azure/open-service-broker-azure/vendor/github.com/Azure/azure-sdk-for-go/services/cosmos-db/mgmt/2015-04-08/documentdb.DatabaseAccountsClient.DeleteSender(0xc412a0, 0xc42044cd60, 0xc41840, 0xc42037efc0, 0x0, 0x0, 0xdf8475800, 0x1a3185c5000, 0x3, 0x6fc23ac00, ...)
        /go/src/github.com/Azure/open-service-broker-azure/vendor/github.com/Azure/azure-sdk-for-go/services/cosmos-db/mgmt/2015-04-08/documentdb/databaseaccounts.go:279 +0x2d5
github.com/Azure/open-service-broker-azure/vendor/github.com/Azure/azure-sdk-for-go/services/cosmos-db/mgmt/2015-04-08/documentdb.DatabaseAccountsClient.Delete(0xc412a0, 0xc42044cd60, 0xc41840, 0xc42037efc0, 0x0, 0x0, 0xdf8475800, 0x1a3185c5000, 0x3, 0x6fc23ac00, ...)
        /go/src/github.com/Azure/open-service-broker-azure/vendor/github.com/Azure/azure-sdk-for-go/services/cosmos-db/mgmt/2015-04-08/documentdb/databaseaccounts.go:240 +0x67b
github.com/Azure/open-service-broker-azure/pkg/services/cosmosdb.deleteCosmosDBAccount(0xc46880, 0xc4200657c0, 0xc412a0, 0xc42044cd60, 0xc41840, 0xc42037efc0, 0x0, 0x0, 0xdf8475800, 0x1a3185c5000, ...)
        /go/src/github.com/Azure/open-service-broker-azure/pkg/services/cosmosdb/common_deprovision.go:37 +0x1b4
github.com/Azure/open-service-broker-azure/pkg/services/cosmosdb.(*cosmosAccountManager).deleteCosmosDBAccount(0xc420402420, 0xc46880, 0xc420065780, 0x0, 0x0, 0x0, 0x0, 0xbc51b9, 0x24, 0xc4c0c0, ...)
        /go/src/github.com/Azure/open-service-broker-azure/pkg/services/cosmosdb/common_deprovision.go:87 +0xe8
github.com/Azure/open-service-broker-azure/pkg/services/cosmosdb.(*cosmosAccountManager).(github.com/Azure/open-service-broker-azure/pkg/services/cosmosdb.deleteCosmosDBAccount)-fm(0xc46880, 0xc420065780, 0x0, 0x0, 0x0, 0x0, 0xbc51b9, 0x24, 0xc4c0c0, 0xc42054fc20, ...)
        /go/src/github.com/Azure/open-service-broker-azure/pkg/services/cosmosdb/common_deprovision.go:64 +0x79
github.com/Azure/open-service-broker-azure/pkg/service.(*deprovisioningStep).Execute(0xc42044aaa0, 0xc46880, 0xc420065780, 0x0, 0x0, 0x0, 0x0, 0xbc51b9, 0x24, 0xc4c0c0, ...)
        /go/src/github.com/Azure/open-service-broker-azure/pkg/service/deprovisioner.go:67 +0xe5
github.com/Azure/open-service-broker-azure/tests/lifecycle.serviceLifecycleTestCase.execute(0xbaef20, 0x8, 0xbb90a8, 0x14, 0xbc51b9, 0x24, 0xbc5345, 0x24, 0xc420087dd0, 0xc420087e00, ...)
        /go/src/github.com/Azure/open-service-broker-azure/tests/lifecycle/test_case_test.go:315 +0x136e
github.com/Azure/open-service-broker-azure/tests/lifecycle.TestServices.func1.1(0xc4203cc2d0)
        /go/src/github.com/Azure/open-service-broker-azure/tests/lifecycle/driver_test.go:45 +0xe2
testing.tRunner(0xc4203cc2d0, 0xc420392140)
        /usr/local/go/src/testing/testing.go:777 +0xd0
created by testing.(*T).Run
        /usr/local/go/src/testing/testing.go:824 +0x2e0

I don't know why this error happened. I think it might be an issue in the internal API. I'll ask the Azure Cosmos team for help.

Below is my guess at the cause: when a lifecycle test case runs, its steps are executed back to back. It is possible that the previous step has finished but its exclusive lock has not yet been released, so when the next step requests the exclusive lock, the error occurs. If my guess is correct, a possible temporary fix is to sleep for several seconds before running the deprovisioning step.
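As a rough illustration of that temporary fix, here is a minimal Go sketch that retries a step after a pause when it fails with the lock error, instead of sleeping unconditionally. Everything in it is hypothetical and not taken from the broker code: the names `isLockConflict`, `retryOnLockConflict` and `errSimulatedConflict` are made up for the example, and matching on the "PreconditionFailed" string is only an assumption based on the error message quoted above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// errSimulatedConflict imitates the error quoted in this issue. It exists
// only so the sketch can run without calling Azure.
var errSimulatedConflict = errors.New(`{"code":"PreconditionFailed","message":"simulated exclusive lock conflict"}`)

// isLockConflict reports whether err looks like the exclusive-lock error.
// Matching on the "PreconditionFailed" string is an assumption for this
// sketch, not a documented contract of the Cosmos DB API.
func isLockConflict(err error) bool {
	return err != nil && strings.Contains(err.Error(), "PreconditionFailed")
}

// retryOnLockConflict runs op up to maxAttempts times. It sleeps for delay
// between attempts, and only retries when op fails with a lock conflict.
// Any other result (success or a different error) is returned immediately.
func retryOnLockConflict(maxAttempts int, delay time.Duration, op func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = op()
		if !isLockConflict(err) {
			return err
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("still locked after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	// fakeDeprovision stands in for the real deprovisioning step: it fails
	// twice with the simulated lock conflict, then succeeds.
	fakeDeprovision := func() error {
		calls++
		if calls < 3 {
			return errSimulatedConflict
		}
		return nil
	}
	err := retryOnLockConflict(5, 10*time.Millisecond, fakeDeprovision)
	fmt.Println("attempts:", calls, "error:", err)
}
```

In the lifecycle test, the function passed as `op` would wrap the deprovisioning step, with a delay of several seconds rather than milliseconds. Retrying only on the lock error keeps other failures visible right away.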

norshtein commented 6 years ago

IcM ticket submitted: https://icm.ad.msft.net/imp/v3/incidents/details/88035437/home