Closed. cforce closed this 5 years ago.
The issue also exists on OSB 1.4.0. What I have found so far is that it seems to depend on the location I deploy the db to. It works for eastus but not for northeu, although PostgreSQL is GA there too: https://azure.microsoft.com/en-us/global-infrastructure/services/?products=postgresql&regions=non-regional,us-east,us-east-2,us-central,us-north-central,us-south-central,us-west-central,us-west,us-west-2,canada-east,canada-central,europe-north,europe-west
Could it be that OSB is using APIs that are not supported there, or something related to firewall or VNet settings that is not normally part of a plain PostgreSQL deployment on Azure (without OSB)? In any case, my AKS cluster, VNet, and resource group are all located in northeu.
Does the SP have to be Contributor on the whole subscription?
No, resource group scoped contributor is enough.
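For reference, a resource-group-scoped assignment could look roughly like the following. This is a minimal sketch, assuming the Azure CLI is installed and logged in; the helper names `rg_scope` and `grant_rg_contributor` and all argument values are hypothetical placeholders, not something OSBA provides.

```shell
# Build the resource-group scope string from a subscription ID and group name.
rg_scope() {
  printf '/subscriptions/%s/resourceGroups/%s' "$1" "$2"
}

# Grant the service principal Contributor on that one resource group only.
# Usage: grant_rg_contributor <sp-app-id> <subscription-id> <resource-group>
grant_rg_contributor() {
  az role assignment create \
    --assignee "$1" \
    --role Contributor \
    --scope "$(rg_scope "$2" "$3")"
}
```

Scoping to the resource group keeps the SP from creating resources anywhere else in the subscription.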
The issue also exists on OSB 1.4.0.
Do you mean the service endpoints error? OSBA v1.4.0 doesn't even support that. Actually, I am confused by the logs you provided: why was the error raised by the PostgreSQL client? The provisioning step in which OSBA uses the PostgreSQL client definitely runs after the database instance and all other ARM resources have been created successfully. And the client is a common community client with no knowledge of service endpoints. So it is possibly an Azure service bug.
I found out so far that the issue seems to depend on the location I deploy the db to.
This increases the likelihood of a service bug. Both regions being GA does not mean that releases are rolled out to them in sync.
You mean the service endpoints error? OSBA v1.4.0 doesn't even support that.
I tried first with 1.5.0 and then went down to 1.4.0, as I thought it might be a bug in that version, until I stumbled over the fact that it also seems to be location specific (still testing to prove it).
I am just confused about the logs you provided, why the error was raised by the PostgreSQL client
Those logs come from the run on 1.5.0. Still confused, then? It is definitely in the logs, and the message points to PostgreSQL. What is the "Virtual Network Service Endpoints" feature about? It looks like I can't use the whole plan, or is it a combination of params and the plan?
The provisioning step in OSBA to use the PostgreSQL client is definitely after successfully creating the database instance and any other ARM resources. And the client is a common community client without any knowledge about service endpoints.
It looks like the client can't validate that the instance is up and running, so the instance goes to orphaned and might then be taken down by OSB again as part of its automatic handling. I can't see why, or who, is deleting the PostgreSQL instance from Azure again; it was shown as up and running for some minutes before it disappeared. Are there no logs in OSB showing what it is doing here?
And the client is a common community client without any knowledge about service endpoints.
What is the GitHub repo for this client? Are there any bugs reported there regarding this?
No need to divert attention to the client. Neither you nor OSBA set the parameter to create a VNet rule. (You can confirm this by looking at the ARM deployment template in the resource group before service-catalog calls OSBA to delete it.) Additionally, it doesn't make sense that the same scenario failed in northeu but succeeded in eastus. If you can still reproduce this, please file a support ticket from the Azure Portal.
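One way to do that check from the command line, as a rough sketch: it assumes a current Azure CLI (`az deployment group ...`; older CLI versions used `az group deployment ...`) and assumes that a VNet rule would appear in the exported template as a resource whose type contains `virtualNetworkRules`. The helper names are hypothetical.

```shell
# List the names of the ARM deployments in a resource group.
# Usage: list_deployments <resource-group>
list_deployments() {
  az deployment group list --resource-group "$1" --query "[].name" --output tsv
}

# Report whether a deployment's template mentions a VNet rule resource.
# Usage: has_vnet_rule <resource-group> <deployment-name>
has_vnet_rule() {
  if az deployment group export --resource-group "$1" --name "$2" \
      | grep -qi 'virtualNetworkRules'; then
    echo "template contains a VNet rule"
  else
    echo "no VNet rule in template"
  fi
}
```

This has to be run while the deployment still exists, i.e. before the instance is deleted again.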
You are right. I was just able to do it successfully on northeu in another subscription with OSB 1.4.0. But what else could it be? How can I hunt the issue down?
That’s fine if not reproducible. Maybe it was transient or the fix was just rolled out. You can continue following up on it once you hit it again. Can I close the issue for now?
It is reproducible on the original subscription, where I need to get it running. I have no idea why I don't see this issue on the other one.
svcat provision myapp --class azure-postgresql-10 --plan general-purpose -ndev --params-json '{"tags":{"microservice":"myapp","env":"dev"},"location": "northeurope","resourceGroup": "mygroup"}'
time="2019-03-01T22:54:30Z" level=error msg="error executing job; not submitting any follow-up tasks" error="error executing provisioning step \"setupDatabase\" for instance \"8b63011d-3c74-11e9-b172-46c8584e03e8\": error executing provisioning step: error starting transaction: pq: Client from Azure Virtual Networks is not allowed to access the server. Please make sure your Virtual Network is correctly configured." job=executeProvisioningStep taskID=8b6e6382-b5be-4f2c-9a7e-abe46c893877
Maybe it has something to do with the fact that VNet service endpoints are enabled on the subnet of my AKS cluster, and therefore OSB can't reach the server after it is created. "Virtual Network service endpoint: A Virtual Network service endpoint is a subnet whose property values include one or more formal Azure service type names. In this article we are interested in the type name of Microsoft.Sql, which refers to the Azure service named SQL Database. When using service endpoints for Azure SQL Database, Outbound to Azure SQL Database Public IPs is required: Network Security Groups (NSGs) must be opened to Azure SQL Database IPs to allow connectivity. You can do this by using NSG Service Tags for Azure SQL Database." https://docs.microsoft.com/en-us/azure/virtual-network/security-overview#service-tags
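To check whether a subnet actually has that endpoint enabled, something like the following should work. It is a sketch that assumes the Azure CLI; the helper names are hypothetical, and the VNet and subnet names have to be filled in. As far as I know, the `Microsoft.Sql` endpoint type also covers Azure Database for PostgreSQL, not only SQL Database.

```shell
# Print the service endpoint types enabled on a subnet, one per line.
# Usage: subnet_service_endpoints <resource-group> <vnet-name> <subnet-name>
subnet_service_endpoints() {
  az network vnet subnet show \
    --resource-group "$1" --vnet-name "$2" --name "$3" \
    --query "serviceEndpoints[].service" --output tsv
}

# Succeed (exit 0) only if Microsoft.Sql is among the enabled endpoints.
# Usage: has_sql_endpoint <resource-group> <vnet-name> <subnet-name>
has_sql_endpoint() {
  subnet_service_endpoints "$@" | grep -qx 'Microsoft.Sql'
}
```

For example, `has_sql_endpoint deveuaks <vnet-name> <subnet-name> && echo "Microsoft.Sql endpoint is enabled"`.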
I also saw that you made changes to the VNet setup lately. https://github.com/Azure/open-service-broker-azure/commit/4797e909d6fa9cb5fe609c591aa95382365c55cd
The problem was indeed that OSB (and any other connection from the kube nodes) was blocked because the resource group's VNet has virtual network service endpoints enabled (specifically for the "azure sql servers" label), which blocks the traffic. I added the Kubernetes subnet to the PostgreSQL server as allowed, using the "virtual networks" param that was recently introduced for PostgreSQL in OSB 1.5.0.
The reason it also worked without this param in another location is that the PostgreSQL instance in that case was deployed into another VNet, where virtual network service endpoints for Azure SQL servers were not enabled (the default).
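A provisioning call with such a rule might look roughly like this. Treat it as a sketch under assumptions: the parameter name `virtualNetworkRules` and its `name` and `subnetId` fields are my assumption about how OSBA 1.5.0 spells the "virtual networks" param, the rule name and the `<...>` parts of the subnet ID are placeholders, and the helper `provision_postgres` is hypothetical. It uses the general-purpose plan because, as far as I know, VNet rules are not available on the basic tier.

```shell
# Placeholder subnet ID of the AKS nodes; replace the <...> parts.
SUBNET_ID="/subscriptions/<subscription-id>/resourceGroups/deveuaks/providers/Microsoft.Network/virtualNetworks/<vnet-name>/subnets/<subnet-name>"

# "virtualNetworkRules", "name" and "subnetId" are assumed parameter names;
# check the OSBA PostgreSQL module docs for your version.
PARAMS=$(cat <<EOF
{
  "location": "northeurope",
  "resourceGroup": "deveuaks",
  "virtualNetworkRules": [
    {"name": "allow-aks-subnet", "subnetId": "${SUBNET_ID}"}
  ]
}
EOF
)

# Usage: provision_postgres <instance-name>
provision_postgres() {
  svcat provision "$1" --class azure-postgresql-10 --plan general-purpose \
    -n dev --params-json "$PARAMS"
}
```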
svcat provision test --class azure-postgresql-10 --plan basic -ndev --params-json '{"cores":1,"storage":10,"backupRetention":7,"location": "northeurope","resourceGroup": "deveuaks","firewallRules": [{"startIPAddress": "0.0.0.0","endIPAddress": "255.255.255.255","name":"AllowAll"}]}' --logtostderr
Is the issue below the reason why I can't provision PostgreSQL and end up in OrphanMitigation before it goes to failed? Also see https://social.msdn.microsoft.com/Forums/azure/en-US/30a90ddd-0949-42c4-9504-fc7a8756fbe6/postgresql-virtual-net-rule-issue-when-having-basic-tier?forum=AzureDatabaseforPostgreSQL
I can even see in the resource group that the deployment runs.
Then it becomes ready in the Azure Portal PostgreSQL view...
...then OSB reports OrphanMitigation and the Azure Portal shows it being deleted. It seems to make no difference whether I use PostgreSQL 9.6 or 10, or whether the plan is general-purpose or basic.
I also found this in the logs:
Does the SP have to be Contributor on the whole subscription? Is that because of the server, the database, or the firewall resources being created? What roles exactly do I need in order to limit the creation of resources to the defined resource group?