cncf / demo

Demo of CNCF technologies
https://cncf.io
Apache License 2.0

Azure support #126

Open namliz opened 7 years ago

hh commented 7 years ago

@zilman do you have some Azure creds I could use? Cheers @hh

namliz commented 7 years ago

@hh don't have credits for Azure on hand but I can try and get us some. Meanwhile, you get $200 for free at first, I believe.

dankohn commented 7 years ago

Gene, can you please give them admin access to the Azure account we set up?

On Tue, Feb 28, 2017 at 10:40 AM, Eugene notifications@github.com wrote:

@hh https://github.com/hh don't have credits for Azure on hand but I can try and get us some. You get $200 for free at first I believe.


-- Dan Kohn mailto:dan@linuxfoundation.org Executive Director, Cloud Native Computing Foundation https://cncf.io/ tel:+1-415-233-1000

namliz commented 7 years ago

@dankohn I can't, only ever used my personal free account (I think), not an admin on the main one - at least that's what it says when I go to the "Active Directory".

hh commented 7 years ago

Working in branch https://github.com/cncf/demo/tree/azure

hh commented 7 years ago

We had to migrate to a pay-as-you-go subscription because the free-tier limit on cores is too low for quick iteration on dev work. Once everything is functioning, we will refactor for smaller instances so that the free tier works too.

hh commented 7 years ago

We seem to have triggered a block on our API access from NZ (az login and API access seem to hit a firewall somewhere).

We moved to an IP in the states for now.

Is there someone at Microsoft that might be interested in supporting / unblocking us as we move along?

dankohn commented 7 years ago

I'm hopeful @brendandburns might know someone who could send CNCF a few Azure credits.

brendandburns commented 7 years ago

Can you send me the subscription ID (bburns [at] microsoft [dot] com) and I'll see what we can do on this side.

--brendan

hh commented 7 years ago

@brendandburns done and thanks!

hh commented 7 years ago

We started getting kubelet panics when we started using --cloud-provider=azure

Took a while to come across: kubernetes/kubernetes#42576

Now we're off to generate --cloud-config=azure.json

-- Logs begin at Mon 2017-03-20 19:46:01 UTC, end at Mon 2017-03-20 20:18:32 UTC. --                                                                                                                                                                                                                   
Mar 20 19:48:56 etcd-master1 systemd[1]: Starting kubelet.service...                                                                                                                                                                                                                                   
Mar 20 19:48:56 etcd-master1 systemd[1]: Started kubelet.service.                                                                                                                                                                                                                                      
Mar 20 19:48:56 etcd-master1 kubelet-wrapper[2257]: + exec /usr/bin/rkt run --volume dns,kind=host,source=/etc/resolv.conf --mount volume=dns,target=/etc/resolv.conf --volume rkt,kind=host,source=/opt/bin/host-rkt --mount volume=rkt,target=/usr/bin/rkt --volume                                  
Mar 20 19:48:59 etcd-master1 kubelet-wrapper[2257]: pubkey: prefix: "quay.io/coreos/hyperkube"                                                                                                                                                                                                         
Mar 20 19:48:59 etcd-master1 kubelet-wrapper[2257]: key: "https://quay.io/aci-signing-key"                                                                                                                                                                                                             
...
Mar 20 19:48:59 etcd-master1 kubelet-wrapper[2257]: Downloading signature:  473 B/473 B                                                                                                                                                                                                                
Mar 20 19:49:00 etcd-master1 kubelet-wrapper[2257]: Downloading ACI:  0 B/237 MB                                                                                                                                                                                                                       
...
Mar 20 19:49:07 etcd-master1 kubelet-wrapper[2257]: Downloading ACI:  237 MB/237 MB                                                                                                                                                                                                                    
Mar 20 19:49:37 etcd-master1 kubelet-wrapper[2257]: image: signature verified:                                                                                                                                                                                                                         
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: panic: runtime error: invalid memory address or nil pointer dereference [recovered]                                                                                                                                                                
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         panic: runtime error: invalid memory address or nil pointer dereference                                                                                                                                                                    
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: [signal 0xb code=0x1 addr=0x20 pc=0xa32559]                                                                                                                                                                                                        
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: goroutine 1 [running]:                                                                                                                                                                                                                             
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: panic(0x448ae60, 0xc820030060)                                                                                                                                                                                                                     
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         /usr/local/go/src/runtime/panic.go:481 +0x3e6                                                                                                                                                                                              
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: io/ioutil.readAll.func1(0xc820acca40)                                                                                                                                                                                                              
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         /usr/local/go/src/io/ioutil/ioutil.go:30 +0x11e                                                                                                                                                                                            
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: panic(0x448ae60, 0xc820030060)                                                                                                                                                                                                                     
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         /usr/local/go/src/runtime/panic.go:443 +0x4e9                                                                                                                                                                                              
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: bytes.(*Buffer).ReadFrom(0xc820acc998, 0x0, 0x0, 0x0, 0x0, 0x0)                                                                                                                                                                                    
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         /usr/local/go/src/bytes/buffer.go:176 +0x239                                                                                                                                                                                               
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: io/ioutil.readAll(0x0, 0x0, 0x200, 0x0, 0x0, 0x0, 0x0, 0x0)                                                                                                                                                                                        
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         /usr/local/go/src/io/ioutil/ioutil.go:33 +0x156                                                                                                                                                                                            
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: io/ioutil.ReadAll(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)                                                                                                                                                                                               
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         /usr/local/go/src/io/ioutil/ioutil.go:42 +0x51                                                                                                                                                                                             
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]: k8s.io/kubernetes/pkg/cloudprovider/providers/azure.NewCloud(0x0, 0x0, 0x0, 0x0, 0x0, 0x0)                                                                                                                                                         
Mar 20 19:50:59 etcd-master1 kubelet-wrapper[2257]:         /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/cloudprovider/providers/azure/azure.go:74 +0x81                                                                                                                  
Mar

namliz commented 7 years ago

Oh boy, on AWS all the stuff it wants in azure.json was simply inferred by Kubernetes (you'd just tag those resources with the cluster name and pass just that to it). This makes it much more convoluted.

This is an unfortunate discrepancy. As far as I know Azure also has the concept of tags.
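
To make the contrast concrete, here is a rough sketch of tag-based discovery: resources are claimed by a cluster when they carry its tag, so nothing has to be spelled out in a config file. The resource type, the ownedBy helper and the tag key are assumptions for illustration, not the actual AWS provider code:

```go
package main

import "fmt"

// clusterTagKey is the tag name assumed for this sketch.
const clusterTagKey = "KubernetesCluster"

// resource is a hypothetical stand-in for any taggable cloud resource.
type resource struct {
	ID   string
	Tags map[string]string
}

// ownedBy returns the resources whose cluster tag is present and equals clusterID.
func ownedBy(clusterID string, all []resource) []resource {
	var owned []resource
	for _, r := range all {
		if v, ok := r.Tags[clusterTagKey]; ok && v == clusterID {
			owned = append(owned, r)
		}
	}
	return owned
}

func main() {
	all := []resource{
		{ID: "subnet-a", Tags: map[string]string{clusterTagKey: "demo"}},
		{ID: "subnet-b", Tags: map[string]string{clusterTagKey: "other"}},
		{ID: "subnet-c"}, // untagged, so no cluster claims it
	}
	for _, r := range ownedBy("demo", all) {
		fmt.Println(r.ID)
	}
}
```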

hh commented 7 years ago

Looks like we need to populate this manually:

https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/azure/azure.go#L38

// Config holds the configuration parsed from the --cloud-config flag
// All fields are required unless otherwise specified
type Config struct {
    // The cloud environment identifier. Takes values from https://github.com/Azure/go-autorest/blob/ec5f4903f77ed9927ac95b19ab8e44ada64c1356/autorest/azure/environments.go#L13
    Cloud string `json:"cloud" yaml:"cloud"`
    // The AAD Tenant ID for the Subscription that the cluster is deployed in
    TenantID string `json:"tenantId" yaml:"tenantId"`
    // The ID of the Azure Subscription that the cluster is deployed in
    SubscriptionID string `json:"subscriptionId" yaml:"subscriptionId"`
    // The name of the resource group that the cluster is deployed in
    ResourceGroup string `json:"resourceGroup" yaml:"resourceGroup"`
    // The location of the resource group that the cluster is deployed in
    Location string `json:"location" yaml:"location"`
    // The name of the VNet that the cluster is deployed in
    VnetName string `json:"vnetName" yaml:"vnetName"`
    // The name of the subnet that the cluster is deployed in
    SubnetName string `json:"subnetName" yaml:"subnetName"`
    // The name of the security group attached to the cluster's subnet
    SecurityGroupName string `json:"securityGroupName" yaml:"securityGroupName"`
    // (Optional in 1.6) The name of the route table attached to the subnet that the cluster is deployed in
    RouteTableName string `json:"routeTableName" yaml:"routeTableName"`
    // (Optional) The name of the availability set that should be used as the load balancer backend
    // If this is set, the Azure cloudprovider will only add nodes from that availability set to the load
    // balancer backend pool. If this is not set, and multiple agent pools (availability sets) are used, then
    // the cloudprovider will try to add all nodes to a single backend pool which is forbidden.
    // In other words, if you use multiple agent pools (availability sets), you MUST set this field.
    PrimaryAvailabilitySetName string `json:"primaryAvailabilitySetName" yaml:"primaryAvailabilitySetName"`

    // The ClientID for an AAD application with RBAC access to talk to Azure RM APIs
    AADClientID string `json:"aadClientId" yaml:"aadClientId"`
    // The ClientSecret for an AAD application with RBAC access to talk to Azure RM APIs
    AADClientSecret string `json:"aadClientSecret" yaml:"aadClientSecret"`
}

hh commented 7 years ago

Yea, would be nice to have this for Azure:

https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/tags.go#L54

    // ClusterID is our cluster identifier: we tag AWS resources with this value,
    // and thus we can run two independent clusters in the same VPC or subnets.
    // This gives us similar functionality to GCE projects.
    ClusterID string

namliz commented 7 years ago

Yes, you have no choice -- and that is not how the other providers are implemented: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L398

Could be an interesting thing to suggest upstream.

Edit: Jinx. :)

I haven't read the azure provider, do you think it would be messy to add this to it?

brendandburns commented 7 years ago

@hh Please see https://github.com/Azure/acs-engine

I think you will find it a much more pleasant way to turn up an Azure kubernetes cluster.

I'll work on the subscription stuff.

hh commented 7 years ago

We have been looking pretty heavily at the individual config generation parts at https://github.com/Azure/acs-engine/tree/master/parts

They've been useful in integrating the acs-engine approach.

hh commented 7 years ago

Just starting multiple build/deploys at once on Azure, had to increase the default core quota from 10 to 100. Should be able to do at least ten concurrent builds soon.

colemickens commented 7 years ago

RE: Azure relying on hostname == nodeName, this is due to lack of metadata service, so we make assumptions: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/azure/azure_instances.go#L98
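
Roughly, the assumption looks like the sketch below. The function names and the lowercase/trim normalization are illustrative guesses on my part, not the provider's code; the point is only that the node name is used as the VM name with no metadata lookup in between:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// nodeNameFromHostname normalizes a hostname into a node name.
// The lowercase/trim step is an assumption made for this sketch.
func nodeNameFromHostname(hostname string) string {
	return strings.ToLower(strings.TrimSpace(hostname))
}

// vmNameForNode is the identity mapping: with no metadata service to ask,
// the node name is assumed to be the VM name.
func vmNameForNode(nodeName string) string {
	return nodeName
}

func main() {
	host, err := os.Hostname()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	node := nodeNameFromHostname(host)
	fmt.Printf("node %q is looked up as VM %q\n", node, vmNameForNode(node))
}
```

So if a VM is named differently from its hostname, the lookup has nothing to fall back on.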

hh commented 7 years ago

Great to meet you in Berlin @colemickens !

We really are cutting ourselves on some of the bleeding edges of kubernetes + azure + terraform on this one. :)

Let me know if you want some help in creating those upstream issues for azure / kubernetes we talked about here: https://github.com/cncf/demo/blob/fd21acc7655e849a8cdda9faf0c547fa2916a0dc/azure/readme.org#notable-issues

I opened issues for the terraform azurerm_dns_srv_record list and for azure-sdk-for-go NetworkInterfaceDnsSettings.dnsServers resolution.

We'll look into refactoring this sometime soon after another cloud or two and likely add a second Azure approach using acs-engine provisioning of the kubernetes cluster.

colemickens commented 7 years ago

Finally, I think you might have mentioned another issue where the Azure DNS nameservers weren't forwarding requests? Again, if you can file an Issue against cncf/demo and tag me in it, I will share all three with the right folks internally and get the discussion going.

colemickens commented 7 years ago

@hh, I've got an internal mail drafted, just waiting for cncf/demo issues for the other one (or two) issues listed in the previous post. Then I can start getting the right folks to chime in. Thanks!

hh commented 7 years ago

Seeing that private DNS zones are not yet supported, I suspect that's why CNAME record resolution is broken.

hh commented 7 years ago

I've been spinning up AWS and Azure side by side, and I can confirm that the reason we needed a workaround for CNAME was that Azure does not yet provide support for private zones.

Our workaround was to use multiple A records.

We have a working Azure deploy for now, even if we are abusing the public DNS service records. :)
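
For anyone following along, the workaround amounts to something like this sketch: instead of one CNAME pointing an alias at the masters, we publish one A record per master IP under the alias. The aRecord type, the expandAlias helper, the name and the IPs are all made up for illustration:

```go
package main

import "fmt"

// aRecord is a hypothetical, simplified DNS A record.
type aRecord struct {
	Name string
	IP   string
}

// expandAlias returns one A record per target IP, all under the alias name,
// standing in for the CNAME we could not use.
func expandAlias(alias string, targetIPs []string) []aRecord {
	records := make([]aRecord, 0, len(targetIPs))
	for _, ip := range targetIPs {
		records = append(records, aRecord{Name: alias, IP: ip})
	}
	return records
}

func main() {
	// Placeholder name and addresses.
	for _, r := range expandAlias("master.demo.example", []string{"10.0.0.4", "10.0.0.5", "10.0.0.6"}) {
		fmt.Printf("%s A %s\n", r.Name, r.IP)
	}
}
```

The downside is that the records have to be regenerated whenever a master IP changes, which a CNAME would have avoided.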

discordianfish commented 7 years ago

@colemickens I just ran into the panic-when-optional-config-keys-missing issue. Did you file an issue for that? Can't find it.

colemickens commented 7 years ago

@discordianfish I just filed: https://github.com/kubernetes/kubernetes/issues/47543