Azure / Industrial-IoT

Azure Industrial IoT Platform

Onboarding module fails #1228

Closed denizetkar closed 3 years ago

denizetkar commented 3 years ago

Describe the bug: The onboarding module of the cloud microservices fails to start. When I run the kubectl get pods -n azure-iiot-ns command, the output is:

NAME                                              READY   STATUS    RESTARTS   AGE
azure-iiot-edge-jobs-b98fd4bcb-tb2v8              1/1     Running   0          38s
azure-iiot-events-79848fbcbf-9g8sj                1/1     Running   0          38s
azure-iiot-events-processor-75c985f9b7-2xlc8      0/1     Error     1          38s
azure-iiot-gateway-756967d779-bsfmr               1/1     Running   0          38s
azure-iiot-history-8999f6796-glhzn                1/1     Running   0          38s
azure-iiot-onboarding-5db46cb77f-hhhc7            0/1     Error     1          38s
azure-iiot-publisher-55988b778d-b57qv             1/1     Running   0          38s
azure-iiot-registry-55b98d9f6c-j5pl6              1/1     Running   0          38s
azure-iiot-sync-5c4479cd49-dnxjv                  1/1     Running   0          38s
azure-iiot-telemetry-processor-5647594c59-cc6g6   0/1     Error     1          38s
azure-iiot-tunnel-processor-7954d89fb8-k2hph      0/1     Error     1          38s
azure-iiot-twin-7d5b98fd9d-fkpkf                  1/1     Running   0          38s

Also, if I take a peek into the logs with kubectl logs azure-iiot-onboarding-5db46cb77f-hhhc7 -n azure-iiot-ns:

[18:09:12 INF Microsoft.Azure.IIoT.Hub.Processor.EventHub.EventProcessorHost] Using Consumer Group: "onboarding" 
[18:09:12 INF Microsoft.Azure.IIoT.AspNetCore.Diagnostics.Default.MetricServerHost] Started prometheus at 9501/metrics 
[18:09:12 INF Microsoft.Azure.IIoT.Messaging.Default.EventBusHost] Event bus host running.
[18:09:13 ERR Microsoft.Azure.IIoT.Hub.Processor.EventHub.EventProcessorHost] Error starting event processor host.
Microsoft.Azure.EventHubs.Processor.EventProcessorConfigurationException: Encountered error while fetching the list of EventHub PartitionIds
 ---> Microsoft.Azure.EventHubs.MessagingEntityNotFoundException: The messaging entity 'sb://ihsuprodamres112dednamespace.servicebus.windows.net/iot-hub-materialfluss' could not be found. To know more visit https://aka.ms/sbResourceMgrExceptions.
   at Microsoft.Azure.EventHubs.Amqp.Management.AmqpServiceClient.GetRuntimeInformationAsync()
   at Microsoft.Azure.EventHubs.EventHubClient.GetRuntimeInformationAsync()
   at Microsoft.Azure.EventHubs.Processor.PartitionManager.GetPartitionIdsAsync()
   --- End of inner exception stack trace ---
   at Microsoft.Azure.EventHubs.Processor.PartitionManager.GetPartitionIdsAsync()
   at Microsoft.Azure.EventHubs.Processor.PartitionManager.GetPartitionIdsAsync()
   at Microsoft.Azure.EventHubs.Processor.PartitionManager.InitializeStoresAsync()
   at Microsoft.Azure.EventHubs.Processor.PartitionManager.StartAsync()
   at Microsoft.Azure.EventHubs.Processor.EventProcessorHost.RegisterEventProcessorFactoryAsync(IEventProcessorFactory factory, EventProcessorOptions processorOptions)
   at Microsoft.Azure.IIoT.Hub.Processor.EventHub.EventProcessorHost.StartAsync() in D:\a\1\s\common\src\Microsoft.Azure.IIoT.Hub.Processor\src\EventHub\EventProcessorHost.cs:line 101
[18:09:13 ERR Microsoft.Azure.IIoT.Services.Processor.Onboarding.HostStarterService] Failed to start some hosts.
Microsoft.Azure.EventHubs.Processor.EventProcessorConfigurationException: Encountered error while fetching the list of EventHub PartitionIds
 ---> Microsoft.Azure.EventHubs.MessagingEntityNotFoundException: The messaging entity 'sb://ihsuprodamres112dednamespace.servicebus.windows.net/iot-hub-materialfluss' could not be found. To know more visit https://aka.ms/sbResourceMgrExceptions.
   at Microsoft.Azure.EventHubs.Amqp.Management.AmqpServiceClient.GetRuntimeInformationAsync()
   at Microsoft.Azure.EventHubs.EventHubClient.GetRuntimeInformationAsync()
   at Microsoft.Azure.EventHubs.Processor.PartitionManager.GetPartitionIdsAsync()
   --- End of inner exception stack trace ---
   at Microsoft.Azure.IIoT.Hub.Processor.EventHub.EventProcessorHost.StartAsync() in D:\a\1\s\common\src\Microsoft.Azure.IIoT.Hub.Processor\src\EventHub\EventProcessorHost.cs:line 106
   at Microsoft.Azure.IIoT.Services.Processor.Onboarding.HostStarterService.StartAsync(CancellationToken cancellationToken) in D:\a\1\s\services\src\Microsoft.Azure.IIoT.Services.Processor.Onboarding\src\HostStarterService.cs:line 79
Unhandled exception. Microsoft.Azure.EventHubs.Processor.EventProcessorConfigurationException: Encountered error while fetching the list of EventHub PartitionIds
 ---> Microsoft.Azure.EventHubs.MessagingEntityNotFoundException: The messaging entity 'sb://ihsuprodamres112dednamespace.servicebus.windows.net/iot-hub-materialfluss' could not be found. To know more visit https://aka.ms/sbResourceMgrExceptions.
   at Microsoft.Azure.EventHubs.Amqp.Management.AmqpServiceClient.GetRuntimeInformationAsync()
   at Microsoft.Azure.EventHubs.EventHubClient.GetRuntimeInformationAsync()
   at Microsoft.Azure.EventHubs.Processor.PartitionManager.GetPartitionIdsAsync()
   --- End of inner exception stack trace ---
   at Microsoft.Azure.IIoT.Hub.Processor.EventHub.EventProcessorHost.StartAsync() in D:\a\1\s\common\src\Microsoft.Azure.IIoT.Hub.Processor\src\EventHub\EventProcessorHost.cs:line 106
   at Microsoft.Azure.IIoT.Services.Processor.Onboarding.HostStarterService.StartAsync(CancellationToken cancellationToken) in D:\a\1\s\services\src\Microsoft.Azure.IIoT.Services.Processor.Onboarding\src\HostStarterService.cs:line 79
   at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.Run(IHost host)
   at Microsoft.Azure.IIoT.Services.Processor.Onboarding.Program.Main(String[] args) in D:\a\1\s\services\src\Microsoft.Azure.IIoT.Services.Processor.Onboarding\src\Program.cs:line 47

To Reproduce: Steps to reproduce the behavior:

  1. Install docker desktop 3.4.0 with docker version 20.10.7,
  2. Install minikube v1.20.0 and run minikube start,
  3. Deploy Azure IIoT cloud microservices with helm using the documentation (a sketch of the commands follows this list),
    • Provision all required Azure services,
    • Perform Azure AAD app registration,
    • Install the helm chart into the minikube cluster.
  4. After some minutes, observe using kubectl that the onboarding pod is failing with errors.
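
A minimal sketch of the helm deployment from step 3, assuming the chart repository alias azure-iiot points at the chart repository from the documentation (the repository URL is left as a placeholder) and that the release name and namespace match the ones used elsewhere in this issue:

helm repo add azure-iiot <chart-repo-url>
helm repo update
kubectl create namespace azure-iiot-ns
helm install azure-iiot azure-iiot/azure-industrial-iot --namespace azure-iiot-ns --values values.yaml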

Expected behavior: I would expect the onboarding pod to be in the "Running" state, just like all the other pods, instead of the "Error" state.

Additional context: Not sure if more is needed. I will provide more context if asked.
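
For reference, the MessagingEntityNotFoundException in the log above points at the IoT Hub's built-in Event Hub-compatible endpoint. A minimal sketch of cross-checking that endpoint with the Azure CLI (assuming az is installed and logged in; the hub name iot-hub-materialfluss is taken from the error message and may differ):

az iot hub show --name iot-hub-materialfluss --query "properties.eventHubEndpoints.events.{endpoint:endpoint, path:path, partitionCount:partitionCount}" --output table

The endpoint and path reported there should match what is configured under azure.iotHub.eventHub in the chart's values.yaml.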

hansgschossmann commented 3 years ago

@denizetkar which version of the Industrial-IoT platform are you using?

denizetkar commented 3 years ago

Sorry for not mentioning it earlier. I'm using version 0.3.2 of the Helm chart.

denizetkar commented 3 years ago

Do you need more information? The "need more information" label is still attached to this issue. If not, I would appreciate it if you removed the label, please @hansgschossmann.

denizetkar commented 3 years ago

I think it might also be helpful if I provide the values.yaml that I passed to helm as input for the deployment:

deployment:
  ingress:
    annotations:
      nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
      nginx.ingress.kubernetes.io/session-cookie-max-age: "14400"
      kubernetes.io/ingress.class: nginx
      nginx.ingress.kubernetes.io/session-cookie-name: affinity
      nginx.ingress.kubernetes.io/session-cookie-expires: "14400"
      nginx.ingress.kubernetes.io/affinity: cookie
      nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    enabled: true
azure:
  signalR:
    connectionString: <signalr-connection-string>
    serviceMode: Default
  iotHub:
    sharedAccessPolicies:
      iothubowner:
        connectionString: <iothubowner-connection-string>
    eventHub:
      endpoint: <iothub-eventhub-compatible-endpoint>
      consumerGroup:
        onboarding: onboarding
        events: events
        tunnel: tunnel
        telemetry: telemetry
  storageAccount:
    connectionString: <storage-account-connection-string>
  keyVault:
    uri: <keyvault-connection-string>
  tenantId: <tenant-id>
  cosmosDB:
    connectionString: <cosmosdb-connection-string>
  eventHubNamespace:
    sharedAccessPolicies:
      rootManageSharedAccessKey:
        connectionString: <eventhubnamespace-connection-string>
    eventHub:
      name: <eventhub-name>
      consumerGroup:
        telemetryUx: telemetry_ux
        telemetryCdm: telemetry_cdm
  serviceBusNamespace:
    sharedAccessPolicies:
      rootManageSharedAccessKey:
        connectionString: <servicebusnamespace-connection-string>
  auth:
    servicesApp:
      appId: <service-app-id>
      audience: <service-app-audience>
      secret: <service-app-secret>
    clientsApp:
      appId: <client-app-id>
      secret: <client-app-secret>
    required: true

Aside from the fields that I left out for privacy reasons and replaced with placeholders like <...>, all the other fields are exactly what I used in my values.yaml file.
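
One way to double-check what the release actually received from this file (a minimal sketch; the release name azure-iiot and namespace azure-iiot-ns are assumed from the helm install command shown later in this issue):

# Show the user-supplied values of the installed release:
helm get values azure-iiot --namespace azure-iiot-ns
# Render the chart locally and inspect how the Event Hub endpoint is wired into the manifests
# (assumes the rendered resources mention the endpoint under a name containing "eventhub"):
helm template azure-iiot azure-iiot/azure-industrial-iot --values values.yaml | grep -i -B1 -A1 eventhub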

hansgschossmann commented 3 years ago

@denizetkar that is sufficient information. minikube is not a supported scenario; we are going to look into this with low priority when we have time. If you have any logs that indicate an error condition, please add them as a comment.

denizetkar commented 3 years ago

Tried the same deployment on a 1-node K8s cluster bootstrapped with kubeadm. Following are the details:

Platform Arch: AMD64
OS: Ubuntu 20.04.2 LTS
Virtual Machine: yes

Steps taken to set up K8s cluster:

  1. Installed the moby engine with sudo apt-get install moby-engine,
  2. Now that I had access to the docker and containerd commands and had a container runtime installed, I followed the official K8s instructions on how to bootstrap a cluster (HERE):

    containerd_config_path="./config.toml"
    kubeadm_config_path="./kubeadm-config.yaml"
    
    cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf > /dev/null
    br_netfilter
    EOF
    cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf > /dev/null
    net.bridge.bridge-nf-call-ip6tables = 1
    net.bridge.bridge-nf-call-iptables = 1
    EOF
    sudo sysctl --system
    
    sudo apt-get update
    sudo apt-get install -y apt-transport-https ca-certificates curl
    sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
    echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
    sudo apt-get update
    sudo apt-get install -y kubelet kubeadm kubectl
    sudo apt-mark hold kubelet kubeadm kubectl
    
    cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf > /dev/null
    overlay
    br_netfilter
    EOF
    sudo modprobe overlay
    sudo modprobe br_netfilter
    cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf > /dev/null
    net.bridge.bridge-nf-call-iptables  = 1
    net.ipv4.ip_forward                 = 1
    net.bridge.bridge-nf-call-ip6tables = 1
    EOF
    sudo sysctl --system
    
    sudo mkdir -p /etc/containerd
    # containerd config default | sudo tee /etc/containerd/config.toml
    sudo cp ${containerd_config_path} /etc/containerd/
    sudo systemctl restart containerd
    
    sudo mkdir /etc/docker
    cat <<EOF | sudo tee /etc/docker/daemon.json > /dev/null
    {
      "exec-opts": ["native.cgroupdriver=systemd"],
      "log-driver": "json-file",
      "log-opts": {
        "max-size": "100m",
        "max-file": "3"
      },
      "storage-driver": "overlay2"
    }
    EOF
    sudo systemctl enable docker
    sudo systemctl daemon-reload
    sudo systemctl restart docker
    
    sudo kubeadm init --config ${kubeadm_config_path} --ignore-preflight-errors NumCPU
    mkdir -p $HOME/.kube
    sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -un):$(id -gn) $HOME/.kube/config
    
    kubectl taint nodes --all node-role.kubernetes.io/master-
    kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
    kubectl create -f https://docs.projectcalico.org/manifests/custom-resources.yaml

    Where the ./config.toml file is as follows:

    # Use config version 2 to enable new configuration fields.
    # Config file is parsed as version 1 by default.
    # Version 2 uses long plugin names, i.e. "io.containerd.grpc.v1.cri" vs "cri".
    version = 2
    
    # The 'plugins."io.containerd.grpc.v1.cri"' table contains all of the server options.
    [plugins."io.containerd.grpc.v1.cri"]
    
      # disable_tcp_service disables serving CRI on the TCP server.
      # Note that a TCP server is enabled for containerd if TCPAddress is set in section [grpc].
      disable_tcp_service = true
    
      # stream_server_address is the ip address streaming server is listening on.
      stream_server_address = "127.0.0.1"
    
      # stream_server_port is the port streaming server is listening on.
      stream_server_port = "0"
    
      # stream_idle_timeout is the maximum time a streaming connection can be
      # idle before the connection is automatically closed.
      # The string is in the golang duration format, see:
      #   https://golang.org/pkg/time/#ParseDuration
      stream_idle_timeout = "4h"
    
      # enable_selinux indicates to enable the selinux support.
      enable_selinux = false
    
      # selinux_category_range allows the upper bound on the category range to be set.
      # if not specified or set to 0, defaults to 1024 from the selinux package.
      selinux_category_range = 1024
    
      # sandbox_image is the image used by sandbox container.
      sandbox_image = "k8s.gcr.io/pause:3.2"
    
      # stats_collect_period is the period (in seconds) of snapshots stats collection.
      stats_collect_period = 10
    
      # enable_tls_streaming enables the TLS streaming support.
      # It generates a self-sign certificate unless the following x509_key_pair_streaming are both set.
      enable_tls_streaming = false
    
      # tolerate_missing_hugetlb_controller if set to false will error out on create/update
      # container requests with huge page limits if the cgroup controller for hugepages is not present.
      # This helps with supporting Kubernetes <=1.18 out of the box. (default is `true`)
      tolerate_missing_hugetlb_controller = true
    
      # ignore_image_defined_volumes ignores volumes defined by the image. Useful for better resource
        # isolation, security and early detection of issues in the mount configuration when using
        # ReadOnlyRootFilesystem since containers won't silently mount a temporary volume.
      ignore_image_defined_volumes = false
    
      # 'plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming' contains a x509 valid key pair to stream with tls.
      [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
        # tls_cert_file is the filepath to the certificate paired with the "tls_key_file"
        tls_cert_file = ""
    
        # tls_key_file is the filepath to the private key paired with the "tls_cert_file"
        tls_key_file = ""
    
      # max_container_log_line_size is the maximum log line size in bytes for a container.
      # Log line longer than the limit will be split into multiple lines. -1 means no
      # limit.
      max_container_log_line_size = 16384
    
      # disable_cgroup indicates to disable the cgroup support.
      # This is useful when the daemon does not have permission to access cgroup.
      disable_cgroup = false
    
      # disable_apparmor indicates to disable the apparmor support.
      # This is useful when the daemon does not have permission to access apparmor.
      disable_apparmor = false
    
      # restrict_oom_score_adj indicates to limit the lower bound of OOMScoreAdj to
      # the containerd's current OOMScoreAdj.
      # This is useful when the containerd does not have permission to decrease OOMScoreAdj.
      restrict_oom_score_adj = false
    
      # max_concurrent_downloads restricts the number of concurrent downloads for each image.
      max_concurrent_downloads = 3
    
      # disable_proc_mount disables Kubernetes ProcMount support. This MUST be set to `true`
      # when using containerd with Kubernetes <=1.11.
      disable_proc_mount = false
    
      # unsetSeccompProfile is the profile containerd/cri will use if the provided seccomp profile is
      # unset (`""`) for a container (default is `unconfined`)
      unset_seccomp_profile = ""
    
      # 'plugins."io.containerd.grpc.v1.cri".containerd' contains config related to containerd
      [plugins."io.containerd.grpc.v1.cri".containerd]
    
        # snapshotter is the snapshotter used by containerd.
        snapshotter = "overlayfs"
    
        # no_pivot disables pivot-root (linux only), required when running a container in a RamDisk with runc.
        # This only works for runtime type "io.containerd.runtime.v1.linux".
        no_pivot = false
    
        # disable_snapshot_annotations disables to pass additional annotations (image
        # related information) to snapshotters. These annotations are required by
        # stargz snapshotter (https://github.com/containerd/stargz-snapshotter)
        disable_snapshot_annotations = false
    
        # discard_unpacked_layers allows GC to remove layers from the content store after
        # successfully unpacking these layers to the snapshotter.
        discard_unpacked_layers = false
    
        # default_runtime_name is the default runtime name to use.
        default_runtime_name = "runc"
    
        # 'plugins."io.containerd.grpc.v1.cri".containerd.default_runtime' is the runtime to use in containerd.
        # DEPRECATED: use `default_runtime_name` and `plugins."io.containerd.grpc.v1.cri".runtimes` instead.
        # Remove in containerd 1.4.
        [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
    
        # 'plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime' is a runtime to run untrusted workloads on it.
        # DEPRECATED: use `untrusted` runtime in `plugins."io.containerd.grpc.v1.cri".runtimes` instead.
        # Remove in containerd 1.4.
        [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
    
        # 'plugins."io.containerd.grpc.v1.cri".containerd.runtimes' is a map from CRI RuntimeHandler strings, which specify types
        # of runtime configurations, to the matching configurations.
        # In this example, 'runc' is the RuntimeHandler string to match.
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          # runtime_type is the runtime type to use in containerd.
          # The default value is "io.containerd.runc.v2" since containerd 1.4.
          # The default value was "io.containerd.runc.v1" in containerd 1.3, "io.containerd.runtime.v1.linux" in prior releases.
          runtime_type = "io.containerd.runc.v2"
    
          # pod_annotations is a list of pod annotations passed to both pod
          # sandbox as well as container OCI annotations. Pod_annotations also
          # supports golang path match pattern - https://golang.org/pkg/path/#Match.
          # e.g. ["runc.com.*"], ["*.runc.com"], ["runc.com/*"].
          #
          # For the naming convention of annotation keys, please reference:
          # * Kubernetes: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/#syntax-and-character-set
          # * OCI: https://github.com/opencontainers/image-spec/blob/master/annotations.md
          pod_annotations = []
    
          # container_annotations is a list of container annotations passed through to the OCI config of the containers.
          # Container annotations in CRI are usually generated by other Kubernetes node components (i.e., not users).
          # Currently, only device plugins populate the annotations.
          container_annotations = []
    
          # privileged_without_host_devices allows overloading the default behaviour of passing host
          # devices through to privileged containers. This is useful when using a runtime where it does
          # not make sense to pass host devices to the container when privileged. Defaults to false -
          # i.e pass host devices through to privileged containers.
          privileged_without_host_devices = false
    
          # base_runtime_spec is a file path to a JSON file with the OCI spec that will be used as the base spec that all
          # container's are created from.
          # Use containerd's `ctr oci spec > /etc/containerd/cri-base.json` to output initial spec file.
          # Spec files are loaded at launch, so containerd daemon must be restared on any changes to refresh default specs.
          # Still running containers and restarted containers will still be using the original spec from which that container was created.
          base_runtime_spec = ""
    
          # 'plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options' is options specific to
          # "io.containerd.runc.v1" and "io.containerd.runc.v2". Its corresponding options type is:
          #   https://github.com/containerd/containerd/blob/v1.3.2/runtime/v2/runc/options/oci.pb.go#L26 .
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            # NoPivotRoot disables pivot root when creating a container.
            NoPivotRoot = false
    
            # NoNewKeyring disables new keyring for the container.
            NoNewKeyring = false
    
            # ShimCgroup places the shim in a cgroup.
            ShimCgroup = ""
    
            # IoUid sets the I/O's pipes uid.
            IoUid = 0
    
            # IoGid sets the I/O's pipes gid.
            IoGid = 0
    
            # BinaryName is the binary name of the runc binary.
            BinaryName = ""
    
            # Root is the runc root directory.
            Root = ""
    
            # CriuPath is the criu binary path.
            CriuPath = ""
    
            # SystemdCgroup enables systemd cgroups.
            SystemdCgroup = true
    
            # CriuImagePath is the criu image path
            CriuImagePath = ""
    
            # CriuWorkPath is the criu work path.
            CriuWorkPath = ""
    
      # 'plugins."io.containerd.grpc.v1.cri".cni' contains config related to cni
      [plugins."io.containerd.grpc.v1.cri".cni]
        # bin_dir is the directory in which the binaries for the plugin is kept.
        bin_dir = "/opt/cni/bin"
    
        # conf_dir is the directory in which the admin places a CNI conf.
        conf_dir = "/etc/cni/net.d"
    
        # max_conf_num specifies the maximum number of CNI plugin config files to
        # load from the CNI config directory. By default, only 1 CNI plugin config
        # file will be loaded. If you want to load multiple CNI plugin config files
        # set max_conf_num to the number desired. Setting max_config_num to 0 is
        # interpreted as no limit is desired and will result in all CNI plugin
        # config files being loaded from the CNI config directory.
        max_conf_num = 1
    
        # conf_template is the file path of golang template used to generate
        # cni config.
        # If this is set, containerd will generate a cni config file from the
        # template. Otherwise, containerd will wait for the system admin or cni
        # daemon to drop the config file into the conf_dir.
        # This is a temporary backward-compatible solution for kubenet users
        # who don't have a cni daemonset in production yet.
        # This will be deprecated when kubenet is deprecated.
        # See the "CNI Config Template" section for more details.
        conf_template = ""
    
      # 'plugins."io.containerd.grpc.v1.cri".registry' contains config related to the registry
      [plugins."io.containerd.grpc.v1.cri".registry]
    
        # 'plugins."io.containerd.grpc.v1.cri".registry.headers' sets the http request headers to send for all registry requests
        [plugins."io.containerd.grpc.v1.cri".registry.headers]
            Foo = ["bar"]
    
        # 'plugins."io.containerd.grpc.v1.cri".registry.mirrors' are namespace to mirror mapping for all namespaces.
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
          [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
            endpoint = ["https://registry-1.docker.io", ]
    
      # 'plugins."io.containerd.grpc.v1.cri".image_decryption' contains config related
      # to handling decryption of encrypted container images.
      [plugins."io.containerd.grpc.v1.cri".image_decryption]
        # key_model defines the name of the key model used for how the cri obtains
        # keys used for decryption of encrypted container images.
        # The [decryption document](https://github.com/containerd/cri/blob/master/docs/decryption.md)
        # contains additional information about the key models available. 
        #
        # Set of available string options: {"", "node"}
        # Omission of this field defaults to the empty string "", which indicates no key model, 
        # disabling image decryption.
        #
        # In order to use the decryption feature, additional configurations must be made.
        # The [decryption document](https://github.com/containerd/cri/blob/master/docs/decryption.md)
        # provides information of how to set up stream processors and the containerd imgcrypt decoder
        # with the appropriate key models.
        #
        # Additional information:
        # * Stream processors: https://github.com/containerd/containerd/blob/master/docs/stream_processors.md
        # * Containerd imgcrypt: https://github.com/containerd/imgcrypt
        key_model = "node"

    And the ./kubeadm-config.yaml file is as follows:

    kind: ClusterConfiguration
    apiVersion: kubeadm.k8s.io/v1beta2
    kubernetesVersion: v1.21.1
    networking:
      podSubnet: "192.168.0.0/16"
    ---
    kind: KubeletConfiguration
    apiVersion: kubelet.config.k8s.io/v1beta1
    cgroupDriver: systemd
  3. At this point, I have a working cluster with Calico set up as the pod network add-on. Then I deployed the Azure IIoT helm chart, version 0.3.2, as before:
    helm install azure-iiot azure-iiot/azure-industrial-iot --namespace azure-iiot-ns --values $ValuesYamlPath
  4. When I check the pod statuses, here is what I see:
    $ kubectl get pods -n azure-iiot-ns
    NAME                                              READY   STATUS             RESTARTS   AGE
    azure-iiot-edge-jobs-7549b9f5df-gnfz9             1/1     Running            0          16m
    azure-iiot-events-58dc8cc4b4-shk6n                1/1     Running            0          16m
    azure-iiot-events-processor-7fc87556db-4pbvs      0/1     CrashLoopBackOff   7          16m
    azure-iiot-gateway-55644fb476-v885w               1/1     Running            0          16m
    azure-iiot-history-5966d9f7b5-2zlns               1/1     Running            0          16m
    azure-iiot-onboarding-74f4c58b5d-ttfd4            0/1     CrashLoopBackOff   7          16m
    azure-iiot-publisher-7bb97fd789-5f2j6             1/1     Running            0          16m
    azure-iiot-registry-599445bb9d-s9qls              1/1     Running            0          16m
    azure-iiot-sync-8d4998c54-nh4s8                   1/1     Running            0          16m
    azure-iiot-telemetry-processor-7ff5564f4b-c8t22   0/1     CrashLoopBackOff   7          16m
    azure-iiot-tunnel-processor-875498d49-75vxc       0/1     CrashLoopBackOff   8          16m
    azure-iiot-twin-7ff57d957f-m6kdt                  1/1     Running            0          16m

    And the onboarding pod still shows the same logs:

    kubectl logs azure-iiot-onboarding-74f4c58b5d-ttfd4 -n azure-iiot-ns
    [13:59:30 INF Microsoft.Azure.IIoT.Hub.Processor.EventHub.EventProcessorHost] Using Consumer Group: "onboarding" 
    [13:59:30 INF Microsoft.Azure.IIoT.AspNetCore.Diagnostics.Default.MetricServerHost] Started prometheus at 9501/metrics 
    [13:59:30 INF Microsoft.Azure.IIoT.Messaging.Default.EventBusHost] Event bus host running. 
    [13:59:31 ERR Microsoft.Azure.IIoT.Hub.Processor.EventHub.EventProcessorHost] Error starting event processor host. 
    Microsoft.Azure.EventHubs.Processor.EventProcessorConfigurationException: Encountered error while fetching the list of EventHub PartitionIds
     ---> Microsoft.Azure.EventHubs.MessagingEntityNotFoundException: The messaging entity 'sb://ihsuprodamres112dednamespace.servicebus.windows.net/iot-hub-materialfluss' could not be found. To know more visit https://aka.ms/sbResourceMgrExceptions. 
       at Microsoft.Azure.EventHubs.Amqp.Management.AmqpServiceClient.GetRuntimeInformationAsync()
       at Microsoft.Azure.EventHubs.EventHubClient.GetRuntimeInformationAsync()
       at Microsoft.Azure.EventHubs.Processor.PartitionManager.GetPartitionIdsAsync()
       --- End of inner exception stack trace ---
       at Microsoft.Azure.EventHubs.Processor.PartitionManager.GetPartitionIdsAsync()
       at Microsoft.Azure.EventHubs.Processor.PartitionManager.GetPartitionIdsAsync()
       at Microsoft.Azure.EventHubs.Processor.PartitionManager.InitializeStoresAsync()
       at Microsoft.Azure.EventHubs.Processor.PartitionManager.StartAsync()
       at Microsoft.Azure.EventHubs.Processor.EventProcessorHost.RegisterEventProcessorFactoryAsync(IEventProcessorFactory factory, EventProcessorOptions processorOptions)
       at Microsoft.Azure.IIoT.Hub.Processor.EventHub.EventProcessorHost.StartAsync() in D:\a\1\s\common\src\Microsoft.Azure.IIoT.Hub.Processor\src\EventHub\EventProcessorHost.cs:line 101
    [13:59:31 ERR Microsoft.Azure.IIoT.Services.Processor.Onboarding.HostStarterService] Failed to start some hosts. 
    Microsoft.Azure.EventHubs.Processor.EventProcessorConfigurationException: Encountered error while fetching the list of EventHub PartitionIds
     ---> Microsoft.Azure.EventHubs.MessagingEntityNotFoundException: The messaging entity 'sb://ihsuprodamres112dednamespace.servicebus.windows.net/iot-hub-materialfluss' could not be found. To know more visit https://aka.ms/sbResourceMgrExceptions. 
       at Microsoft.Azure.EventHubs.Amqp.Management.AmqpServiceClient.GetRuntimeInformationAsync()
       at Microsoft.Azure.EventHubs.EventHubClient.GetRuntimeInformationAsync()
       at Microsoft.Azure.EventHubs.Processor.PartitionManager.GetPartitionIdsAsync()
       --- End of inner exception stack trace ---
       at Microsoft.Azure.IIoT.Hub.Processor.EventHub.EventProcessorHost.StartAsync() in D:\a\1\s\common\src\Microsoft.Azure.IIoT.Hub.Processor\src\EventHub\EventProcessorHost.cs:line 106
       at Microsoft.Azure.IIoT.Services.Processor.Onboarding.HostStarterService.StartAsync(CancellationToken cancellationToken) in D:\a\1\s\services\src\Microsoft.Azure.IIoT.Services.Processor.Onboarding\src\HostStarterService.cs:line 79
    Unhandled exception. Microsoft.Azure.EventHubs.Processor.EventProcessorConfigurationException: Encountered error while fetching the list of EventHub PartitionIds
     ---> Microsoft.Azure.EventHubs.MessagingEntityNotFoundException: The messaging entity 'sb://ihsuprodamres112dednamespace.servicebus.windows.net/iot-hub-materialfluss' could not be found. To know more visit https://aka.ms/sbResourceMgrExceptions. 
       at Microsoft.Azure.EventHubs.Amqp.Management.AmqpServiceClient.GetRuntimeInformationAsync()
       at Microsoft.Azure.EventHubs.EventHubClient.GetRuntimeInformationAsync()
       at Microsoft.Azure.EventHubs.Processor.PartitionManager.GetPartitionIdsAsync()
       --- End of inner exception stack trace ---
       at Microsoft.Azure.IIoT.Hub.Processor.EventHub.EventProcessorHost.StartAsync() in D:\a\1\s\common\src\Microsoft.Azure.IIoT.Hub.Processor\src\EventHub\EventProcessorHost.cs:line 106
       at Microsoft.Azure.IIoT.Services.Processor.Onboarding.HostStarterService.StartAsync(CancellationToken cancellationToken) in D:\a\1\s\services\src\Microsoft.Azure.IIoT.Services.Processor.Onboarding\src\HostStarterService.cs:line 79
       at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
       at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
       at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
       at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.Run(IHost host)
       at Microsoft.Azure.IIoT.Services.Processor.Onboarding.Program.Main(String[] args) in D:\a\1\s\services\src\Microsoft.Azure.IIoT.Services.Processor.Onboarding\src\Program.cs:line 47

My kubeadm version is:

kubeadm version: &version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:57:56Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}

and my kubectl version is:

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:12:29Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}

and finally my kubelet --version is:

Kubernetes v1.21.2

This means that the problem is not related to minikube but is a more general one: the IIoT cloud modules cannot be deployed on any private K8s cluster. Do I have to use Azure Kubernetes Service (AKS) to use this software? I would appreciate it if someone could help me. It is okay if I have to do further troubleshooting steps.

hansgschossmann commented 3 years ago

@denizetkar thanks for the additional information. We are following up internally and will get back to you.

marcschier commented 3 years ago

@denizetkar - thanks for providing the detailed information. Our team has documented how to deploy to AKS; other clusters have not been tested and are not supported (by our team).

In addition to the documentation, you might want to start looking at the code under the /deploy folder. The key point is that the microservices require Azure services such as Event Hub to be deployed in order to work, and these are deployed using the ARM templates there. The resulting configuration needs to be added to the values.yaml file. Without that configuration the services will not work, as shown in your case. Some documentation related to it can be found here.
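
A minimal sketch of that flow, assuming the Azure services are provisioned from an ARM template under the /deploy folder (the template and parameter file names below are placeholders, not the actual file names in the repository):

# Provision the required Azure services (Event Hub, Service Bus, etc.) from an ARM template:
az deployment group create --resource-group <resource-group> --template-file <template-from-deploy-folder>.json --parameters <parameters-file>.json
# Copy the resulting connection strings and endpoints into values.yaml, then install or upgrade the chart:
helm upgrade --install azure-iiot azure-iiot/azure-industrial-iot --namespace azure-iiot-ns --values values.yaml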

denizetkar commented 3 years ago

@marcschier Thank you for the response! If you are referring to this comment of mine, then let me clarify once more that I only left the confidential fields out and replaced them with placeholders to show how the file was structured. All of the required Azure services were deployed and running, and their credentials were provided to the cloud modules.

As far as I can tell from your comment, you do not support deploying the IIoT cloud modules anywhere other than AKS. So this issue can be closed, but I would kindly ask you to update your documentation to state this limitation, namely that deployment is only supported on AKS.