FIWARE / data-space-connector

FIWARE Data Space Connector following DSBA TC recommendations

Timeout error when deploying local Minimal Viable Dataspace #18

Closed sermars closed 3 weeks ago

sermars commented 1 month ago

Hello,

We are experiencing a timeout error when deploying the local data space. We have tried on different computers and virtual machines, but the error persists. On two computers we managed to deploy it, but not consistently: most of the time the data space cannot be shut down and restarted, because this error appears.

We have tried it on a couple of Linux distributions (Ubuntu and Fedora), with Docker 27.2.0 and Java (JDK) versions 11.0.24 and 21.0.2. The computers we tested all have Intel i5 or i7 processors and between 16 GB and 64 GB of RAM.

[ERROR] >>> docker exec k3s-maven-plugin kubectl rollout status deployment provider-apisix-control-plane --namespace=provider --timeout=300s (timeout: PT5M10S)
[ERROR] <<< error: deployment "provider-apisix-control-plane" exceeded its progress deadline
[INFO] statefulset provider/provider-etcd ... ready
[INFO] deployment provider/provider-tm-forum-api-customer-bill-management ... ready
[INFO] statefulset provider/data-service-postgis ... ready
[INFO] deployment provider/provider-tm-forum-api-product-ordering-management ... ready
[INFO] deployment provider/odrl-pap ... ready
[INFO] deployment provider/provider-tm-forum-api-product-catalog ... ready
[INFO] deployment provider/provider-tm-forum-api-resource-function-activation ... ready
[ERROR] >>> docker exec k3s-maven-plugin kubectl rollout status deployment provider-apisix-data-plane --namespace=provider --timeout=300s (timeout: PT5M10S)
[ERROR] <<< error: deployment "provider-apisix-data-plane" exceeded its progress deadline
[INFO] deployment infra/traefik ... ready
[INFO] job provider/tmf-api-registration ... ready
[INFO] statefulset consumer/postgresql ... ready
[INFO] deployment provider/did-helper ... ready
[INFO] statefulset provider/postgresql ... ready
[INFO] deployment provider/provider-tm-forum-api-customer-management ... ready
[INFO] deployment provider/provider-tm-forum-api-product-inventory ... ready
[INFO] statefulset trust-anchor/trust-anchor-mysql ... ready
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for connector 0.0.1:
[INFO] 
[INFO] connector .......................................... FAILURE [01:34 min]
[INFO] it ................................................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:34 min
[INFO] Finished at: 2024-09-11T13:12:33+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal io.kokuwa.maven:k3s-maven-plugin:1.3.0:apply (create-namespaces) on project connector: Failed to wait for resources, see previous log -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal io.kokuwa.maven:k3s-maven-plugin:1.3.0:apply (create-namespaces) on project connector: Failed to wait for resources, see previous log
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:193)
    at jdk.internal.reflect.DirectMethodHandleAccessor.invoke (DirectMethodHandleAccessor.java:103)
    at java.lang.reflect.Method.invoke (Method.java:580)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
Caused by: org.apache.maven.plugin.MojoExecutionException: Failed to wait for resources, see previous log
    at io.kokuwa.maven.k3s.mojo.ApplyMojo.execute (ApplyMojo.java:152)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:193)
    at jdk.internal.reflect.DirectMethodHandleAccessor.invoke (DirectMethodHandleAccessor.java:103)
    at java.lang.reflect.Method.invoke (Method.java:580)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
[ERROR] 
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
sermars commented 1 month ago

Hi @jason-fox

I have also tried to deploy the connector in a local Kind cluster, and APISIX is the only service that fails to deploy.

In all cases, the APISIX logs show this error:

Defaulted container "apisix" out of: apisix, wait-for-etcd (init), prepare-apisix (init)
2024/09/30 10:11:04 [emerg] 1#1: bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
2024/09/30 10:11:04 [emerg] 1#1: bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
2024/09/30 10:11:04 [emerg] 1#1: bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
2024/09/30 10:11:04 [emerg] 1#1: bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
2024/09/30 10:11:04 [emerg] 1#1: bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
2024/09/30 10:11:04 [emerg] 1#1: still could not bind()
nginx: [emerg] still could not bind()
joancipria commented 1 month ago

When deploying the local version of the DSC, the provider-apisix-control-plane log shows the same error:

joan@joan-HP-250-G8-Notebook-PC:~/data-space-connector$ kubectl --kubeconfig=target/k3s.yaml logs provider-apisix-control-plane-6cd6bdbb5-gdcp9 -n provider
Defaulted container "apisix" out of: apisix, wait-for-etcd (init), prepare-apisix (init)
2024/09/30 11:18:24 [emerg] 1#1: bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
2024/09/30 11:18:24 [emerg] 1#1: bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
2024/09/30 11:18:24 [emerg] 1#1: bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
2024/09/30 11:18:24 [emerg] 1#1: bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
2024/09/30 11:18:24 [emerg] 1#1: bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/usr/local/apisix/logs/worker_events.sock failed (98: Address already in use)
2024/09/30 11:18:24 [emerg] 1#1: still could not bind()
nginx: [emerg] still could not bind()
joan@joan-HP-250-G8-Notebook-PC:~/data-space-connector$ kubectl --kubeconfig=target/k3s.yaml get pods --all-namespaces
NAMESPACE      NAME                                                              READY   STATUS             RESTARTS         AGE
consumer       consumer-keycloak-0                                               1/1     Running            11 (8m22s ago)   25d
consumer       did-helper-7bf56b686f-gmp92                                       1/1     Running            11 (6m35s ago)   25d
consumer       postgresql-0                                                      1/1     Running            11 (6m35s ago)   25d
infra          traefik-699bb489cc-jz5tm                                          1/1     Running            11 (6m35s ago)   25d
kube-system    coredns-b5657867f-mz7sq                                           1/1     Running            0                4m15s
kube-system    svclb-provider-apisix-data-plane-c2402009-mj42j                   2/2     Running            22 (6m35s ago)   25d
kube-system    svclb-traefik-loadbalancer-10b2a178-dtx6j                         1/1     Running            11 (6m35s ago)   25d
provider       authentication-mysql-0                                            1/1     Running            11 (6m35s ago)   25d
provider       credentials-config-service-75ff6d8c98-99gxf                       1/1     Running            26 (5m51s ago)   25d
provider       data-service-postgis-0                                            1/1     Running            11 (6m35s ago)   25d
provider       data-service-scorpio-68c89fb7f7-tl2gr                             1/1     Running            11 (6m35s ago)   25d
provider       did-helper-588dbf95bf-wlhnz                                       1/1     Running            11 (6m35s ago)   25d
provider       dsconfig-644f7fdd-szc4h                                           1/1     Running            11 (6m35s ago)   25d
provider       odrl-pap-c4ff4cdc5-ltdnw                                          1/1     Running            11 (6m35s ago)   25d
provider       postgresql-0                                                      1/1     Running            11 (6m35s ago)   25d
provider       provider-apisix-control-plane-6cd6bdbb5-gdcp9                     0/1     CrashLoopBackOff   536 (2m1s ago)   25d
provider       provider-apisix-data-plane-66686b7455-hrwwn                       0/2     Error              275              25d
provider       provider-contract-management-5c5c76c997-s6hfm                     1/1     Running            11 (6m35s ago)   25d
provider       provider-etcd-0                                                   1/1     Running            11 (6m35s ago)   25d
provider       provider-etcd-1                                                   1/1     Running            11 (6m35s ago)   25d
provider       provider-etcd-2                                                   1/1     Running            11 (6m35s ago)   25d
provider       provider-tm-forum-api-customer-bill-management-998d97ddd-mt2bw    1/1     Running            11 (6m35s ago)   25d
provider       provider-tm-forum-api-customer-management-586698485f-fg4lp        1/1     Running            11 (6m35s ago)   25d
provider       provider-tm-forum-api-envoy-847f4d6cc8-6sx77                      1/1     Running            11 (6m35s ago)   25d
provider       provider-tm-forum-api-party-catalog-797b89d98b-rn56r              1/1     Running            11 (6m35s ago)   25d
provider       provider-tm-forum-api-product-catalog-8658d8c59d-64x5l            1/1     Running            11 (6m35s ago)   25d
provider       provider-tm-forum-api-product-inventory-bb5d45c87-754hs           1/1     Running            11 (6m35s ago)   25d
provider       provider-tm-forum-api-product-ordering-management-6dc78777xfhr7   1/1     Running            11 (6m35s ago)   25d
provider       provider-tm-forum-api-resource-catalog-5fd9bcc997-6wk76           1/1     Running            11 (6m35s ago)   25d
provider       provider-tm-forum-api-resource-function-activation-59f9f44vr7fx   1/1     Running            11 (6m35s ago)   25d
provider       provider-tm-forum-api-resource-inventory-548fdbf5b4-rgz9k         1/1     Running            11 (6m35s ago)   25d
provider       provider-tm-forum-api-service-catalog-66c5c89dfd-smg27            1/1     Running            11 (6m35s ago)   25d
provider       tmf-api-registration-265r6                                        0/1     Error              0                25d
provider       tmf-api-registration-5lljx                                        0/1     Completed          0                25d
provider       tmf-api-registration-88n97                                        0/1     Error              0                25d
provider       tmf-api-registration-nwhlz                                        0/1     Error              0                25d
provider       tmf-api-registration-ttsl9                                        0/1     Error              0                25d
provider       trusted-issuers-list-5d8fb688b7-r9k85                             1/1     Running            26 (6m11s ago)   25d
provider       verifier-95694dc4c-ltn7s                                          1/1     Running            11 (6m35s ago)   25d
trust-anchor   tir-5fb8c4b6d-fjp7j                                               1/1     Running            27 (6m11s ago)   25d
trust-anchor   trust-anchor-mysql-0                                              1/1     Running            11 (6m35s ago)   25d
wistefan commented 3 weeks ago

@sermars @joancipria Hi, I just found the issue. This seems to be an OOM error within apisix. By default, the chart sets the resource limits to the "nano" preset (see the docs: https://github.com/bitnami/charts/tree/main/bitnami/apisix), which leads to the container getting killed inside the pod and then entering the crash loop. See the available presets here: https://github.com/bitnami/charts/blob/main/bitnami/common/templates/_resources.tpl#L15

For now, I would recommend using "small" in the local environment. For real deployments, I would highly recommend setting specific resource limits based on your needs and your cluster rather than using one of the presets (that is also what Bitnami recommends).
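
For anyone landing here with the same crash loop, the recommendation above could look roughly like the values override below. This is only a sketch: `controlPlane.resourcesPreset` and `dataPlane.resourcesPreset` are parameter names from the Bitnami APISIX chart, but the top-level `apisix:` key is an assumption about how the connector chart nests that subchart, and the commented-out resource numbers are placeholders rather than tested recommendations.

```yaml
# Hypothetical values override for the local deployment.
# Assumption: the connector chart includes the Bitnami APISIX chart as a
# subchart under the key "apisix"; adjust the top-level key to your setup.
apisix:
  controlPlane:
    # Bitnami preset name; replaces the default "nano"
    resourcesPreset: small
  dataPlane:
    resourcesPreset: small
    # For real deployments, explicit values can be set instead of a preset.
    # The numbers below are placeholders, not recommendations:
    # resources:
    #   requests:
    #     cpu: 500m
    #     memory: 512Mi
    #   limits:
    #     memory: 1Gi
```

If the override takes effect, the apisix pods should no longer be restarted for running out of memory; the termination reason reported for the previous container run in the pod description is one place to check this.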