eclipse / hawkbit

Eclipse hawkBit™
https://projects.eclipse.org/projects/iot.hawkbit
Eclipse Public License 2.0
452 stars 187 forks source link

Rollout status stuck in state "creating" #1770

Closed matkomilovich closed 1 month ago

matkomilovich commented 1 month ago

To give some context to the problem, I'm developing a UI in Angular that implements the Management API. Since I'm still in the development stage, for testing purposes, I'm running a local instance of hawkBit connected to a database with MariaDB.

Currently, I'm working on implementing the API for creating rollouts; however, I'm having some issues with managing their states. Every time I create a rollout, it never progresses beyond the "creating" status.

Not sure if this is the source of the problem, but I have been reviewing the attributes of the rollouts I've created so far, and something that stands out to me is that when I check the total_targets argument, it shows that there are a certain number of targets associated to it that match the target filter query and distribution set used in the request. However, when I review the content of the groups formed in that rollout, the number of targets in them is always 0 (checking the total_targets argument).

If anyone has any ideas about what might be happening or if you need more information about my current setup, please do not hesitate to comment.

Note: All targets were created from the UI (Management API).

avgustinmm commented 1 month ago

Hi, could you give some details as: how do you start hawkbit (with update server or with set of micro-services), what is your target query, is it dynamic rollout or not, are you creating a rollout with group amount or with custom group definitions, how many devices do you expect to be involved in the rollout? You could also check the state of your rollout in the database. And, finally, you could debug what happens in while creating the rollout (using a debugger) - the process of creating is mad in JpaRolloutExecutor handleCreateRollout.

matkomilovich commented 1 month ago

hi @avgustinmm, sure!

  1. How do you start hawkbit? -> I'm compiling the project with mvn clean install -Dmaven.test.skip=true, and then running the update server with java -jar ./hawkbit-update-server-0-SNAPSHOT.jar.
  2. What is your target query? -> Right now, I'm using the target filter query "name==*Rollout*" (I have 3 targets named "Rollout_test*").
  3. Is it dynamic rollout or not?-> I'm currently setting the parameter "dynamic" to false, but I tested setting it to true as well (had the same result).
  4. Are you creating a rollout with group amount or with custom group definitions? -> I'm using group amount (amountGroups parameter). Usually I define 1 group.
  5. How many devices do you expect to be involved in the rollout? Based on the target filter and the DS that I'm using, I expect to have 3 targets in the rollout.
  6. You could also check the state of your rollout in the database -> the status of the rollouts that are stuck in "creating" status is 0 (not sure if relevant, but I checked it using DBeaver).

Btw, when I check the content of the rollout I get the following:

{
    "createdAt": 1721145035769,
    "lastModifiedAt": 1721145035769,
    "name": "dynamicFalseRollout",
    "description": "asd",
    "targetFilterQuery": "name==*Rollout*",
    "distributionSetId": 3,
    "status": "creating",
    "totalTargets": 3,
    "totalTargetsPerStatus": {
        "running": 0,
        "notstarted": 3,
        "scheduled": 0,
        "cancelled": 0,
        "finished": 0,
        "error": 0
    },
    "totalGroups": 1,
    "startAt": 1682408570791,
    "forcetime": 0,
    "deleted": false,
    "type": "forced",
    "weight": 1000,
    "dynamic": false,
    "_links": {
        "start": {
            "href": "http://localhost:8081/rest/v1/rollouts/13/start"
        },
        "pause": {
            "href": "http://localhost:8081/rest/v1/rollouts/13/pause"
        },
        "resume": {
            "href": "http://localhost:8081/rest/v1/rollouts/13/resume"
        },
        "triggerNextGroup": {
            "href": "http://localhost:8081/rest/v1/rollouts/13/triggerNextGroup"
        },
        "approve": {
            "href": "http://localhost:8081/rest/v1/rollouts/13/approve"
        },
        "deny": {
            "href": "http://localhost:8081/rest/v1/rollouts/13/deny"
        },
        "groups": {
            "href": "http://localhost:8081/rest/v1/rollouts/13/deploygroups?offset=0&limit=50"
        },
        "distributionset": {
            "href": "http://localhost:8081/rest/v1/distributionsets/3",
            "name": "OsDS:1.0.1"
        },
        "self": {
            "href": "http://localhost:8081/rest/v1/rollouts/13"
        }
    },
    "id": 13
}

and, if I check the content of the deploygroups that was created for the rollout:

{
    "content": [
        {
            "createdAt": 1721145035770,
            "lastModifiedAt": 1721145035770,
            "name": "group-1",
            "description": "group-1",
            "successCondition": {
                "condition": "THRESHOLD",
                "expression": "50"
            },
            "successAction": {
                "action": "NEXTGROUP",
                "expression": ""
            },
            "errorCondition": {
                "condition": "THRESHOLD",
                "expression": "80"
            },
            "errorAction": {
                "action": "PAUSE",
                "expression": ""
            },
            "targetFilterQuery": "",
            "targetPercentage": 100.0,
            "status": "creating",
            "totalTargets": 0,
            "_links": {
                "self": {
                    "href": "http://localhost:8081/rest/v1/rollouts/13/deploygroups/35"
                }
            },
            "id": 35
        }
    ],
    "total": 1,
    "size": 1
}
matkomilovich commented 1 month ago

@avgustinmm Also I followed your recommendation and set the logging level to TRACE in hawkBit while creating a rollout ( full log of the test -> rollout-creation-log.txt). Here are some snippets that caught my attention:

java.lang.NullPointerException: null at java.base/java.util.Objects.requireNonNull(Objects.java:209) at org.eclipse.hawkbit.security.SecurityContextTenantAware.runAsTenantAsUser(SecurityContextTenantAware.java:117) at org.eclipse.hawkbit.repository.jpa.JpaRolloutHandler.lambda$handleRolloutInNewTransaction$4(JpaRolloutHandler.java:108) at java.base/java.util.Optional.ifPresentOrElse(Optional.java:198) at org.eclipse.hawkbit.repository.jpa.JpaRolloutHandler.lambda$handleRolloutInNewTransaction$5(JpaRolloutHandler.java:102) at java.base/java.util.Optional.ifPresentOrElse(Optional.java:196) at org.eclipse.hawkbit.repository.jpa.JpaRolloutHandler.lambda$handleRolloutInNewTransaction$7(JpaRolloutHandler.java:100)

2024-07-16T12:04:45.216-04:00 DEBUG 287477 --- [executor-pool-1] o.mariadb.jdbc.message.server.OkPacket : System variable change: autocommit = ON 2024-07-16T12:04:45.216-04:00 DEBUG 287477 --- [executor-pool-1] h.r.j.c.MultiTenantJpaTransactionManager : Closing JPA EntityManager [org.eclipse.persistence.internal.jpa.EntityManagerImpl@725e631f] after transaction 2024-07-16T12:04:45.216-04:00 DEBUG 287477 --- [executor-pool-1] h.r.j.c.MultiTenantJpaTransactionManager : Resuming suspended transaction after completion of inner transaction 2024-07-16T12:04:45.216-04:00 TRACE 287477 --- [executor-pool-1] o.e.h.repository.jpa.JpaRolloutHandler : Unlock lock java.util.concurrent.locks.ReentrantLock@2ef603d7[Locked by thread central-scheduled-executor-pool-1] 2024-07-16T12:04:45.217-04:00 DEBUG 287477 --- [executor-pool-1] o.e.h.r.j.m.JpaSystemManagement : Exception on forEachTenant execution for tenant DEFAULT. Continue with next tenant.

java.lang.NullPointerException: null at java.base/java.util.Objects.requireNonNull(Objects.java:209) at org.eclipse.hawkbit.security.SecurityContextTenantAware.runAsTenantAsUser(SecurityContextTenantAware.java:117) at org.eclipse.hawkbit.repository.jpa.JpaRolloutHandler.lambda$handleRolloutInNewTransaction$4(JpaRolloutHandler.java:108) at java.base/java.util.Optional.ifPresentOrElse(Optional.java:198) at org.eclipse.hawkbit.repository.jpa.JpaRolloutHandler.lambda$handleRolloutInNewTransaction$5(JpaRolloutHandler.java:102) at java.base/java.util.Optional.ifPresentOrElse(Optional.java:196) at org.eclipse.hawkbit.repository.jpa.JpaRolloutHandler.lambda$handleRolloutInNewTransaction$7(JpaRolloutHandler.java:100) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140) at org.eclipse.hawkbit.repository.jpa.utils.DeploymentHelper.runInNewTransaction(DeploymentHelper.java:122) at org.eclipse.hawkbit.repository.jpa.utils.DeploymentHelper.runInNewTransaction(DeploymentHelper.java:98) at org.eclipse.hawkbit.repository.jpa.JpaRolloutHandler.handleRolloutInNewTransaction(JpaRolloutHandler.java:99) at org.eclipse.hawkbit.repository.jpa.JpaRolloutHandler.lambda$handleAll$0(JpaRolloutHandler.java:83)

- Also, I'm not sure if it's normal, as it's the first time I'm checking the logs of the rollouts creation, but I found some filterTarget = 'null', and filter = 'nul':

2024-07-16T12:04:45.222-04:00 TRACE 287477 --- [executor-pool-1] s.a.i.a.AspectJMethodSecurityInterceptor : Authorizing ReflectiveMethodInvocation: public java.util.List org.eclipse.hawkbit.repository.jpa.management.JpaRolloutManagement.findActiveRollouts(); target is of class [org.eclipse.hawkbit.repository.jpa.management.JpaRolloutManagement] with attributes [[authorize: 'hasAuthority('READ_ROLLOUT') or hasAuthority('ROLE_SYSTEM_CODE')', filter: 'null', filterTarget: 'null']] 2024-07-16T12:04:45.223-04:00 DEBUG 287477 --- [executor-pool-1] s.a.i.a.AspectJMethodSecurityInterceptor : Authorized ReflectiveMethodInvocation: public java.util.List org.eclipse.hawkbit.repository.jpa.management.JpaRolloutManagement.findActiveRollouts(); target is of class [org.eclipse.hawkbit.repository.jpa.management.JpaRolloutManagement] with attributes [[authorize: 'hasAuthority('READ_ROLLOUT') or hasAuthority('ROLE_SYSTEM_CODE')', filter: 'null', filterTarget: 'null']] 2024-07-16T12:04:45.223-04:00 TRACE 287477 --- [executor-pool-1] s.a.i.a.AspectJMethodSecurityInterceptor : Did not switch RunAs authentication since RunAsManager returned null 2024-07-16T12:04:45.223-04:00 DEBUG 287477 --- [executor-pool-1] h.r.j.c.MultiTenantJpaTransactionManager : Creating new transaction with name [org.eclipse.hawkbit.repository.jpa.management.JpaRolloutManagement.findActiveRollouts]: PROPAGATION_REQUIRED,ISOLATION_DEFAULT,readOnly 2024-07-16T12:04:45.223-04:00 DEBUG 287477 --- [executor-pool-1] h.r.j.c.MultiTenantJpaTransactionManager : Opened new EntityManager [org.eclipse.persistence.internal.jpa.EntityManagerImpl@725a8974] for JPA transaction 2024-07-16T12:04:45.223-04:00 DEBUG 287477 --- [executor-pool-1] h.r.j.c.MultiTenantJpaTransactionManager : Exposing JPA transaction as JDBC [org.springframework.orm.jpa.vendor.EclipseLinkJpaDialect$EclipseLinkConnectionHandle@534368a9] 2024-07-16T12:04:45.223-04:00 TRACE 287477 --- [executor-pool-1] o.s.t.i.TransactionInterceptor : Getting transaction for [org.eclipse.hawkbit.repository.jpa.management.JpaRolloutManagement.findActiveRollouts] 2024-07-16T12:04:45.223-04:00 DEBUG 287477 --- [executor-pool-1] h.r.j.c.MultiTenantJpaTransactionManager : Found thread-bound EntityManager [org.eclipse.persistence.internal.jpa.EntityManagerImpl@725a8974] for JPA transaction 2024-07-16T12:04:45.223-04:00 DEBUG 287477 --- [executor-pool-1] h.r.j.c.MultiTenantJpaTransactionManager : Participating in existing transaction 2024-07-16T12:04:45.223-04:00 TRACE 287477 --- [executor-pool-1] o.s.t.i.TransactionInterceptor : Getting transaction for [org.eclipse.hawkbit.repository.jpa.repository.HawkBitBaseRepository.findByStatusIn] 2024-07-16T12:04:45.224-04:00 DEBUG 287477 --- [executor-pool-1] o.m.jdbc.client.impl.StandardClient : execute query: SELECT id FROM sp_rollout WHERE ((status IN (?,?,?,?,?,?)) AND (tenant = ?)) 2024-07-16T12:04:45.224-04:00 TRACE 287477 --- [executor-pool-1] o.m.j.client.socket.impl.PacketWriter : send: conn=52 (M)

avgustinmm commented 1 month ago

Hi @matkomilovich, thanks for the detailed info. The the NPE is thrown after calling runAsTenantAsUser:

contextAware.runAsTenantAsUser(
        contextAware.getCurrentTenant(),
        rollout.getCreatedBy(), () -> {
            rolloutExecutor.execute(rollout);
            return null;
        })

and here rollout.getCreatedBy() seems to be null. That's why it could not execute "as user". There shall be a "created by" set to the user that has created (via REST Management API) the rollout. I see in your rollout json you have createdAt but not createdBy. You also miss lastModifiedBy. You create the rollout via REST API with authenticated user?

matkomilovich commented 1 month ago

Now that you mention it, it makes a lot of sense, thanks a lot for your comment @avgustinmm! And yes, I'm creating the rollout via the rest API, but I made some local modifications to support using AWS Cognito for authenticating the user.

Btw, I confirmed the username being null issue, by testing getting the basic user info using the Management API:

{
    "username": null,
    "tenant": "DEFAULT"
}

Do you know which attribute in the IdToken is used to configure the username?

Note: I'll be making tests to fix the username issue, and post an update to let you know if it worked.

avgustinmm commented 1 month ago

The created / updated by is got from SpringSecurityAuditorAware.getCurrentAuditor As you can see in its implementation gets it (in oauth case) with:

((OidcUser) authentication.getPrincipal()).getPreferredUsername()

As far as I see (you should check) the authentication should be OAuth2AuthenticationToken and the principal should be DefaultOidcUser. So it should use StandardClaimNames.PREFERRED_USERNAME claim as username.