JupiterOne-Archives / integrations-2021-07-16

JupiterOne integration development documentation and issue tracking

Duplicate key in `fetch-users` step #42

Closed aiwilliams closed 3 years ago

aiwilliams commented 3 years ago
err.code | DUPLICATE_KEY_DETECTED
errorId | db292f01-fb83-4b5a-bcdb-3f708129cded
integrationJobId | c81b8be7-6e8d-4c35-bb4a-d3f5f63a11bb
time | 2021-07-01T13:16:52.355Z

Error: Duplicate _key detected (_key={74989719-REDACTED})
at DuplicateKeyTracker.registerKey (/opt/jupiterone/app/node_modules/@jupiterone/integration-sdk-runtime/dist/src/execution/jobState.js:22:19)
at /opt/jupiterone/app/node_modules/@jupiterone/integration-sdk-runtime/dist/src/execution/jobState.js:64:33
at Array.forEach (<anonymous>)
at addEntities (/opt/jupiterone/app/node_modules/@jupiterone/integration-sdk-runtime/dist/src/execution/jobState.js:63:18)
at Object.addEntity (/opt/jupiterone/app/node_modules/@jupiterone/integration-sdk-runtime/dist/src/execution/jobState.js:99:37)
at /opt/jupiterone/app/node_modules/@jupiterone/graph-bitbucket/dist/steps/users.js:22:52
at APIClient.iterateUsers (/opt/jupiterone/app/node_modules/@jupiterone/graph-bitbucket/dist/client.js:59:19)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
at async /opt/jupiterone/app/node_modules/@jupiterone/graph-bitbucket/dist/steps/users.js:20:13
at async iteratee (/opt/jupiterone/app/node_modules/@jupiterone/integration-sdk-runtime/dist/src/storage/FileSystemGraphObjectStore/indices.js:18:21)

That leads to the fetch-groups step failing with:

Error: Required data not found in job state: 'USER_BY_UUID_MAP'
at Object.fetchGroups [as executionHandler] (/opt/jupiterone/app/node_modules/@jupiterone/graph-bitbucket/dist/steps/groups.js:14:15)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
at async executeStep (/opt/jupiterone/app/node_modules/@jupiterone/integration-sdk-runtime/dist/src/execution/dependencyGraph.js:200:17)
at async Object.timeOperation (/opt/jupiterone/app/node_modules/@jupiterone/integration-sdk-runtime/dist/src/metrics/index.js:6:12)
at async run (/opt/jupiterone/app/node_modules/p-queue/dist/index.js:163:29)

ceelias commented 3 years ago

@aiwilliams I took a look at the code, and it looks to me like the issue is how users are being added. Users are fetched per workspace, so if a user belongs to multiple workspaces, they get added twice with the same key of user.uuid. I'm just wondering how this hasn't caused an issue before. From src/steps/users.ts:

  await jobState.iterateEntities(
    {
      _type: BITBUCKET_WORKSPACE_ENTITY_TYPE,
    },
    async (workspaceEntity) => {
      if (workspaceEntity.slug) {
        const slug: string = <string>workspaceEntity.slug;
        await apiClient.iterateUsers(slug, async (user) => {
          const convertedUser = createUserEntity(user);
          const userEntity = (await jobState.addEntity(
            createIntegrationEntity({
              entityData: {
                source: user,
                assign: convertedUser,
              },
            }),
          )) as BitbucketUserEntity;
          const workspace: BitbucketWorkspaceEntity = <
            BitbucketWorkspaceEntity
          >workspaceEntity;
          await jobState.addRelationship(
            createWorkspaceHasUserRelationship(workspace, userEntity),
          );
          userByIdMap[user.uuid] = userEntity;
          userIds.push(userEntity._key);
        });
      }
    },
  );

I think this could be solved like this:

  await jobState.iterateEntities(
    {
      _type: BITBUCKET_WORKSPACE_ENTITY_TYPE,
    },
    async (workspaceEntity) => {
      if (workspaceEntity.slug) {
        const slug: string = <string>workspaceEntity.slug;
        await apiClient.iterateUsers(slug, async (user) => {
          const convertedUser = createUserEntity(user);

          // Check whether this user was already added from another workspace
          let userEntity = (await jobState.findEntity(
            convertedUser._key,
          )) as BitbucketUserEntity | null;

          if (!userEntity) {
            // First time we see this user: add the entity and record it.
            // Assign to the outer variable (no shadowing) so the
            // relationship below always has a defined userEntity.
            userEntity = (await jobState.addEntity(
              createIntegrationEntity({
                entityData: {
                  source: user,
                  assign: convertedUser,
                },
              }),
            )) as BitbucketUserEntity;
            userByIdMap[user.uuid] = userEntity;
            userIds.push(userEntity._key);
          }

          // END
          const workspace: BitbucketWorkspaceEntity = <
            BitbucketWorkspaceEntity
          >workspaceEntity;
          await jobState.addRelationship(
            createWorkspaceHasUserRelationship(workspace, userEntity),
          );
        });
      }
    },
  );
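
To make the find-before-add pattern above easier to reason about in isolation, here is a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: `InMemoryJobState`, `Workspace`, and `collectUsers` are not part of the SDK or of graph-bitbucket, and the sketch is synchronous for brevity whereas the real `jobState` methods are async. It only assumes that adding the same `_key` twice throws, as the stack trace above shows.

```typescript
// Hypothetical, simplified stand-ins for the SDK's entity and job state.
// The real jobState methods are async; this sketch is synchronous for brevity.
interface Entity {
  _key: string;
  _type: string;
}

class InMemoryJobState {
  private entitiesByKey: Record<string, Entity> = {};

  // Throws on a repeated _key, mimicking DUPLICATE_KEY_DETECTED.
  addEntity(entity: Entity): Entity {
    if (this.entitiesByKey[entity._key] !== undefined) {
      throw new Error('Duplicate _key detected (_key=' + entity._key + ')');
    }
    this.entitiesByKey[entity._key] = entity;
    return entity;
  }

  // Returns the stored entity for a key, or null if it was never added.
  findEntity(key: string): Entity | null {
    const found = this.entitiesByKey[key];
    return found !== undefined ? found : null;
  }
}

// Hypothetical input shape: a workspace slug plus the uuids of its users.
interface Workspace {
  slug: string;
  userUuids: string[];
}

// Adds each user entity once, but records one relationship per
// (workspace, user) pair, even when the user was already added.
function collectUsers(
  jobState: InMemoryJobState,
  workspaces: Workspace[],
): { userKeys: string[]; relationships: string[] } {
  const userKeys: string[] = [];
  const relationships: string[] = [];
  for (const workspace of workspaces) {
    for (const uuid of workspace.userUuids) {
      let user = jobState.findEntity(uuid);
      if (user === null) {
        user = jobState.addEntity({ _key: uuid, _type: 'bitbucket_user' });
        userKeys.push(user._key);
      }
      relationships.push(workspace.slug + '|has|' + user._key);
    }
  }
  return { userKeys, relationships };
}
```

With two workspaces that share one user, this adds three entities and four relationships, which is the behavior the change above is aiming for: the user is stored once but still linked to every workspace they belong to.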

Looking at src/steps/groups.ts, I'm wondering what the point of this check is:

          for (const user of group.members) {
            if (user.uuid) {
              if (userByIdMap[user.uuid]) {
                await jobState.addRelationship(
                  createGroupHasUserRelationship(
                    groupEntity,
                    userByIdMap[user.uuid],
                  ),
                );
              }
            }
          }

It seems to me that it's making sure a user belongs to both the workspace and the group. Is it possible to have a user that is part of a group but not part of the workspace that the group belongs to? I ask because the potential fix above may affect this.
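
For reference, here is a small sketch of what that guard does in practice, assuming `userByIdMap` only contains users that were seen while iterating workspaces. The types and the `groupHasUserKeys` helper are hypothetical and only return strings instead of creating relationships; the point is that a group member without a matching map entry is silently skipped rather than causing an error.

```typescript
// Hypothetical minimal shapes; the real entities carry many more fields.
interface UserEntity {
  _key: string;
}

interface GroupMember {
  uuid?: string;
}

// Returns one "group|has|user" string per member that has a uuid AND
// a matching entry in userByIdMap. All other members are skipped.
function groupHasUserKeys(
  groupKey: string,
  members: GroupMember[],
  userByIdMap: Record<string, UserEntity>,
): string[] {
  const keys: string[] = [];
  for (const member of members) {
    const known = member.uuid ? userByIdMap[member.uuid] : undefined;
    if (known) {
      keys.push(groupKey + '|has|' + known._key);
    }
  }
  return keys;
}
```

Under that assumption, storing each user in the map only once (the first time they are seen) should not change the result of this check, because the lookup is by uuid and does not depend on which workspace the user was first found in.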

ceelias commented 3 years ago

Confirmed: the issue was that some users are in multiple workspaces. The job now runs, and I can see the log message we added.