StartLoaderJobCommand throwing InternalFailureException

mohfpge commented 1 month ago

Checkboxes for prior research

[X] I've gone through Developer Guide and API reference
[X] I've checked AWS Forums and StackOverflow.
[X] I've searched for previous similar issues and didn't find any solution.

Describe the bug

StartLoaderJobCommand returns InternalFailureException

SDK version number

@aws-sdk/client-neptunedata@3.616.0

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

20.12.0

Reproduction Steps

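import { NeptunedataClient, StartLoaderJobCommand } from '@aws-sdk/client-neptunedata';

const client = new NeptunedataClient({
  endpoint: `https://${process.env.ENDPOINT}:8182`,
  region: 'us-west-2',
  logger: console,
});

let response;

try {
  // `input` is the object printed under `input:` in the log below.
  const command = new StartLoaderJobCommand(input);
  response = await client.send(command);
} catch (error) {
  // `logger` is defined elsewhere in the application (not shown).
  logger.error({ event: 'Error' }, error.toString());
  throw error;
}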

Observed Behavior

@aws-sdk/credential-provider-node - defaultProvider::fromEnv
@aws-sdk/credential-provider-env - fromEnv
@smithy/property-provider -> Unable to find environment variable credentials.
@aws-sdk/credential-provider-node - defaultProvider::fromSSO
@smithy/property-provider -> Skipping SSO provider in default chain (inputs do not include SSO fields).
@aws-sdk/credential-provider-node - defaultProvider::fromIni
@aws-sdk/credential-provider-ini - fromIni
@smithy/property-provider -> Could not resolve credentials using profile: [default] in configuration/credentials file(s).
@aws-sdk/credential-provider-node - defaultProvider::fromProcess
@aws-sdk/credential-provider-process - fromProcess
@smithy/property-provider -> Profile default could not be found in shared credentials file.
@aws-sdk/credential-provider-node - defaultProvider::fromTokenFile
@aws-sdk/credential-provider-web-identity - fromTokenFile
@smithy/property-provider -> Web identity configuration not specified
@aws-sdk/credential-provider-node - defaultProvider::remoteProvider
@aws-sdk/credential-provider-node - remoteProvider::fromHttp/fromContainerMetadata
@aws-sdk/credential-provider-http - fromHttp
endpoints Initial EndpointParams: {
  "UseFIPS": false,
  "Endpoint": "https://ENDPOINT.us-west-2.neptune.amazonaws.com:8182/",
  "Region": "us-west-2",
  "UseDualStack": false
}
endpoints evaluateCondition: isSet($Endpoint) = true
endpoints evaluateCondition: booleanEquals($UseFIPS, true) = false
endpoints evaluateCondition: booleanEquals($UseDualStack, true) = false
endpoints Resolving endpoint from template: {
  "url": {
    "ref": "Endpoint"
  },
  "properties": {},
  "headers": {}
}
endpoints Resolved endpoint: {
  "headers": {},
  "properties": {},
  "url": "https://ENDPOINT.us-west-2.neptune.amazonaws.com:8182/"
}
{
  clientName: 'NeptunedataClient',
  commandName: 'StartLoaderJobCommand',
  input: {
    source: 's3://BUCKET/vertices',
    format: 'csv',
    s3BucketRegion: 'us-west-2',
    iamRoleArn: 'arn:aws:iam::ACCOUNT:role/ROLE',
    mode: 'NEW',
    failOnError: true,
    parallelism: 'HIGH',
    updateSingleCardinalityProperties: true
  },
  error: InternalFailureException: Bulk load-related request failed.
      at de_InternalFailureExceptionRes (/codebuild/output/src1733/src/s3/00/node_modules/@aws-sdk/client-neptunedata/dist-cjs/index.js:2606:21)
      at de_CommandError (/codebuild/output/src1733/src/s3/00/node_modules/@aws-sdk/client-neptunedata/dist-cjs/index.js:2382:19)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async /codebuild/output/src1733/src/s3/00/node_modules/@smithy/middleware-serde/dist-cjs/index.js:35:20
      at async /codebuild/output/src1733/src/s3/00/node_modules/@smithy/core/dist-cjs/index.js:165:18
      at async /codebuild/output/src1733/src/s3/00/node_modules/@smithy/middleware-retry/dist-cjs/index.js:320:38
      at async /codebuild/output/src1733/src/s3/00/node_modules/@aws-sdk/middleware-logger/dist-cjs/index.js:34:22
      at async uploadFilesToNeptune (/codebuild/output/src1733/src/s3/00/FILE.js:44:16)
      at async start (/codebuild/output/src1733/src/s3/00/packages/FILE.js:14:20)
      at async /codebuild/output/src1733/src/s3/00/packages/FILE.js:16:3 {
    '$fault': 'server',
    '$metadata': {
      httpStatusCode: 500,
      requestId: 'f6c870cc-3baa-6de2-c485-7e32d008fc7a',
      extendedRequestId: undefined,
      cfId: undefined,
      attempts: 3,
      totalRetryDelay: 248
    },
    detailedMessage: 'Bulk load-related request failed.',
    requestId: 'f6c870cc-3baa-6de2-c485-7e32d008fc7a',
    code: 'InternalFailureException'
  },
  metadata: {
    httpStatusCode: 500,
    requestId: 'f6c870cc-3baa-6de2-c485-7e32d008fc7a',
    extendedRequestId: undefined,
    cfId: undefined,
    attempts: 3,
    totalRetryDelay: 248
  }
}
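
The error in the log above carries more than its message. A minimal sketch, assuming the error has the shape printed in that log (`name`, `code`, `detailedMessage`, `$fault`, `$metadata`); `summarizeSdkError` is a hypothetical helper and not part of the SDK:

```javascript
// Hypothetical helper: collect the diagnostic fields seen in the log above
// into one plain object that can be passed to a structured logger.
function summarizeSdkError(error) {
  const metadata = (error && error.$metadata) || {};
  return {
    name: error && error.name,
    code: error && error.code,
    message: error && error.message,
    detailedMessage: error && error.detailedMessage,
    fault: error && error.$fault,
    httpStatusCode: metadata.httpStatusCode,
    requestId: metadata.requestId,
    attempts: metadata.attempts,
  };
}

// Example object shaped like the error in the log (values copied from it).
const exampleError = Object.assign(new Error("Bulk load-related request failed."), {
  name: "InternalFailureException",
  code: "InternalFailureException",
  detailedMessage: "Bulk load-related request failed.",
  $fault: "server",
  $metadata: {
    httpStatusCode: 500,
    requestId: "f6c870cc-3baa-6de2-c485-7e32d008fc7a",
    attempts: 3,
  },
});

console.log(summarizeSdkError(exampleError));
```

Logging this object in the `catch` block instead of `error.toString()` would keep the request ID and retry count next to the message.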

Expected Behavior

Expect that the bulk upload starts without error

Possible Solution

No response

Additional Information/Context

I have had a custom process using plain fetch requests that has worked for ~6 years. I was required to update it to use IAM signatures via Signature v4, so I rewrote our process to use this library. We therefore know that our role is correct, the VPC is set up, and the permissions on our ECS instance, bucket, etc. have been set up properly.

mohfpge commented 1 month ago

Seems like there are possibly 3 issues

failOnError may need to be of type string "TRUE" or "FALSE"
updateSingleCardinalityProperties may need to be of type string "TRUE" or "FALSE"
s3BucketRegion may need to map to region. If I attempt to update this property manually I get the new error below

{
  clientName: 'NeptunedataClient',
  commandName: 'StartLoaderJobCommand',
  input: {
    source: 's3://BUCKET/vertices',
    format: 'csv',
    region: 'us-west-2', // this is missing?
    iamRoleArn: 'arn:aws:iam::NUMBER:role/ROLE',
    mode: 'NEW',
    failOnError: 'TRUE',
    parallelism: 'HIGH',
    updateSingleCardinalityProperties: 'TRUE'
  },
  error: MissingParameterException: Missing required parameters: [region]
      at de_MissingParameterExceptionRes (/codebuild/output/src1620/src/s3/00/node_modules/@aws-sdk/client-neptunedata/dist-cjs/index.js:2726:21)
      at de_CommandError (/codebuild/output/src1620/src/s3/00/node_modules/@aws-sdk/client-neptunedata/dist-cjs/index.js:2361:19)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async /codebuild/output/src1620/src/s3/00/node_modules/@smithy/middleware-serde/dist-cjs/index.js:35:20
      at async /codebuild/output/src1620/src/s3/00/node_modules/@smithy/core/dist-cjs/index.js:165:18
      at async /codebuild/output/src1620/src/s3/00/node_modules/@smithy/middleware-retry/dist-cjs/index.js:320:38
      at async /codebuild/output/src1620/src/s3/00/node_modules/@aws-sdk/middleware-logger/dist-cjs/index.js:34:22
//...
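
To compare the two request shapes discussed above side by side, here is a minimal sketch that maps an SDK-style input onto the `region` field name and the "TRUE"/"FALSE" string flags. `toFlag` and `toLegacyLoaderBody` are hypothetical helpers, and whether the service actually requires this shape is the open question in this thread:

```javascript
// Hypothetical helper: booleans become "TRUE"/"FALSE"; strings are upper-cased.
function toFlag(value) {
  if (typeof value === "boolean") return value ? "TRUE" : "FALSE";
  return String(value).toUpperCase();
}

// Hypothetical helper (not part of the SDK): rename s3BucketRegion to region
// and convert the two boolean fields to string flags. Other fields pass through.
function toLegacyLoaderBody(input) {
  const { s3BucketRegion, failOnError, updateSingleCardinalityProperties, ...rest } = input;
  const body = { ...rest };
  if (s3BucketRegion !== undefined) body.region = s3BucketRegion;
  if (failOnError !== undefined) body.failOnError = toFlag(failOnError);
  if (updateSingleCardinalityProperties !== undefined) {
    body.updateSingleCardinalityProperties = toFlag(updateSingleCardinalityProperties);
  }
  return body;
}

// The input from the first error log, with the same placeholders.
const sdkStyleInput = {
  source: "s3://BUCKET/vertices",
  format: "csv",
  s3BucketRegion: "us-west-2",
  iamRoleArn: "arn:aws:iam::ACCOUNT:role/ROLE",
  mode: "NEW",
  failOnError: true,
  parallelism: "HIGH",
  updateSingleCardinalityProperties: true,
};

console.log(JSON.stringify(toLegacyLoaderBody(sdkStyleInput), null, 2));
```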

mohfpge commented 1 month ago

Another possibility is that we're using an older Neptune engine version (1.2.1.1) which may not be compatible with the new field s3BucketRegion. I'm comparing these two sources of documentation:

https://docs.aws.amazon.com/neptune/latest/userguide/load-api-reference-load.html - old

https://docs.aws.amazon.com/neptune/latest/userguide/data-api-dp-loader.html - new

These two differ slightly. I can't seem to locate when each article was written or whether they are tied to a specific engine version.

mohfpge commented 1 month ago

Quick update: we've updated to the latest Neptune engine version and are still experiencing the same issue

zshzbh commented 1 month ago

Hey @mohfpge ,

Thanks for contacting us. I can reproduce some of the errors, and these are my reproduction steps:

npm i @aws-sdk/client-neptunedata@3.616.0
write code in index.js file

import {
  NeptunedataClient,
  StartLoaderJobCommand,
} from "@aws-sdk/client-neptunedata";

const client = new NeptunedataClient({
  endpoint: `https://db-neptune-1.cluster-XXXXXXX.us-west-2.neptune.amazonaws.com:8182`,
  region: "us-west-2",
  logger: console,
});

let response;
const input = {
  source: "s3://my-bucket-us-west-2",
  format: "csv",
  region: "us-west-2", 
  iamRoleArn: "arn:aws:iam::NUMBERS:user/MaggieMa",
  mode: "NEW",
  failOnError: "TRUE",
  parallelism: "HIGH",
  updateSingleCardinalityProperties: "TRUE",
};
try {
  const command = new StartLoaderJobCommand(input);
  response = await client.send(command);
  console.log(response);
} catch (error) {
  console.log("error", error);
  //logger.error({ event: 'Error' }, error.toString());
  throw error;
}

run node index.js

This is the result I got :

@aws-sdk/credential-provider-node - defaultProvider::fromEnv
@aws-sdk/credential-provider-env - fromEnv
@smithy/property-provider -> Unable to find environment variable credentials.
@aws-sdk/credential-provider-node - defaultProvider::fromSSO
@smithy/property-provider -> Skipping SSO provider in default chain (inputs do not include SSO fields).
@aws-sdk/credential-provider-node - defaultProvider::fromIni
@aws-sdk/credential-provider-ini - fromIni
@aws-sdk/credential-provider-ini - resolveStaticCredentials
endpoints Initial EndpointParams: {
  "UseFIPS": false,
  "Endpoint": "db-neptune-1.cluster-XXXXXX.us-west-2.neptune.amazonaws.com://8182",
  "Region": "us-west-2",
  "UseDualStack": false
}
endpoints evaluateCondition: isSet($Endpoint) = true
endpoints evaluateCondition: booleanEquals($UseFIPS, true) = false
endpoints evaluateCondition: booleanEquals($UseDualStack, true) = false
endpoints Resolving endpoint from template: {
  "url": {
    "ref": "Endpoint"
  },
  "properties": {},
  "headers": {}
}
endpoints Resolved endpoint: {
  "headers": {},
  "properties": {},
  "url": "db-neptune-1.cluster-XXXXX.us-west-2.neptune.amazonaws.com://8182"
}
{
  clientName: 'NeptunedataClient',
  commandName: 'StartLoaderJobCommand',
  input: {
    source: 's3://my-bucket-us-west-2',
    format: 'csv',
    region: 'us-west-2',
    iamRoleArn: 'arn:aws:iam::NUMBER:user/MaggieMa',
    mode: 'NEW',
    failOnError: 'TRUE',
    parallelism: 'HIGH',
    updateSingleCardinalityProperties: 'TRUE'
  },

Node.js v20.15.1

For this error: @smithy/property-provider -> Unable to find environment variable credentials. It seems that the credential environment variables are not set in the process (ref).

I referred to this doc and set up the environment variables:

export AWS_ACCESS_KEY_ID=XXXXXXXX  # replace with your AWS access key
export AWS_SECRET_ACCESS_KEY=XXXXXX/XXXXXXX/XXXXXXXX  # replace with your AWS secret access key
export AWS_DEFAULT_REGION=us-west-2

The error is gone.

I will look into the 3 properties' type/name issues you brought up. In the meantime, please try this workaround and let us know if the error still exists. 😃

Thanks, Maggie

mohfpge commented 1 month ago

Thank you @zshzbh! We shall take a look and see if this resolves our issue

aclarknexient commented 1 month ago

Unfortunately the Neptune Bulk Load error still happens when the AWS_ variables are set.

I've confirmed that AWS_ variables are being set correctly from the result of aws sts assume-role.

mohfpge commented 1 month ago

To add some context: if we remove our dependency on this library and use a simple fetch, it works as expected.

  const UploadFilesToNeptune = () => {
    const input = JSON.stringify({
      source: `s3://${process.env.PIPELINE_BUCKET}/vertices/`,
      format: 'csv',
      mode: 'NEW',
      iamRoleArn: roleArn,
      region: 'us-west-2',
      failOnError: 'TRUE',
      updateSingleCardinalityProperties: 'TRUE',
      parallelism: 'HIGH',
    });

    return (
      fetch(ENDPOINT, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: input,
      })
        // eslint-disable-next-line consistent-return
        .then(async res => {
          if (res.status === 200) {
            return res.json();
          }
          process.exit(1);
        })
        .then(json => {
          return json
        })
    );
  };

I've cleaned up and shortened the code to make it more concise but it's still functional today

zshzbh commented 1 month ago

Hey @aclarknexient ,

Did you get the same error?

For SDK code: the process is looking for AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as the environment variable names.

For the HTTP request that succeeded on your end: accessKey and secretKey are deprecated, as described in this doc. The process is not looking for accessKey or secretKey.

I believe there may still be an issue with how your AWS credentials are configured. Even though the values are correct, the keys might not be set correctly in the environment. To investigate further, could you please open a terminal, navigate to the ~/.aws directory using the command cd ~/.aws, display the contents of the credentials file by running cat credentials, and capture a screenshot of the output? When you capture the screenshot, please redact or cover the actual credential values for security purposes. I only need to see the names of the credential keys, not the sensitive values themselves.

Meanwhile, I will also check with SDK JS SDE team on this issue.

Thanks! Maggie

aclarknexient commented 1 month ago

This is happening in a CodeBuild project and on an EC2 instance with the same role. Neither of those environments uses a ~/.aws directory. We do not have direct access to Neptune from our development laptops.

Both the pipeline and the EC2 instance receive exactly the same error.

The 3 AWS credential variables are set via a script in the EC2 instance:

#!/bin/bash

STS_JSON=$(aws sts assume-role --role-session-name ec2ssm --role-arn arn:aws:iam::REDACTED:role/REDACTED)

AWS_ACCESS_KEY_ID=$(echo "$STS_JSON" | jq -r '.Credentials.AccessKeyId')
AWS_SECRET_ACCESS_KEY=$(echo "$STS_JSON" | jq -r '.Credentials.SecretAccessKey')
AWS_SESSION_TOKEN=$(echo "$STS_JSON" | jq -r '.Credentials.SessionToken')
AWS_DEFAULT_REGION=us-west-2

export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_DEFAULT_REGION

The Neptune drop database command succeeds, but the bulk load fails. To me, that indicates that the credentials are correct. Within the EC2 instance, I am able to run awscurl successfully against the Neptune instance.

mohfpge commented 1 month ago

I don't want this issue to go stale as we're continuing to experience the problem. We've covered that we're not using the deprecated accessKey or secretKey and that AWS_ACCESS_KEY_ID & AWS_SECRET_ACCESS_KEY have been set in the EC2 instance that runs the StartLoaderJobCommand. Let us know if there's another route we can take to debug this issue. Thanks!

zshzbh commented 1 month ago

Hey @mohfpge @aclarknexient ,

Thanks for the response.

The JS SDK does not support AWS_DEFAULT_REGION across all credential providers. You might want to try the AWS_REGION environment variable instead.
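
A minimal sketch of a startup check for this, assuming (per the comment above) that AWS_REGION is the variable to rely on rather than AWS_DEFAULT_REGION. `checkAwsEnv` is a hypothetical helper; it reports only which variable names are set, never their values:

```javascript
// Hypothetical helper: report which AWS_* variable names are present and
// flag combinations that look suspicious. Values are never returned.
function checkAwsEnv(env) {
  const names = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_REGION",
    "AWS_DEFAULT_REGION",
  ];
  const present = {};
  for (const name of names) {
    present[name] = typeof env[name] === "string" && env[name].length > 0;
  }

  const warnings = [];
  if (present.AWS_DEFAULT_REGION && !present.AWS_REGION) {
    warnings.push("AWS_DEFAULT_REGION is set but AWS_REGION is not");
  }
  if (present.AWS_ACCESS_KEY_ID !== present.AWS_SECRET_ACCESS_KEY) {
    warnings.push("only one of AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY is set");
  }
  return { present, warnings };
}

console.log(checkAwsEnv(process.env));
```

Note that the reproduction in this thread also passes `region: 'us-west-2'` to the client directly.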

Regarding your response here: may I know why you are using the CLI to assume the role rather than letting the SDK resolve credentials using the IMDS provider? It's not clear how the script is used, but it seems like you are running this command when the instance is being initialized, which means the credentials will not be refreshed or cached.
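
One way to check the refresh concern is to look at the `Expiration` field that `aws sts assume-role` returns alongside the keys. A minimal sketch, assuming the JSON captured by the script earlier in this thread is made available to the Node process; `credentialsExpired` is a hypothetical helper and the sample values are placeholders:

```javascript
// Hypothetical helper: given the JSON output of `aws sts assume-role`,
// return true when the temporary credentials have already expired.
function credentialsExpired(stsJson, now = new Date()) {
  const parsed = typeof stsJson === "string" ? JSON.parse(stsJson) : stsJson;
  const expiration = parsed && parsed.Credentials && parsed.Credentials.Expiration;
  if (!expiration) {
    throw new Error("No Credentials.Expiration in STS response");
  }
  const expiresAt = new Date(expiration);
  if (Number.isNaN(expiresAt.getTime())) {
    throw new Error("Unparseable Expiration: " + expiration);
  }
  return now.getTime() >= expiresAt.getTime();
}

// Placeholder response with the same structure the CLI prints.
const sampleStsJson = JSON.stringify({
  Credentials: {
    AccessKeyId: "REDACTED",
    SecretAccessKey: "REDACTED",
    SessionToken: "REDACTED",
    Expiration: "2024-07-26T21:46:48Z",
  },
});

console.log(credentialsExpired(sampleStsJson, new Date("2024-07-26T22:00:00Z")));
```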

Thanks! Maggie

github-actions[bot] commented 1 month ago

This issue has not received a response in 1 week. If you still think there is a problem, please leave a comment to avoid the issue from automatically closing.

aws / aws-sdk-js-v3