Cannot connect from web to IoT Core using a custom authorizer

ZeeD commented 2 years ago

Describe the bug

As described on Stack Overflow, I cannot use the SDK in a web page to connect to IoT Core with a custom authorizer. Do you have a working example of this scenario?

Expected Behavior

I am able to connect to IoT Core with a custom authorizer from a web page.

Current Behavior

The connection is not correctly established

Reproduction Steps

Please have a look at the Stack Overflow question linked above.

Possible Solution

No response

Additional Information/Context

No response

SDK version used

2.2.12

Environment details (OS name and version, etc.)

browser

TwistedTwigleg commented 2 years ago

Thank you for cutting this ticket! While I have not tried connecting to a custom authorizer with the V1 JavaScript SDK, I wrote the custom authorizer support for the V2 JavaScript SDK and can answer several of the questions asked in the Stack Overflow question.

> First of all, is it possible to connect via mqtt+wss using only a custom auth, without Cognito/certificates? Keep in mind that I am able to use a Cognito identity pool without errors, but I need to remove it.

This should be possible. If you reach a custom authorizer and present the correct data, it should return a policy that allows your IoT device to connect. Certificates should not be required: the point of a custom authorizer is that it lets you control how devices are authenticated.

> Is it correct that I just need to set up the customAuthQueryString parameter? My understanding is that this should be used on the web.

There are a couple of ways to connect to a custom authorizer. The first is through the headers sent when you establish the handshake with IoT Core; the second is to pass a query string. In both cases, IoT Core takes the data presented and routes you to the authorizer with the passed-in name for the given endpoint.

I'm not sure of the details for the JavaScript V1 SDK offhand, but I know that for the JavaScript V2 SDK, a query string is the only way to connect to a custom authorizer over WebSockets, so you are likely correct that the V1 SDK has the same restriction.
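A minimal sketch of what the query-string approach looks like on the wire. It assumes the parameter name seen in the GET request reported later in this thread (`X-Amz-CustomAuthorizer-Name`). `buildCustomAuthWsUrl`, the endpoint, and the authorizer name are hypothetical placeholders, not SDK APIs. It only builds the URL and opens no connection.

```javascript
// Hypothetical helper: builds the wss:// URL a browser client would open
// when passing custom authorizer data as a query string.
function buildCustomAuthWsUrl(endpoint, authorizerName, extraParams = {}) {
  const url = new URL(`wss://${endpoint}/mqtt`);
  // Parameter name mirrors the GET request observed in this thread.
  url.searchParams.set('X-Amz-CustomAuthorizer-Name', authorizerName);
  // Any additional values (e.g. a token the Lambda reads) are appended as-is.
  for (const [key, value] of Object.entries(extraParams)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

const wsUrl = buildCustomAuthWsUrl(
  'example-ats.iot.us-east-1.amazonaws.com', // placeholder endpoint
  'MyAuthorizer' // placeholder authorizer name
);
console.log(wsUrl);
// wss://example-ats.iot.us-east-1.amazonaws.com/mqtt?X-Amz-CustomAuthorizer-Name=MyAuthorizer
```

Building the URL with `URL`/`URLSearchParams` takes care of percent-encoding, so authorizer names or token values with special characters are not silently mangled.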

> What are the values I should set up for the various headers/query params? X-Amz-CustomAuthorizer-Name is self-explanatory, but I'm not sure about X-Amz-CustomAuthorizer-Signature (is it correct to fill it with the content of my private key?). Moreover, I'm not sure about TestAuthorizerToken. Is it the correct key to set up?

The values are as follows:

- X-Amz-CustomAuthorizer-Name is the name of the custom authorizer. As you said, pretty straightforward.
- X-Amz-CustomAuthorizer-Signature: if your custom authorizer uses token validation (signing), this is where the signature goes. It is not the contents of your private key file; it is the token value signed with your private key and base64-encoded, which IoT Core checks against the public key configured on the authorizer. See this post for an example.
  - If your custom authorizer does not use token validation, then this is not required.
- TestAuthorizerToken is likely a custom value that they are passing to the Lambda in the example. It is not required, but it shows how you can pass additional data to a Lambda. This page in the documentation shows what is required to make a connection, depending on how you are connecting.
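A minimal Node.js sketch of producing a signature value for token validation. It assumes the authorizer is configured with an RSA public key and that the token is signed with RSA-SHA256. The key pair is generated on the fly only to keep the example self-contained; on a real device the private key would be provisioned ahead of time. `signToken`, `verifyToken`, and the token value are hypothetical names.

```javascript
const crypto = require('crypto');

// Throwaway key pair for illustration only. In practice the private key stays
// with the client and the public key is registered on the custom authorizer.
const { privateKey, publicKey } = crypto.generateKeyPairSync('rsa', {
  modulusLength: 2048,
});

// Sign the token value with RSA-SHA256 and base64-encode the result.
function signToken(token, key) {
  const signer = crypto.createSign('RSA-SHA256');
  signer.update(token);
  signer.end();
  return signer.sign(key, 'base64');
}

// Roughly the check done with the registered public key before access is granted.
function verifyToken(token, signature, key) {
  const verifier = crypto.createVerify('RSA-SHA256');
  verifier.update(token);
  verifier.end();
  return verifier.verify(key, signature, 'base64');
}

const token = 'example-token-value'; // hypothetical token
const signature = signToken(token, privateKey);
console.log(verifyToken(token, signature, publicKey)); // true
```

Only the signature (never the private key) would go into the signature field.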

> I've also tried to run the custom_authorizer_connect sample of the SDK v2, but it's still not working, and I've run out of ideas.

Are you able to connect using the test functionality on the Lambda page?

Something else to check is that your Lambda has the permissions it needs to be invoked by the custom authorizer. To find them, open the Lambda, click the Configuration tab, and then click Permissions in the menu on the left. If you scroll down on the permissions view, you should see the Resource-based policy section, which needs a policy that allows your Lambda to be invoked by the custom authorizer.

For example, here is a policy I have on one of the Lambdas I used for testing:

Statement ID: permissionToIoTCore
Principal: iot.amazonaws.com
Effect: Allow
Action: lambda:InvokeFunction

Conditions:

    {
      "ArnLike": {
        "AWS:SourceArn": "arn:aws:iot:<region>:<endpoint>:authorizer/<authorizer name>"
      }
    }

Where <region> is the AWS region that contains your custom authorizer, <endpoint> is your endpoint, and <authorizer name> is the name of the authorizer that invokes this lambda.

Your custom authorizer also needs to point to the correct Lambda function ARN. You can set this in the IoT Core console under Security, Custom Authorizers, and then your custom authorizer, where you can enter the ARN of the Lambda function if you have not already.

If the request is not showing up in CloudWatch, then it has not reached the point where it invokes the Lambda function. That generally means there is a setup issue and/or the data passed to make the initial connection is incorrect. In other words, the problem is not the Lambda function itself but something before that point. If the Lambda function were causing the issue, the block/disconnect would show up in the CloudWatch logs.

If the test functionality on the Lambda page does not work and your permissions are set up correctly, please let me know and I'll see if I can reproduce the issue on my end.

ZeeD commented 2 years ago

> Thank you for cutting this ticket!

Thank YOU for the response!

> There's a couple ways to connect to a custom authorizer. The first way is through the headers that are submitted when you establish a handshake with the IoT core, while the second way is to pass a query string. In both cases, IoT core takes the data presented and routes you to the authorizer with the passed-in name for the given endpoint.
>
> I'm not sure on the details for the Javascript V1 SDK right off, but I know for the Javascript V2 SDK, that using a query string is the only way to connect to a custom authorizer via websockets, so it is likely that you are correct in that the V1 SDK has the same restriction.

OK. As a first step, I'm trying not to use token validation, and I can see that the first thing my web page does is a GET to wss:///mqtt?X-Amz-CustomAuthorizer-Name=

> Are you able to connect using the test functionality in the Lambda page?

Yes. I got the expected answers both from the CLI, with aws iot test-invoke-authorizer (which got the response from the Lambda), and directly on the Lambda page in the AWS console.

> Something else to check is making sure that your lambda has the permissions it needs to authorize the custom authorizer. You can find the permissions for your lambda by going to the lambda, then clicking the configuration tab, and then on the menu on the left side clicking permissions. On the permissions view if you scroll down you should see the Resource-Based policy section, where you will need to have a policy that allows your Lambda to be invoked by the custom authorizer.
>
> For example. here is a policy I have on one of the lambda's I used for testing:
> Statement ID: permissionToIoTCore
> Principal: iot.amazonaws.com
> Effect: Allow
> Action: lambda:InvokeFunction
>
> Conditions:
>
>     {
>       "ArnLike": {
>         "AWS:SourceArn": "arn:aws:iot:<region>:<endpoint>:authorizer/<authorizer name>"
>       }
>     }

I have almost the same policy. My only doubt is about <endpoint>: between <region>: and :authorizer, I have the account ID.

> Where <region> is the AWS region that contains your custom authorizer, <endpoint> is your endpoint, and <authorizer name> is the name of the authorizer that invokes this lambda.

I tried to update the policy with <endpoint> (what do you mean by that? The MQTT "host"?), but the console blocks me, saying: Member must satisfy regular expression pattern: arn:(aws[a-zA-Z0-9-]*):([a-zA-Z0-9\-])+:([a-z]{2}((-gov)|(-iso(b?)))?-[a-z]+-\d{1})?:(\d{12})?:(.*)

> Your custom authorizer will also need to be correctly pointed to the Lambda function ARN, which you can set in the IoT core console by going to Security, Custom Authorizers, and then your custom authorizer. Then you should be able to put in the ARN for the lambda function, if you have not already.

I double checked and in my authorizer, in "Authorizer function" I see the right lambda.

> If the request is not showing up in Cloudwatch, then it means it has not gotten to the point where it invokes the Lambda function itself - which generally means there is a setup issue and/or the data passed to make the initial connection is incorrect. In other words, it is not an issue with the lambda function itself and is something prior to that point that is causing the issue. If the lambda function itself was causing the issue, then it would report a block/disconnect on the Cloudwatch logs.

I suspect the same. My real doubt is that there is "something" (my code not telling IoT Core to use the authorizer, some other security block...) before the authorizer -> Lambda step, but I'm not sure what else it could be.

(Edit) Additionally, I rewrote my code to use aws-iot-device-sdk-v2, adapting the demo:

    const config = iot.AwsIotMqttConnectionConfigBuilder.new_default_builder()
        .with_clean_session(true)
        .with_client_id('customauth')
        .with_endpoint('<endpoint>')
        .with_custom_authorizer('<username>', '<authorizer name>', undefined, '<password>')
        .build();

(with the idea that <username> and <password> should be checked by the lambda, correct?)

Unfortunately, all I get when I try to connect is an interrupt event with error set to -1.

TwistedTwigleg commented 2 years ago

> OK. As first step, I'm trying to not use the token validation, and I can see that the first thing my web page do is a GET to wss:///mqtt?X-Amz-CustomAuthorizer-Name= ... yes. I got the expected answers both from the the cli, with aws iot test-invoke-authorizer (that got the response from the lambda) and also directly in the lambda page in aws console

Ah okay, cool. That means it is sending the request out correctly as a formatted WebSocket URL, so the issue is likely not on the AWS Lambda side.

> I have almost the same policy. My only doubt is about the <endpoint>. between the <region>: and the :authorizer I have the account-id.

That is my bad, it should be your account ID. 😅 It should be "AWS:SourceArn": "arn:aws:iot:<region>:<account ID>:authorizer/<authorizer name>"
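A small sketch for sanity-checking the corrected SourceArn before pasting it into the console. It assumes the ARN form just confirmed (region, then the 12-digit account ID, then authorizer/<name>) and reuses the regular expression from the console error quoted earlier in the thread, anchored at both ends as an assumption. `authorizerSourceArn` is a hypothetical helper, and the region, account ID, and authorizer name are placeholders.

```javascript
// Pattern copied from the console error message quoted in this thread,
// anchored with ^...$ (an assumption about how the console applies it).
const ARN_PATTERN =
  /^arn:(aws[a-zA-Z0-9-]*):([a-zA-Z0-9\-])+:([a-z]{2}((-gov)|(-iso(b?)))?-[a-z]+-\d{1})?:(\d{12})?:(.*)$/;

// Hypothetical helper: builds the AWS:SourceArn value for the Lambda's
// resource-based policy condition.
function authorizerSourceArn(region, accountId, authorizerName) {
  return `arn:aws:iot:${region}:${accountId}:authorizer/${authorizerName}`;
}

// Account ID in the middle field: accepted.
const good = authorizerSourceArn('us-east-1', '123456789012', 'MyAuthorizer');
// Endpoint host in the middle field: rejected, as the console did.
const bad = authorizerSourceArn('us-east-1', 'example-ats.iot.us-east-1.amazonaws.com', 'MyAuthorizer');

console.log(ARN_PATTERN.test(good)); // true
console.log(ARN_PATTERN.test(bad)); // false
```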

> I suspect the same. My real doubt is that there is "something" (my code that is not able to tell to use the authorizer, some other security blocks....) before the authorizer -> lambda step but I'm not sure what else could be. (edit) additionally, I rewrote my code to use the aws-iot-device-v2, adapting the demo, ... (with the idea that <username> and <password> should be checked by the lambda, correct?) unfortunately all I get when I try to connect is an event interrupt with error set to -1

In V2, you need to pass null or an empty string ("") if you do not want to send the data. So it should be:

    import { iot } from 'aws-iot-device-sdk-v2';

    const config = iot.AwsIotMqttConnectionConfigBuilder.new_default_builder()
        .with_clean_session(true)
        .with_client_id('customauth')
        .with_endpoint('<endpoint>')
        // Pass null (or "") for any value you do not want to send.
        .with_custom_authorizer('<username or null>', '<authorizer name>', null, '<password or null>')
        .build();

That said, if the Lambda does not use arguments such as the username or password, it should still allow the connection, unless the Lambda checks for them and does not return a connection policy.
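A minimal sketch of normalizing optional values before they reach the builder, so that a missing field ends up as `null` rather than `undefined`. It assumes the four-argument order used in the snippets above (username, authorizer name, signature, password). `customAuthorizerArgs` is a hypothetical helper, not part of aws-iot-device-sdk-v2.

```javascript
// Hypothetical helper (not an SDK API): returns the four positional arguments
// in the order used above: (username, authorizer name, signature, password).
function customAuthorizerArgs({ username, authorizerName, signature, password } = {}) {
  if (!authorizerName) {
    throw new Error('authorizerName is required');
  }
  // Map "not provided" (undefined) to null so the value is explicitly not sent.
  const orNull = (value) => (value === undefined ? null : value);
  return [orNull(username), authorizerName, orNull(signature), orNull(password)];
}

// Usage: only the authorizer name is set; everything else becomes null.
const args = customAuthorizerArgs({ authorizerName: 'MyAuthorizer' });
console.log(args); // [ null, 'MyAuthorizer', null, null ]

// In the browser code the result could then be spread into the builder call:
// builder.with_custom_authorizer(...args)
```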

Does running the V2 Custom Authorizer sample work if you change the settings to point to your custom authorizer?

TwistedTwigleg commented 2 years ago

Also, something else I thought of: are your endpoint, your custom authorizer, and your Lambda all in the same region? The endpoint should look something like <data>-ats.iot.<region>.amazonaws.com, while the custom authorizer and the Lambda should have ARNs that show the region (something like arn:aws:<iot or lambda>:<region>:<account ID>:<resource>).
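A minimal sketch of that same-region check. It assumes the ATS endpoint shape <data>-ats.iot.<region>.amazonaws.com and the standard ARN layout arn:partition:service:region:account-id:resource. The helper names and all values are hypothetical.

```javascript
// Hypothetical helper: extracts the region from an IoT data endpoint.
function regionFromEndpoint(endpoint) {
  // Expected shape: <prefix>-ats.iot.<region>.amazonaws.com
  const match = /\.iot\.([a-z0-9-]+)\.amazonaws\.com$/.exec(endpoint);
  return match ? match[1] : null;
}

// Hypothetical helper: ARN fields are arn:partition:service:region:account-id:resource.
function regionFromArn(arn) {
  return arn.split(':')[3] || null;
}

// True only if the endpoint region is known and every ARN is in that region.
function sameRegion(endpoint, ...arns) {
  const region = regionFromEndpoint(endpoint);
  return region !== null && arns.every((arn) => regionFromArn(arn) === region);
}

console.log(sameRegion(
  'example-ats.iot.us-east-1.amazonaws.com',
  'arn:aws:iot:us-east-1:123456789012:authorizer/MyAuthorizer',
  'arn:aws:lambda:us-east-1:123456789012:function:MyAuthorizerFn'
)); // true
```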

ZeeD commented 2 years ago

> Does running the V2 Custom Authorizer sample work if you change the settings to point to your custom authorizer?

I set null for all parameters in with_custom_authorizer except the authorizer name, which I set to the name of my custom authorizer. I still get the interrupt event with error=-1 :(

TwistedTwigleg commented 2 years ago

Let me try running the V2 sample to make sure it is working for me, just in case something changed in the sample and/or another change modified its behavior. I think it should still work, but it is always good to make sure your assumptions are correct.

TwistedTwigleg commented 2 years ago

I just ran the Javascript V2 SDK custom authorizer sample and it works. I was able to connect and it successfully subscribed and received the publish it sends.

Some additional things to look at:

- Is the name of the custom authorizer exactly the same as what is entered in AWS? It needs to match, including capitalization.
  - Does your custom authorizer have a space in the name? If so, does removing it both in the AWS console and in the JavaScript code help?
  - Likewise, if your custom authorizer name has any special characters, I would see if removing them helps.
- What browser are you using? It shouldn't make a difference, but I am using Firefox, so I know it works there. If you can, try a different web browser just to see if it makes any difference at all.
- Have you tried running the V2 JavaScript samples/browser/custom_authorizer_connect sample? I know you have pulled the code from it, but if you have not tried running it directly, could you please give it a try? Since I know the code there works, at least for me, that would help determine whether it is a code issue or something else.

Edit: Something else I thought of: do you have HTTP caching turned off and signing disabled on your custom authorizer? Also, does your custom authorizer show its status as active?

ZeeD commented 2 years ago

> Does your custom authorizer have a space in the name? If so, does removing it both on the AWS console and in the Javascript code help?

No, it doesn't. I tried both a 'kebab-case-name' and a 'CamelCase' name for the authorizer.

> What browser are you using?

I'm using Firefox too.

> Have you tried running the V2 Javascript samples/browser/custom_authorizer_connect

Connecting custom authorizer...

Connection interrupted: error=-1

> Do you have HTTP caching on your custom authorizer off and signing disabled? Also, does your custom authorizer say its status is active?

No caching, no signing, and it is active.

TwistedTwigleg commented 2 years ago

Hmm, strange. It sounds like you have it setup correctly then, but I am not sure why it is not working...

Can you check whether the JavaScript V2 native (Node.js) sample works with your custom authorizer? This would help determine whether the issue is on the browser side or not.

Also - does your endpoint have ats in the name? It should be something like: <data>-ats.iot.<region>.amazonaws.com If not, can you check to see if adding ats between the data and .iot helps?

I apologize for having to ask so many questions - I think we are close to finding a solution 🙂

Edit: Something else I thought of - what version of the SDK are you using? Custom Authorizer support was added in version 1.8.2 on the V2 Javascript SDK - so you will need to use that version otherwise it will not connect correctly.
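A minimal sketch for checking an installed version string against that 1.8.2 minimum. It assumes plain MAJOR.MINOR.PATCH strings, for example the exact version recorded in a lock file. A leading range operator such as ^ is stripped, so "^1.8.2" is read as 1.8.2, and pre-release tags are ignored. `isAtLeast` is a hypothetical helper.

```javascript
// Hypothetical helper: returns true if `version` is at least `minimum`.
// Only MAJOR.MINOR.PATCH is compared; leading non-digits (e.g. "^", "~") are stripped.
function isAtLeast(version, minimum) {
  const parse = (v) =>
    v.replace(/^[^0-9]*/, '').split('.').slice(0, 3).map((part) => parseInt(part, 10) || 0);
  const a = parse(version);
  const b = parse(minimum);
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // all three components equal
}

// Minimum V2 JavaScript SDK version with custom authorizer support.
const MIN_CUSTOM_AUTH_VERSION = '1.8.2';
console.log(isAtLeast('^1.8.2', MIN_CUSTOM_AUTH_VERSION)); // true
console.log(isAtLeast('1.7.9', MIN_CUSTOM_AUTH_VERSION)); // false
```

A caret range in package.json only states a lower bound, so the lock file is the more reliable place to read the version actually installed.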

ZeeD commented 2 years ago

Running the node/custom_authorizer demo, I got:

$ node dist/index.js --endpoint '<endpoint>' --custom_auth_authorizer_name '<authorizer name>'
Connecting...
Failed to connect: libaws-c-mqtt: AWS_ERROR_MQTT_UNEXPECTED_HANGUP, The connection was closed unexpectedly.
$

and, yes, the endpoint contains -ats.

I was using "aws-iot-device-sdk": "^2.2.12" for the first implementation, then I tried "aws-iot-device-sdk-v2": "^1.8.2" in my projects.

The two demos (the browser one and the Node one), on the other hand, are from main (I cloned the v2 git repo a couple of hours ago).

Edit: one more thing. In the samples README there is:

> device connects to the server and then disconnects. This sample is for reference on connecting using a custom authorizer. Your Thing's Policy must provide privileges for this sample to connect. Make sure your policy allows a client ID of test-* to connect or use --client_id <client ID here> to send the client ID your policy supports.

I have a policy, written as the README said, but I'm not sure how it is used in the flow.

Moreover, in the command I also see that I should provide --ca_file. Does this mean that I still need to create a thing with a certificate to use custom auth? My goal is to bypass all that and have the logic in the Lambda.

TwistedTwigleg commented 2 years ago

Okay, cool! AWS_ERROR_MQTT_UNEXPECTED_HANGUP almost always means that something in the setup for connecting to AWS IoT Core is not configured correctly and/or the policy does not allow the connection.

On your Lambda, can you try having it return the following:

    def lambda_handler(event, context):
        # Testing only: grants every action on every resource.
        return {
            "isAuthenticated": True,
            "principalId": "21",
            "disconnectAfterInSeconds": 86400,
            "refreshAfterInSeconds": 300,
            "policyDocuments": [
                {
                    "Version": "2012-10-17",
                    "Statement": [
                        {
                            "Action": "*",
                            "Effect": "Allow",
                            "Resource": "*"
                        }
                    ]
                }
            ]
        }

This is obviously an extremely permissive policy, so I would not suggest it for production or anything other than testing, but it can help narrow down the issue.
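For a Node.js Lambda, a minimal sketch that only returns an Allow policy when the MQTT username and password match. It assumes the event shape described in the AWS IoT Core custom authentication documentation: `event.protocolData.mqtt.username` and a base64-encoded `event.protocolData.mqtt.password`. Whether MQTT credentials are present depends on how the client connects. Data sent only in the WebSocket query string would arrive under `protocolData.http` instead. The expected credentials and principal ID are hypothetical placeholders, and the wildcard policy carries the same testing-only caveat as above.

```javascript
// Hypothetical expected credentials, for testing only.
const EXPECTED_USERNAME = 'test-user';
const EXPECTED_PASSWORD = 'test-password';

function buildResponse(allowed, principalId) {
  return {
    isAuthenticated: allowed,
    principalId,
    disconnectAfterInSeconds: 86400,
    refreshAfterInSeconds: 300,
    policyDocuments: [
      {
        Version: '2012-10-17',
        Statement: [
          // Testing-only wildcard; narrow actions/resources for real use.
          { Action: '*', Effect: allowed ? 'Allow' : 'Deny', Resource: '*' },
        ],
      },
    ],
  };
}

async function handler(event) {
  const mqtt = (event && event.protocolData && event.protocolData.mqtt) || {};
  // The MQTT password is assumed to arrive base64-encoded.
  const password = mqtt.password
    ? Buffer.from(mqtt.password, 'base64').toString('utf8')
    : '';
  const allowed = mqtt.username === EXPECTED_USERNAME && password === EXPECTED_PASSWORD;
  return buildResponse(allowed, 'testPrincipal1');
}

module.exports = { handler };
```

Logging the incoming `event` once (for example with `console.log(JSON.stringify(event))`) is a quick way to confirm which fields your connection actually delivers before relying on them.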

> I have a policy, written as the README said, but I'm not sure how it is used in the flow.

It should be safe to ignore. It might be that you need a policy that allows test-* to connect, so that a client ID with the same prefix can be "mapped" to an IoT thing, but I am not certain. I will test this to find out.

> Moreover in the command I also see that I should provide --ca_file Do this means that I still need to create a thing, with a certificate to use the custom auth??? my goal is to bypass all and have the logic in the lambda

A ca_file is not required - it is just added as part of adding the endpoint argument. It is not used anywhere in the code.

Using the samples from main should work fine. This almost certainly means that something is set up incorrectly on the AWS side rather than in the code, as I know both samples work. Now we just need to figure out which part of the setup is incorrect, and then it should work.

TwistedTwigleg commented 2 years ago

Okay, I just tested the JavaScript V2 browser sample with a client ID of THIS_IS_A_FAKE_CLIENT_ID_IGNORE_ME, which is not in any of my policies (nor do I have any wildcard policies enabled), and it was able to connect. So the client ID and the IoT device policy should not be the issue.

ZeeD commented 2 years ago

OK. Honestly, I have no idea what happened, but now the clients (with both SDK v1 and v2) connect successfully to the endpoint. Maybe someone in my organization changed something in the backend configuration without telling me. Sorry for the noise.


TwistedTwigleg commented 2 years ago

Awesome, I'm glad it is working now! And no worries on the "noise". I'd much rather have an issue open than have it not working and not realize it. It was good to go through the possibilities, as it may help others in the future. Thanks for sharing that it is fixed! 🙂

aws / aws-iot-device-sdk-js