boto / boto3

AWS SDK for Python
https://aws.amazon.com/sdk-for-python/
Apache License 2.0

Unable to Connect to SQS if using a VPC #1900

Closed: victorsantosdevops closed this issue 5 months ago

victorsantosdevops commented 5 years ago

When I try to send an SQS message from a Lambda in a VPC, I get a timeout. I tried using the VPC Link, but it doesn't work.

{ "errorMessage": "2019-03-07T13:45:11.739Z 7cb1fd0f-7b84-4fcd-8775-01f0f374a0a9 Task timed out after 15.01 seconds" }

SG outbound is all open, and the NACL too. I already created the VPC Link.

Function Logs:
[INFO] 2019-03-07T13:44:56.744Z 7cb1fd0f-7b84-4fcd-8775-01f0f374a0a9 Start with Hash: 1111114502ff8532d063b9d988e2406a
[INFO] 2019-03-07T13:44:56.744Z 7cb1fd0f-7b84-4fcd-8775-01f0f374a0a9 msgData: {'msgBody': 'Howdy @ 2019-03-07 13:44:56', 'msgAttributes': {'hash': {'StringValue': '1111114502ff8532d063b9d988e2406a', 'DataType': 'String'}}}
2019-03-07 13:45:11.739 7cb1fd0f-7b84-4fcd-8775-01f0f374a0a9 Task timed out after 15.01 seconds

If I remove the VPC, everything works fine... But I need this function working inside a VPC. Can anyone help me, please? T_T

SteveByerly commented 5 years ago

I'm having the same problem. I can access KMS and SSM properly, just not SQS.

SteveByerly commented 5 years ago

I finally figured this out.

In order for the routes to work properly, you need to use a specific URL for the API calls, as noted in the docs. The SQS metadata hasn't been updated in a long time, so it does not have this updated URL scheme.

The solution was not clear to me originally, since the argument for the send_message method takes a URL, which I verified was in the proper format. But the URL in question is the one the API call is sent to; the queue URL is just part of the API call's params.

So the fix is to override endpoint_url when creating your client/resource.

import json

import boto3

session = boto3.Session()

# Force the new-style endpoint instead of the legacy
# <region>.queue.amazonaws.com host that the SQS metadata resolves to.
sqs_client = session.client(
    service_name='sqs',
    endpoint_url='https://sqs.us-east-1.amazonaws.com',
)

sqs_client.send_message(
    QueueUrl='https://sqs.us-east-1.amazonaws.com/...',
    MessageBody=json.dumps('my payload'),
)

JordonPhillips commented 5 years ago

The reason we use the alternate endpoint style is to support Python 2.6, which does not support SNI; SNI is required for the new endpoints. We would need to drop support for Python 2.6-2.7.8. Even then it would still be a breaking change, because people have whitelists for particular URLs, so changing what we use would break them.

One possibility in the short term is to add a configuration setting to switch over to the new endpoints.

SteveByerly commented 5 years ago

That makes sense. I don't necessarily think a configuration setting would be better, since the user would still need to know about the configuration option.

A warning in the docs would be a good start, perhaps at the top of the page and in each relevant section. I looked at the docs several times for a clue while I was working through this; that would likely have resolved it quickly.

Another idea would be to log a warning if the user is on Python 2.7.9+ (where SNI is available), is using a new-style URL for the queue_url, and has not set the endpoint_url.
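
Something along these lines, as a purely hypothetical sketch (none of these names exist in botocore, and the URL test is simplified to the hostname pattern discussed above):

import ssl
import warnings

def warn_on_endpoint_mismatch(queue_url, endpoint_url=None):
    # Hypothetical helper: flag the case where a new-style queue URL
    # (https://sqs.<region>.amazonaws.com/...) will be sent to the legacy
    # <region>.queue.amazonaws.com endpoint anyway.
    uses_new_style = queue_url.startswith('https://sqs.')
    if ssl.HAS_SNI and uses_new_style and endpoint_url is None:
        warnings.warn(
            'Queue URL uses the new sqs.<region> hostname, but the client '
            'will call the legacy endpoint; consider passing endpoint_url.'
        )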

Thanks for following up!

dt-kylecrayne commented 5 years ago

Any updates or plans for tackling this issue? We're stuck on older versions of boto3 so that SQS keeps working inside our VPCs.

michaelwills commented 5 years ago

@SteveByerly thanks much for https://github.com/boto/boto3/issues/1900#issuecomment-471047309

And I think a warning in the docs/logs would be good.

Jon-AtAWS commented 4 years ago

@SteveByerly, you're my hero.

Second that. The docs absolutely do not cover this (it seems to apply to SQS only), and I burned 8 hours trying to figure it out.

oleksii-donoha commented 4 years ago

To add an observation: it doesn't even seem to be consistent across regions. I had the same code with the same setup working in one region but failing in another, which sent me off investigating networking problems.

Overriding the endpoint URL works in both regions, but the default sqs_client = boto3.client('sqs') works only in one. A real head-scratcher, I'm telling you.

christophevg commented 4 years ago

The proposed solution with the additional endpoint_url doesn't seem to solve the problem in our case. Just to be sure: is it the same hostname as the queue URL, without the path, etc.? So given QueueUrl https://sqs.eu-central-1.amazonaws.com/1234567/queue-name, the endpoint_url would be https://sqs.eu-central-1.amazonaws.com?
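
If it helps, that derivation is just the scheme and host of the queue URL; a minimal sketch, using the QueueUrl from the question above:

from urllib.parse import urlparse

queue_url = 'https://sqs.eu-central-1.amazonaws.com/1234567/queue-name'
parts = urlparse(queue_url)
# Keep only scheme + host; the account ID and queue name stay in QueueUrl.
endpoint_url = f'{parts.scheme}://{parts.netloc}'
print(endpoint_url)  # https://sqs.eu-central-1.amazonaws.com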

christophevg commented 4 years ago

To avoid confusion, a quick follow-up: our problem was related to the Lambda not having access rights to the public SQS endpoint. After fixing that, simply using sqs_client = boto3.client('sqs') worked as expected.

marianobrc commented 3 years ago

Any updates on this one? I'm trying to run SQS and Celery in AWS with a VPC endpoint (no NAT gateways). Celery initializes the boto3 client with default parameters, and it's not possible to modify the boto3 client initialization code to set the endpoint_url parameter to the right URL. I checked that sending a message directly with boto3 and setting endpoint_url works, but with Celery the connection times out because it tries to connect using the default (legacy) endpoint, which is not supported with VPC endpoints. AWS ref: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-sending-messages-from-vpc.html
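
One possible workaround, assuming a recent kombu with predefined_queues support (an assumption worth checking against your kombu version; this is only a sketch): pin the full queue URL so the transport never has to resolve an SQS endpoint itself.

from celery import Celery

app = Celery('tasks', broker='sqs://')
app.conf.broker_transport_options = {
    'region': 'us-east-1',
    # Assumed kombu option: with the exact queue URL pinned, the transport
    # should not need to construct the (legacy) endpoint on its own.
    'predefined_queues': {
        'my-queue': {
            'url': 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue',
        },
    },
}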

marianobrc commented 3 years ago

@dt-kylecrayne I'm having the same issue. Which boto3 version is working for you with SQS inside your VPCs? Thanks

marianobrc commented 3 years ago

I found the following workaround, overriding boto's settings in endpoints.json:

  1. Copy .venv/lib/python3.8/site-packages/botocore/data/endpoints.json to a known path inside a directory (your path may differ depending on where boto is installed).
  2. Edit the file and replace any reference to "queue.{dnsSuffix}" with "sqs.{region}.{dnsSuffix}". This modifies the endpoint URL format.
  3. Also edit "protocols" : [ "http", "https" ], removing "http". SQS VPC endpoints only work over HTTPS.
  4. Set the env var AWS_DATA_PATH=/directory/containing/your/file/ to tell boto to read settings from there first.

I hope this helps someone else until this gets fixed. A scripted version of these steps is sketched below.
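
That script might look like this (a sketch: the destination directory is an assumption, and it relies on the endpoints.json layout current botocore ships):

import json
import os

import botocore

# Step 1: locate the installed endpoints.json and pick a writable directory.
src = os.path.join(os.path.dirname(botocore.__file__), 'data', 'endpoints.json')
dst_dir = '/tmp/boto-data'  # assumed destination; any directory works
os.makedirs(dst_dir, exist_ok=True)

with open(src) as f:
    text = f.read()

# Step 2: switch SQS to the new-style hostname format.
text = text.replace('queue.{dnsSuffix}', 'sqs.{region}.{dnsSuffix}')

# Step 3: drop plain "http" from the SQS defaults; VPC endpoints are HTTPS-only.
data = json.loads(text)
for partition in data.get('partitions', []):
    defaults = partition.get('services', {}).get('sqs', {}).get('defaults', {})
    if 'protocols' in defaults:
        defaults['protocols'] = ['https']

with open(os.path.join(dst_dir, 'endpoints.json'), 'w') as f:
    json.dump(data, f)

# Step 4: point boto at the patched file (set before any client is created):
#   export AWS_DATA_PATH=/tmp/boto-data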

joseph-wortmann commented 3 years ago

This would be quite simple to fix within botocore. The offending line is 467 in client.py. A simple check of the Python version, or of ssl.HAS_SNI, to choose either the sslCommonName or the hostname should do it. Currently this line simply chooses sslCommonName if it exists, and hostname otherwise. For SQS and a couple of other services, sslCommonName always exists in current botocore.

Until this gets fixed (as I said, it should be simple), I've created a microlibrary that implements a variation of the solution @marianobrc described directly above. You can find it here: https://pypi.org/project/awsserviceendpoints/
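
The proposed check might look roughly like this (a sketch of the idea only, not the actual botocore code at that line):

import ssl

def choose_hostname(endpoint_data):
    # endpoint_data: the per-endpoint dict from endpoints.json, which for
    # SQS carries both 'hostname' and the legacy 'sslCommonName'.
    if ssl.HAS_SNI:
        # SNI is available (Python 2.7.9+ / 3.x), so the new-style
        # hostname works and should be preferred.
        return endpoint_data['hostname']
    return endpoint_data.get('sslCommonName', endpoint_data['hostname'])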

willronchetti commented 3 years ago

Any updates on a fix for this?

kapilt commented 2 years ago

This also results in mismatched data between the CLI and boto API usage: the CLI somehow knows how to use the correct endpoint (sqs.<region>) but the boto API doesn't and uses the legacy one. When you query the queue URLs, the service returns them based on the host that was accessed, so now we have data inconsistencies as well.

❯ aws sqs list-queues
{
    "QueueUrls": [
        "https://sqs.us-east-2.amazonaws.com/123456785098/assetdb-ftest-cvKP",
        "https://sqs.us-east-2.amazonaws.com/123456785098/dev_policy_deploys",
        "https://sqs.us-east-2.amazonaws.com/123456785098/dev_policy_deploys_dlq",
        "https://sqs.us-east-2.amazonaws.com/123456785098/local-assetdb",
        "https://sqs.us-east-2.amazonaws.com/123456785098/test",
        "https://sqs.us-east-2.amazonaws.com/123456785098/test2",
        "https://sqs.us-east-2.amazonaws.com/123456785098/test3",
        "https://sqs.us-east-2.amazonaws.com/123456785098/test4",
        "https://sqs.us-east-2.amazonaws.com/123456785098/test5"
    ]
}

❯ python
Python 3.10.0 (default, Oct  5 2021, 06:12:41) [GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto3
>>> import pprint
>>> pprint.pprint(boto3.client('sqs').list_queues())
{'QueueUrls': ['https://us-east-2.queue.amazonaws.com/123456785098/assetdb-ftest-cvKP',
               'https://us-east-2.queue.amazonaws.com/123456785098/dev_policy_deploys',
               'https://us-east-2.queue.amazonaws.com/123456785098/dev_policy_deploys_dlq',
               'https://us-east-2.queue.amazonaws.com/123456785098/local-assetdb',
               'https://us-east-2.queue.amazonaws.com/123456785098/test',
               'https://us-east-2.queue.amazonaws.com/123456785098/test2',
               'https://us-east-2.queue.amazonaws.com/123456785098/test3',
               'https://us-east-2.queue.amazonaws.com/123456785098/test4',
               'https://us-east-2.queue.amazonaws.com/123456785098/test5'],
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '989',
                                      'content-type': 'text/xml',
                                      'date': 'Thu, 18 Nov 2021 13:04:54 GMT',
                                      'x-amzn-requestid': '554b37b9-02bd-5e12-ad5a-6da9530bfb45'},
                      'HTTPStatusCode': 200,
                      'RequestId': '554b37b9-02bd-5e12-ad5a-6da9530bfb45',
                      'RetryAttempts': 0}}

It feels like madness to me that the SDK is forcing all its users to work around it.

Is there a sane default configuration without having to manually pass in the endpoint, i.e. how is the awscli doing the right thing?

Can we get an environment flag similar to the STS regional endpoints one?

AbdulBasitKhaleeq commented 2 years ago

Resolved the issue by putting the Lambda function in a private subnet and allowing internet access through a NAT Gateway.

VPC -> create private subnets -> create a NAT Gateway in a public subnet -> route the private subnets through the NAT Gateway -> update the VPC settings in the Lambda configuration.

import boto3

session = boto3.Session(region_name="ca-central-1")
sqs = session.client(
    service_name='sqs',
    endpoint_url='https://sqs.ca-central-1.amazonaws.com',
)

sejr1996 commented 1 year ago

I have had a Lambda function sending messages to an SQS queue, configured with a VPC; it had been working normally for several months, but now, out of nowhere, no messages are sent and the function times out. The Lambda function is in a private subnet.

sejr1996 commented 1 year ago

Changing the security group ingress rules to allow all traffic works. Previously the configuration allowed access through ports 22 and 2049. Which port should be added for the SQS queues to function correctly?

dfloresxyon commented 10 months ago

Changing the security group ingress rules to allow all traffic works. Previously the configuration allowed access through ports 22 and 2049. Which port should be added for the SQS queues to function correctly?

The same thing happened to me. Lambda running with the VPC set up, and an endpoint created so the resources within the VPC can access SQS endpoints. All working fine for years. Suddenly the Lambdas started to time out and couldn't resolve SQS endpoints. Opened the doors as @sejr1996 mentioned, as a last resort, and it worked for now.

tim-finnigan commented 5 months ago

This issue has been addressed — you can test by running:

import boto3

session = boto3.Session()
# Stream debug logs to stderr so endpoint resolution is visible.
boto3.set_stream_logger('')

sqs_client = session.client(
    service_name='sqs',
    region_name='us-east-1'
)

response = sqs_client.list_queues()
print(response)

And see in the logs that it resolves to the correct endpoint:

Endpoint provider result: https://sqs.us-east-1.amazonaws.com

Please update to a newer version of Boto3 for access to the latest functionality. The most recent version is 1.34.125 per the CHANGELOG. And note that Python 3.8+ is required.

SQS endpoints for reference: https://docs.aws.amazon.com/general/latest/gr/sqs-service.html. If you want to use a custom or legacy endpoint, you can set the service-specific endpoint variable AWS_ENDPOINT_URL_SQS to the value you need.
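
For example, in Python (the URL here is just the public regional endpoint; substitute whatever your network setup needs):

import os

# Must be set before the client is created for boto3 to pick it up.
os.environ['AWS_ENDPOINT_URL_SQS'] = 'https://sqs.us-east-1.amazonaws.com'

import boto3

sqs_client = boto3.client('sqs')  # uses AWS_ENDPOINT_URL_SQS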

github-actions[bot] commented 5 months ago

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.