awslabs / amazon-neptune-tools

Tools and utilities to enable loading data and building graph applications with Amazon Neptune.
Apache License 2.0

neptune-python-utils: not able to authenticate with IAM DB Authentication #150

Closed imercier closed 1 year ago

imercier commented 3 years ago

Hi, I'm using neptune_python_utils.gremlin_utils, Release 1.0.0 (#145)

I'm using a role that grants the "neptune-db:*" permission to access my Neptune cluster. Running the code from the README:

from neptune_python_utils.gremlin_utils import GremlinUtils
GremlinUtils.init_statics(globals())
gremlin_utils = GremlinUtils()
conn = gremlin_utils.remote_connection()
g = gremlin_utils.traversal_source(connection=conn)
print(g.V().limit(10).valueMap().toList())
conn.close()

I get:

Exception: Failed to connect to server: HTTP Error code 403 - Forbidden

The endpoint is given by the environment variables $NEPTUNE_CLUSTER_ENDPOINT and $NEPTUNE_CLUSTER_PORT.

Any help would be appreciated. Thanks

iansrobinson commented 3 years ago

Hi @imercier. Thanks for reporting this.

Please can you tell me a little more about how you are using neptune-python-utils: is it in a Lambda function, an EC2 instance or a Jupyter notebook in the same VPC as Neptune, for example?

And how are you supplying the role: via an EC2 instance profile, or a Lambda execution role, for example? neptune-python-utils automatically uses the credential provider chain to connect to the database if IAM DB Auth is enabled on the database; hence it would appear that the permission you've specified is not reaching the library through the provider chain.

imercier commented 3 years ago

Hi @iansrobinson, I'm running this Python code from my laptop through an SSH tunnel to an EC2 instance in Neptune's VPC. I don't think this is a network problem, just an authentication one. I'm providing the role with export AWS_PROFILE=myprofile

I've tried also with:

import os
import boto3
from botocore.credentials import Credentials
from neptune_python_utils.gremlin_utils import GremlinUtils
from neptune_python_utils.endpoints import Endpoints

sts = boto3.client('sts', region_name='eu-west-1')
session = boto3.Session()
sts_connection = session.client('sts')
assume_role_object = sts_connection.assume_role(
    RoleArn=os.environ['ROLEARN'], RoleSessionName='foo')
credentials = assume_role_object['Credentials']
#credentials = session.get_credentials()
s3Session = boto3.Session(aws_access_key_id=credentials['AccessKeyId'],
                          aws_secret_access_key=credentials['SecretAccessKey'],
                          aws_session_token=credentials['SessionToken'])
endpoints = Endpoints(credentials=credentials,
                      neptune_endpoint=os.environ['NEPTUNEHOST'],
                      neptune_port=os.environ['NEPTUNEPORT'])
gremlin_utils = GremlinUtils(endpoints)
conn = gremlin_utils.remote_connection()
g = gremlin_utils.traversal_source(connection=conn)
print(g.V().limit(10).valueMap().toList())
conn.close()

iansrobinson commented 3 years ago

Thanks for the details @imercier. I'll try and reproduce this setup.

A couple of things to check:

imercier commented 3 years ago

Yes, I've been using this setup without IAM DB authentication for several months without any problem. I'll check the ARN in the IAM role and let you know; that could be the problem.

iansrobinson commented 3 years ago

I've been able to reproduce this, and identify a fix.

When using a tunnel or load balancer, we need to sign using the Neptune host name and also add this host name as a 'Host' header, but issue a request to the proxy/bastion host. These things aren't happening with the current setup.

This is an opportunity to replace the custom signing code with signing code from botocore. I'm going to spend a little longer testing this for all the IAM scenarios (such as bulk loading), and then I'll create a PR.

imercier commented 3 years ago

That was fast debugging :-) Keep me posted; I can test your branch if you want. Thanks a lot.

imercier commented 3 years ago

After fixing the IAM role and running this code with the endpoint set through environment variables:

from neptune_python_utils.gremlin_utils import GremlinUtils
GremlinUtils.init_statics(globals())
gremlin_utils = GremlinUtils()
conn = gremlin_utils.remote_connection()
g = gremlin_utils.traversal_source(connection=conn)
print(g.V().limit(10).valueMap().toList())
conn.close()

I got the following error, against an empty database with IAM authentication enabled:

Traceback (most recent call last):
  File "example.py", line 6, in <module>
    print(g.V().limit(10).valueMap().toList())
  File "/home/imercier/.local/lib/python3.8/site-packages/gremlin_python/process/traversal.py", line 57, in toList
    return list(iter(self))
  File "/home/imercier/.local/lib/python3.8/site-packages/gremlin_python/process/traversal.py", line 47, in __next__
    self.traversal_strategies.apply_strategies(self)
  File "/home/imercier/.local/lib/python3.8/site-packages/gremlin_python/process/traversal.py", line 548, in apply_strategies
    traversal_strategy.apply(traversal)
  File "/home/imercier/.local/lib/python3.8/site-packages/gremlin_python/driver/remote_connection.py", line 63, in apply
    remote_traversal = self.remote_connection.submit(traversal.bytecode)
  File "/home/imercier/.local/lib/python3.8/site-packages/gremlin_python/driver/driver_remote_connection.py", line 59, in submit
    result_set = self._client.submit(bytecode, request_options=self._extract_request_options(bytecode))
  File "/home/imercier/.local/lib/python3.8/site-packages/gremlin_python/driver/client.py", line 123, in submit
    return self.submitAsync(message, bindings=bindings, request_options=request_options).result()
  File "/home/imercier/.local/lib/python3.8/site-packages/gremlin_python/driver/client.py", line 144, in submitAsync
    return conn.write(message)
  File "/home/imercier/.local/lib/python3.8/site-packages/gremlin_python/driver/connection.py", line 55, in write
    self.connect()
  File "/home/imercier/.local/lib/python3.8/site-packages/gremlin_python/driver/connection.py", line 45, in connect
    self._transport.connect(self._url, self._headers)
  File "/home/imercier/.local/lib/python3.8/site-packages/gremlin_python/driver/aiohttp/transport.py", line 77, in connect
    self._loop.run_until_complete(async_connect())
  File "/usr/lib64/python3.8/asyncio/base_events.py", line 608, in run_until_complete
    return future.result()
  File "/home/imercier/.local/lib/python3.8/site-packages/gremlin_python/driver/aiohttp/transport.py", line 67, in async_connect
    self._websocket = await self._client_session.ws_connect(url, **self._aiohttp_kwargs, headers=headers)
  File "/home/imercier/.local/lib/python3.8/site-packages/aiohttp/client.py", line 729, in _ws_connect
    real_headers = CIMultiDict(headers)
  File "/home/imercier/amazon-neptune-tools/neptune-python-utils/neptune_python_utils/endpoints.py", line 48, in items
    return self.lazy_headers().items()
  File "/home/imercier/amazon-neptune-tools/neptune-python-utils/neptune_python_utils/endpoints.py", line 168, in get_headers
    signing_key = self.__get_signature_key(secret_key, datestamp, self.region, service)
  File "/home/imercier/amazon-neptune-tools/neptune-python-utils/neptune_python_utils/endpoints.py", line 205, in __get_signature_key
    kRegion = self.__sign(kDate, regionName)
  File "/home/imercier/amazon-neptune-tools/neptune-python-utils/neptune_python_utils/endpoints.py", line 201, in __sign
    return hmac.new(key, msg.encode('utf-8'), hashlib.sha256).digest()
AttributeError: 'NoneType' object has no attribute 'encode'
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f9e47bfd070>

@iansrobinson how do you access a Neptune database with IAM authentication? With a Lambda function? Thanks

iansrobinson commented 3 years ago

That looks like the region name returned by boto3.session.Session() is None. You can supply a region_name to an Endpoints instance.
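The failure at the bottom of the traceback can be reproduced in isolation. The sketch below uses only the standard library and follows the standard SigV4 signing-key derivation (date, then region, then service, then "aws4_request"). The names `sign`, `signature_key`, and `resolve_region` are illustrative and are not the library's API. Passing `None` as the region raises the same `AttributeError`, and `resolve_region` is a hypothetical guard that fails earlier with a clearer message.

```python
import hashlib
import hmac
import os


def sign(key: bytes, msg: str) -> bytes:
    # One HMAC-SHA256 step; msg must be a str, so None raises AttributeError.
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signature_key(secret_key: str, datestamp: str, region: str, service: str) -> bytes:
    # SigV4 key derivation chain: date -> region -> service -> "aws4_request".
    k_date = sign(("AWS4" + secret_key).encode("utf-8"), datestamp)
    k_region = sign(k_date, region)
    k_service = sign(k_region, service)
    return sign(k_service, "aws4_request")


def resolve_region(explicit=None):
    # Hypothetical helper: prefer an explicit region, then environment variables.
    region = (
        explicit
        or os.environ.get("AWS_REGION")
        or os.environ.get("AWS_DEFAULT_REGION")
    )
    if not region:
        raise ValueError(
            "No AWS region configured; set AWS_DEFAULT_REGION or pass a region name"
        )
    return region
```

Calling `signature_key("secret", "20210101", None, "neptune-db")` fails inside `sign` exactly as in the traceback, whereas resolving the region first surfaces the missing configuration before any signing happens.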

imercier commented 3 years ago

Thanks, everything works on my side with master 8222e578891faef3e21a2d757d1b496cfbef30e2 after setting export AWS_DEFAULT_REGION=$MYREGION. Tested from an EC2 instance with an IAM role, and from my local machine with export AWS_PROFILE=$MYROLE. Shall I close the issue?

iansrobinson commented 3 years ago

Thanks @imercier

I do in fact have changes that will allow querying an IAM DB auth enabled database through a proxy (such as a bastion host or load balancer), but if you're happy with your current situation, I'd rather take more time to test these changes before publishing them.

iansrobinson commented 1 year ago

Closing this issue. Proxy support is now included: https://github.com/awslabs/amazon-neptune-tools/tree/master/neptune-python-utils#proxies