davebshow / goblin

A Python 3.5 rewrite of the TinkerPop 3 OGM Goblin

Using Goblin with Azure Cosmos DB #84

Open jbrow70 opened 6 years ago

jbrow70 commented 6 years ago

Hi,

Has anyone been able to use Goblin with Cosmos DB?

Here's our yaml config:

    hosts: ['_gremlin_uri_that_cosmos_gives_us_in_azuredashboard']
    port: 443
    username: '/dbs/graphdb/colls/Persons'
    password: 'somepassword'
    response_timeout: 5
    connectionPool: { enableSsl: true}
    serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { serializeResultToString: true }}

We run

    import asyncio, datetime
    import goblin
    from goblin import Goblin
    from goblin import driver, abc, exception

    loop = asyncio.get_event_loop()

    app = loop.run_until_complete(Goblin.open(loop, configfile='config.yaml'))

    app.close()

but it always comes back with

    Traceback (most recent call last):
      File "/usr/lib/python3.6/asyncio/selector_events.py", line 724, in _read_ready
        data = self._sock.recv(self.max_size)
    ConnectionResetError: [Errno 104] Connection reset by peer

We've also tried the Gremlin console. It works somewhat better against Cosmos DB, but after a while we get the same "connection reset by peer" error there too. It seems to be a Cosmos DB or networking issue with Azure, but we're not sure where to look.

We just have a trial edition of azure at this point, just evaluating Goblin with Cosmos DB for graphs.

The Azure documentation for Cosmos DB using Gremlin says to use the Goblin Python driver for Python support, but we can't even connect at the moment :(

Any help would be appreciated.

Thanks,

Jordan

davebshow commented 6 years ago

Hi Jordan,

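I have never used Cosmos DB. I can help look into this, but a couple of things first:

Do you know what version of TinkerPop Cosmos DB uses?

What version of Cosmos DB do you have?

What version of Goblin, aiogremlin, and gremlinpython are you using? Versions are a little wacky right now, I am working to get things up to speed.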

jbrow70 commented 6 years ago

Hi @davebshow,

Do you know what version of TinkerPop Cosmos DB uses?

Cosmos DB uses TinkerPop 3. According to https://docs.microsoft.com/en-us/azure/cosmos-db/create-graph-gremlin-console, the Gremlin console requires TinkerPop 3.2.5 or above, so that's how we inferred the version they use.

What version of Cosmos DB do you have?

We're using the latest version of Cosmos DB as we launched it a day ago.

What version of Goblin, aiogremlin, and gremlinpython are you using? Versions are a little wacky right now, I am working to get things up to speed.

Goblin: 2.0.0
aiogremlin: 3.2.4
gremlinpython: As far as we can tell, we're not using gremlinpython, and neither is Goblin or aiogremlin (we didn't see it as a dependency when we installed Goblin). However, we did just install gremlinpython (3.3.0) inside the same project and got the same error.

PS: Microsoft Azure's website points us to Goblin as the primary Gremlin API to use: https://docs.microsoft.com/en-us/azure/cosmos-db/. It seems like someone got it to work at some point ;)

Thanks for your help!

-Jordan

davebshow commented 6 years ago

First things first: update your versions. Like I said, the versions are a bit wacky right now--I've been finishing grad school/moving/starting a new job and I'm still not caught up. In a fresh environment you would do:

$ pip install goblin==2.1.0rc2
$ pip install gremlinpython==3.2.6 --no-deps

This should also give you aiogremlin==3.2.6rc1.

Idk if this will solve your problems, but it is a first step. The Goblin 2.0 package tests against TP 3.2.4, and there have been quite a few changes since.

jbrow70 commented 6 years ago

Hi @davebshow,

So we updated the environment using the suggested versions above in a completely new virtualenv.

That did reduce the errors, but we did get the main one still:

    line 8, in <module>
        app = loop.run_until_complete(Goblin.open(loop, configfile='config.yaml'
      File "/usr/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
        return future.result()
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/goblin/app.py", line 66, in open
        config)
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/cluster.py", line 86, in open
        await cluster.establish_hosts()
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/cluster.py", line 134, in establish_hosts
        url, self._loop, dict(self._config))
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/server.py", line 98, in open
        await host.initialize()
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/server.py", line 73, in initialize
        await conn_pool.init_pool()
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/pool.py", line 131, in init_pool
        self._provider)
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/pool.py", line 200, in _get_connection
        message_serializer=message_serializer, provider=provider)
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/connection.py", line 97, in open
        await transport.connect(url, ssl_context=ssl_context)
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/aiohttp/transport.py", line 18, in connect
        self._ws = await self._client_session.ws_connect(url)
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiohttp/helpers.py", line 102, in __await__
        ret = yield from self._coro
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiohttp/client.py", line 390, in _ws_connect
        proxy_auth=proxy_auth)
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiohttp/helpers.py", line 97, in __iter__
        ret = yield from self._coro
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiohttp/client.py", line 241, in _request
        yield from resp.start(conn, read_until_eof)
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 559, in start
        (message, payload) = yield from self._protocol.read()
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiohttp/streams.py", line 509, in read
        yield from self._waiter
    aiohttp.client_exceptions.ClientOSError: [Errno 104] Connection reset by peer
    Unclosed client session
    client_session: <aiohttp.client.ClientSession object at 0x7f0f3c534ac8>

We can connect to Cosmos Graph DB using node.js gremlin package, and we can connect to it using the gremlin console. We'd like to use Python however, and it seems your library is the best approach, even Microsoft recommends it.

Please look to https://docs.microsoft.com/en-us/azure/cosmos-db/create-graph-gremlin-console#ConnectAppService for the yaml template that they recommend for Gremlin connecting to Cosmos DB.

davebshow commented 6 years ago

After looking again, it seems like this is a configuration issue. You can't pass the gremlin.yaml for the Java driver to Goblin. The Goblin config tries to be as similar as possible to the Java driver's, but it differs for practical reasons (for example, you can't pass Java serializer classes to Python code). Please try to update your config based on the Goblin app config options: http://goblin.readthedocs.io/en/latest/app.html.

Sorry, the docs aren't in the best state right now.

Also, it looks like you may need to configure SSL options. I'm not sure whether you are really using SSL, but the Cosmos examples all set ssl=true, so you will probably at least need to set scheme: 'wss' in the config (it looks like that is all the Node client does anyway). If you have key and cert files, there are config options for those as well.

jbrow70 commented 6 years ago

Hi @davebshow ,

Good catch. I updated config based on Goblin app conf in http://goblin.readthedocs.io/en/latest/app.html.

Here's what it looks like now:

    scheme: 'wss'
    hosts: ['hostname']
    port: 443
    ssl_certfile: ''
    ssl_keyfile: ''
    ssl_password: ''
    username: 'username'
    password: 'password'
    response_timeout: None
    max_conns: 4
    min_conns: 1
    max_times_acquired: 16
    max_inflight: 64
    message_serializer: 'goblin.driver.GraphSONMessageSerializer'

However, now it's looking for the SSL files:

    /home/clink-im/envs/goblin_test/bin/python /home/clink-im/haydenbeadles/clearinghouse/cl_janus/test_goblin.py
    Traceback (most recent call last):
      File "/home/clink-im/haydenbeadles/clearinghouse/cl_janus/test_goblin.py", line 11, in <module>
        app = loop.run_until_complete(Goblin.open(loop, configfile='config.yaml'
      File "/usr/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
        return future.result()
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/goblin/app.py", line 66, in open
        config)
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/cluster.py", line 86, in open
        await cluster.establish_hosts()
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/cluster.py", line 134, in establish_hosts
        url, self._loop, dict(self._config))
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/server.py", line 97, in open
        host = cls(url, loop, **config)
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/server.py", line 34, in __init__
        certfile, keyfile=keyfile, password=ssl_password)
    FileNotFoundError: [Errno 2] No such file or directory

Seems to need the ssl files since we pass those config options. I can look again, but I didn't see any key and cert files available from cosmos db. Think that's all handled internally.

I commented out the code like so:

server.py:

    if scheme in ['https', 'wss']:
        # certfile = config['ssl_certfile']
        # keyfile = config['ssl_keyfile']
        # ssl_password = config['ssl_password']
        ssl_context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
        # ssl_context.load_cert_chain(
        #     certfile, keyfile=keyfile, password=ssl_password)
        self._ssl_context = ssl_context
    else:
        self._ssl_context = None

It gets past that error.
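For reference, a client connecting over wss normally does not need its own certificate and key; those only matter for mutual TLS. Below is a minimal sketch of that idea using only the standard library's ssl module. The helper name make_client_ssl_context is hypothetical and is not part of aiogremlin; it simply treats empty ssl_certfile and ssl_keyfile values as "no client certificate" instead of passing them to load_cert_chain.

```python
import ssl


def make_client_ssl_context(certfile='', keyfile='', password=''):
    """Hypothetical helper (not aiogremlin's code): build a client TLS context.

    Empty strings are treated as "not configured", so load_cert_chain is
    only called when a client certificate path is actually given.
    """
    # create_default_context() verifies the server certificate against the
    # system CA store and checks the hostname.
    ctx = ssl.create_default_context()
    if certfile:
        ctx.load_cert_chain(
            certfile, keyfile=keyfile or None, password=password or None)
    return ctx


# With the empty values from the yaml config, no files are opened.
ctx = make_client_ssl_context(certfile='', keyfile='', password='')
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Unlike the bare ssl.SSLContext(ssl.PROTOCOL_SSLv23) in the patch above, this keeps server certificate verification switched on.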

Now I try either:

OGM create vertex script

    import asyncio, datetime
    import goblin
    from goblin import Goblin
    from goblin import driver, abc, exception

    from goblin import DriverRemoteConnection # alias for aiogreml

    from goblin import element, properties


    class Person(element.Vertex):
        name = properties.Property(properties.String)
        age = properties.Property(properties.Integer)


    class Knows(element.Edge):
        notes = properties.Property(properties.String, default='N/A')


    loop = asyncio.get_event_loop()

    app = loop.run_until_complete(Goblin.open(loop, configfile='config.yaml'))

    app.register(Person, Knows)


    async def go(app):
        session = await app.session()
        print('test')
        leif = Person()
        leif.name = 'Leif'
        leif.age = 28
        # jon = Person()
        # jon.name = 'Jonathan'
        # works_with = Knows(leif, jon)
        session.add(leif)
        await session.flush()

    loop.run_until_complete(go(app))

or

List vertex script

    import asyncio, datetime
    import goblin
    from goblin import Goblin
    from goblin import driver, abc, exception

    from goblin import DriverRemoteConnection # alias for aiogreml

    loop = asyncio.get_event_loop()

    from goblin import DriverRemoteConnection # alias for aiogremlin.DriverRemoteConnection
    from goblin import Graph # alias for aiogremlin.Graph


    async def go(loop):
        remote_connection = await DriverRemoteConnection.open(
            loop=loop, configfile='config.yaml')
        g = Graph().traversal().withRemote(remote_connection)
        vertices = await g.V().toList()
        await remote_connection.close()
        return vertices

    results = loop.run_until_complete(go(loop))

I get back this error for both:

    /home/clink-im/envs/goblin_test/bin/python /home/clink-im/haydenbeadles/clearinghouse/cl_janus/test_goblin2.py
    Traceback (most recent call last):
      File "/home/clink-im/haydenbeadles/clearinghouse/cl_janus/test_goblin2.py", line 20, in <module>
        results = loop.run_until_complete(go(loop))
      File "/usr/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
        return future.result()
      File "/home/clink-im/haydenbeadles/clearinghouse/cl_janus/test_goblin2.py", line 16, in go
        vertices = await g.V().toList()
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/process/graph_traversal.py", line 26, in toList
        async for result in self:
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/process/graph_traversal.py", line 17, in __anext__
        self.last_traverser = await self.traversers.__anext__()
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/resultset.py", line 66, in __anext__
        msg = await self.one()
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/resultset.py", line 10, in wrapper
        msg = await fn(self)
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/resultset.py", line 86, in one
        loop=self._loop)
      File "/usr/lib/python3.6/asyncio/tasks.py", line 342, in wait_for
        timeout_handle = loop.call_later(timeout, _release_waiter, waiter)
      File "/usr/lib/python3.6/asyncio/base_events.py", line 543, in call_later
        timer = self.call_at(self.time() + delay, callback, *args)
    TypeError: unsupported operand type(s) for +: 'float' and 'str'

Process finished with exit code 1

It fails on await session.flush() and vertices = await g.V().toList() respectively.

I'm not sure where to go from here. It seems to be ignoring my username and password config options, at least up to the point where it fails. I am using the correct username and password, but as a negative test I also put wrong credentials in the yaml file and never saw a difference: the error above doesn't change, whether or not the values match what Cosmos DB gives us.

Again, thank you for your help here. I feel like we might be close?

jbrow70 commented 6 years ago

@davebshow

Think I may have figured that problem out. It was due to:

:param float response_timeout: (optional) None by default

It needs to be a float, and response_timeout: None in yaml isn't a float value (YAML reads None as the string 'None', not as a null). I know your docs show it that way, but it doesn't seem to like it.
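The TypeError in the traceback above is consistent with the timeout arriving as the string 'None': YAML spells null as null or ~, so None is read as an ordinary string, and asyncio then tries to add it to a float. Here is a minimal sketch of a guard, using only the standard library. coerce_timeout is a hypothetical helper, not part of Goblin or aiogremlin, and mapping 'None' to "no timeout" is an assumption about what the config author intended.

```python
def coerce_timeout(value):
    """Hypothetical helper: normalize a response_timeout config value.

    Returns None (no timeout) or a float number of seconds.
    """
    if value is None:
        return None
    if isinstance(value, str):
        text = value.strip()
        # Treat common "no value" spellings as no timeout.
        if text.lower() in ('', 'none', 'null', '~'):
            return None
        return float(text)
    return float(value)


# The failure mode from the traceback: adding a str to a float.
try:
    1.0 + 'None'
except TypeError as exc:
    print(exc)

print(coerce_timeout('None'), coerce_timeout('60'), coerce_timeout(5))
```

In the yaml file itself, the equivalent fix is to write a number, write null, or leave the option out.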

If I use response_timeout: 60, or comment the option out (#response_timeout: None) in the yaml file, then the app either times out at 60 seconds, with an error like so:

    /home/clink-im/envs/goblin_test/bin/python /home/clink-im/haydenbeadles/clearinghouse/cl_cosmos/test_goblin2.py
    Traceback (most recent call last):
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/resultset.py", line 86, in one
        loop=self._loop)
      File "/usr/lib/python3.6/asyncio/tasks.py", line 362, in wait_for
        raise futures.TimeoutError()
    concurrent.futures._base.TimeoutError

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/clink-im/haydenbeadles/clearinghouse/cl_cosmos/test_goblin2.py", line 20, in <module>
        results = loop.run_until_complete(go(loop))
      File "/usr/lib/python3.6/asyncio/base_events.py", line 467, in run_until_complete
        return future.result()
      File "/home/clink-im/haydenbeadles/clearinghouse/cl_cosmos/test_goblin2.py", line 16, in go
        vertices = await g.V().toList()
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/process/graph_traversal.py", line 26, in toList
        async for result in self:
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/process/graph_traversal.py", line 17, in __anext__
        self.last_traverser = await self.traversers.__anext__()
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/resultset.py", line 66, in __anext__
        msg = await self.one()
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/resultset.py", line 10, in wrapper
        msg = await fn(self)
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/resultset.py", line 89, in one
        raise exception.ResponseTimeoutError('Response timed out')
    aiogremlin.exception.ResponseTimeoutError: Response timed out
    Task exception was never retrieved
    future: <Task finished coro=<Connection._receive() done, defined at /home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/connection.py:159> exception=AttributeError("'GremlinServerWSProtocol' object has no attribute '_transport'",)>
    Traceback (most recent call last):
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/connection.py", line 162, in _receive
        await self._protocol.data_received(data, self._result_sets)
      File "/home/clink-im/envs/goblin_test/lib/python3.6/site-packages/aiogremlin/driver/protocol.py", line 45, in data_received
        await self._transport.close()
    AttributeError: 'GremlinServerWSProtocol' object has no attribute '_transport'
    Unclosed client session
    client_session: <aiohttp.client.ClientSession object at 0x7f9d36fc8d30>

Process finished with exit code 1

or

hangs indefinitely when I comment out response_timeout.

It seems to never really communicate past "connecting" with Gremlin on Cosmos, at least past the hostname. When I put the wrong hostname in hosts config option, it fails on Cannot connect to host _hostname_om:443 ssl:True [Name or service not known]. When I put the right one, it gets past the connection part it seems.

Like it gets to this:

    async def go(loop):
        remote_connection = await DriverRemoteConnection.open(
            loop=loop, configfile='config.yaml')
        g = Graph().traversal().withRemote(remote_connection)
    --> vertices = await g.V().toList()

And then it just hangs with no return, unless I set a numeric timeout for response_timeout.

In the debugger, under .toList(), I see it's hanging here:

    async def toList(self):
        results = []
    --> async for result in self:
            results.append(result)
        return results

The self at this point looks like this:

    self = {AsyncGraphTraversal} [['V']]
      bytecode = {Bytecode} [['V']]
      graph = {Graph} graph[empty]
      last_traverser = {NoneType} None
      side_effects = {TraversalSideEffects} sideEffects[size:0]
      traversal_strategies = {AsyncTraversalStrategies} Unable to get repr for <class 'aiogremlin.process.traversal.AsyncTraversalStrategies'>
      traversers = {NoneType} None

That doesn't look right to me (unable to get repr... doesn't look good and graph shows empty)

Here's what CosmosDB says needs to be placed for username and password (https://docs.microsoft.com/en-us/azure/cosmos-db/create-graph-gremlin-console#ConnectAppService):

Setting | Suggested value | Description
username | Your username | The resource of the form /dbs/<db>/colls/<coll> where <db> is your database name and <coll> is your collection name.
password | Your primary key | See second screenshot below. This is your primary key, which you can retrieve from the Keys page of the Azure portal, in the Primary Key box. Use the copy button on the left side of the box to copy the value.

I guess that format, particularly on username, isn't causing issue with Goblin?

Anyway, if I leave username and password blank, it still hangs. It seems it hasn't even gotten far enough to authenticate.

These are just observations I see, maybe they'll help.

jbrow70 commented 6 years ago

One thing we'll try in the morning is reverting Goblin and its dependencies back to the latest stable release, just to cover our bases. We were originally passing yaml config params that Goblin didn't recognize; now that we potentially have a Goblin-safe yaml file, I'd like to try it against a stable release. I feel like Cosmos would have kept backwards compatibility with older TinkerPop versions, but maybe I'm wrong. Worth a try...

jbrow70 commented 6 years ago

After reverting back to latest stable versions, we continue to get errors. Currently our Goblin connection to Cosmos is still hanging. We reverted back to the versions you specified earlier.

davebshow commented 6 years ago

Ok, well I have some good news and some bad news.

Good news is that someone from Microsoft contacted me about getting going with Python and Cosmos, so hopefully I will have more info and there will be more documentation soon.

Bad news is that the Cosmos DB Gremlin endpoint does not accept bytecode, only strings. This means that Goblin, as well as GLV code, will not work with Cosmos. Instead, you have to submit queries as a string (shown in this example): http://aiogremlin.readthedocs.io/en/latest/usage.html#using-the-driver-module
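To make the distinction concrete, here is a rough sketch of the two kinds of request a Gremlin client can send, written as plain dictionaries and printed with the standard library. The field layout is a simplified reading of the TinkerPop Gremlin Server protocol (op 'eval' for script strings, op 'bytecode' for GLV traversals) and should be treated as an assumption; it is not taken from Goblin's, aiogremlin's, or Cosmos DB's code, and both function names are hypothetical.

```python
import json
import uuid


def script_request(gremlin):
    """Hypothetical helper: a string-script request (op 'eval')."""
    return {
        'requestId': str(uuid.uuid4()),
        'op': 'eval',
        'processor': '',
        'args': {'gremlin': gremlin, 'language': 'gremlin-groovy'},
    }


def bytecode_request(steps):
    """Hypothetical helper: a GLV-style bytecode request (op 'bytecode')."""
    return {
        'requestId': str(uuid.uuid4()),
        'op': 'bytecode',
        'processor': 'traversal',
        'args': {
            'gremlin': {'@type': 'g:Bytecode', '@value': {'step': steps}},
            'aliases': {'g': 'g'},
        },
    }


# g.V() as a script string versus as bytecode steps.
print(json.dumps(script_request('g.V()'), indent=2))
print(json.dumps(bytecode_request([['V']]), indent=2))
```

An endpoint that only understands the first shape has no way to run what Goblin and other GLV-based code send, which is the second.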

Hopefully I will have more info for you soon.

jbrow70 commented 6 years ago

Hi @davebshow,

Thanks a bunch (and Microsoft) for taking point on this! It would be great to use Goblin, so I am liking the collaboration and efforts there.

That does sound like some work though to get it working with Cosmos DB Gremlin endpoint, with it not accepting bytecode. Would be a change...

Keep us in the loop, we'd appreciate it!

Our alternative approach for now (working in parallel) is to use OrientDB and their pyorient OGM and underlying driver. It's got its own hurdles we're working through. The agnostic language of Gremlin and the fully managed cosmos graph DB is very enticing.

davebshow commented 6 years ago

Update: they will be accepting bytecode in the near future. Also, tomorrow or the next day I will be playing around a bit with Python and Cosmos. I'll let you know what I figure out.

jbrow70 commented 6 years ago

I was going to ask if Microsoft was going to accept bytecode soon so not surprised on that answer. They should keep it to standard ;) Sounds good!

jbrow70 commented 6 years ago

Hi @davebshow,

Any news on Goblin with Cosmos?

davebshow commented 6 years ago

Well, like I said, until they support bytecode, Goblin is a no go with Cosmos. I know that the guys trying things out at Microsoft have got Gremlin Python up and running with Cosmos, and I was going to try some script submission with aiogremlin when I get a chance.

jbrow70 commented 6 years ago

Ok thanks Dave for the update!

soderluk commented 6 years ago

@davebshow: Any news on this? I'm interested to try this library out with Cosmos DB, but having problems with the configuration as well.

davebshow commented 6 years ago

AFAIK Cosmos still doesn't support bytecode, so using a GLV-based solution (like Goblin) still isn't an option. You should be able to connect with the aiogremlin driver and submit traversal strings. I haven't tried it, but apparently it works with Gremlin-Python, so I don't see why it wouldn't with aiogremlin.

raghavsuriya commented 1 year ago

@davebshow Is this still the case? Any update on this?