Closed saramsey closed 4 years ago
Looks like a problem of python not being able to open the neo4j connection properly (intermittently). I have no idea about how to debug that. Maybe an SSL thing. Sorry, I really don't know. I don't run on OS X.
Yes, I am just documenting the issue here. Totally understand that this may be a user-setup issue.
Wonder if there is some kind of "race condition" going on, in the Neo4j connection establishment.
This might be caused by ReasoningUtilities being imported in ARAX_query.py, which also tries to make a Neo4j connection (I've seen this same issue on two different flavors of Ubuntu). If @edeutsch isn't using ReasoningUtilities, I vote we remove it from ARAX_query.py.
I don't recall if any code is using it. A lot of stuff imports it. I can try to pull it out and see if it all still works.
I have observed this same behavior many times now too. I recently observed it running KGNodeIndex, which does an import of ReasoningUtilities. The only things it imports are:
import ReasoningUtilities as RU
from RTXConfiguration import RTXConfiguration
I'm pretty sure RTXConfiguration won't touch neo4j.
So the conclusion is that just a single import of ReasoningUtilities can sporadically cause this error.
Seems to happen on multiple clients. I've seen it on arax.rtx.ai. And Steve on his Mac.
I am concerned there is an instability with the neo4j server itself that causes it sometimes to reject connections.
Traceback (most recent call last):
File "KGNodeIndex.py", line 14, in <module>
import ReasoningUtilities as RU
File "G:\Repositories\GitHub\RTX\code\reasoningtool\kg-construction/../QuestionAnswering\ReasoningUtilities.py", line 61, in <module>
driver = GraphDatabase.driver(rtxConfig.neo4j_bolt, auth=basic_auth(rtxConfig.neo4j_username, rtxConfig.neo4j_password))
File "C:\Program Files\Python37\lib\site-packages\neo4j\__init__.py", line 120, in driver
return Driver(uri, **config)
File "C:\Program Files\Python37\lib\site-packages\neo4j\__init__.py", line 161, in __new__
return subclass(uri, **config)
File "C:\Program Files\Python37\lib\site-packages\neo4j\__init__.py", line 235, in __new__
pool.release(pool.acquire())
File "C:\Program Files\Python37\lib\site-packages\neobolt\direct.py", line 715, in acquire
return self.acquire_direct(self.address)
File "C:\Program Files\Python37\lib\site-packages\neobolt\direct.py", line 608, in acquire_direct
connection = self.connector(address, error_handler=self.connection_error_handler)
File "C:\Program Files\Python37\lib\site-packages\neo4j\__init__.py", line 232, in connector
return connect(address, **dict(config, **kwargs))
File "C:\Program Files\Python37\lib\site-packages\neobolt\direct.py", line 972, in connect
raise last_error
File "C:\Program Files\Python37\lib\site-packages\neobolt\direct.py", line 963, in connect
s, der_encoded_server_certificate = _secure(s, host, security_plan.ssl_context, **config)
File "C:\Program Files\Python37\lib\site-packages\neobolt\direct.py", line 854, in _secure
s = ssl_context.wrap_socket(s, server_hostname=host if HAS_SNI and host else None)
File "C:\Program Files\Python37\lib\ssl.py", line 423, in wrap_socket
session=session
File "C:\Program Files\Python37\lib\ssl.py", line 870, in _create
self.do_handshake()
File "C:\Program Files\Python37\lib\ssl.py", line 1139, in do_handshake
self._sslobj.do_handshake()
FileNotFoundError: [Errno 2] No such file or directory
Although weirdly, a FileNotFoundError. This was on Windows.
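The traceback above shows the driver being created at module level (ReasoningUtilities.py, line 61), so any handshake failure surfaces as a failed import. A minimal sketch of one way to defer the connection until it is actually needed follows. The names `get_driver` and `factory` are hypothetical and not part of the RTX code; the `RTXConfiguration` attributes are the ones visible in the traceback.

```python
import threading

_driver = None
_driver_lock = threading.Lock()


def _default_factory():
    # Imported lazily so that merely importing this module never touches neo4j.
    from neo4j import GraphDatabase
    from RTXConfiguration import RTXConfiguration

    cfg = RTXConfiguration()
    return GraphDatabase.driver(
        cfg.neo4j_bolt, auth=(cfg.neo4j_username, cfg.neo4j_password)
    )


def get_driver(factory=_default_factory):
    """Create the driver on first use and reuse it afterwards (hypothetical helper)."""
    global _driver
    with _driver_lock:
        if _driver is None:
            _driver = factory()
        return _driver
```

This does not remove the intermittent handshake error; it only means that scripts which import the module without querying neo4j no longer fail at import time.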
This just happened to me over on arax.rtx.ai (completely different OS than the previous report). It seems to happen most frequently when a shell hasn't connected in a long while? Something stale-ish? Immediately retrying the script yielded success. RTXConfiguration is no longer part of the picture, so the only import is ReasoningUtilities, and that's where it dies.
Traceback (most recent call last):
File "KGNodeIndex.py", line 14, in <module>
On arax.rtx.ai I continue to get this about once per day. It seems to be most frequent when connecting to neo4j after a bit of a hiatus, i.e. the first time I run a script in a day has a higher chance of this crash than a run during an active development/debugging session, but the latter is not 0.0 either.
Immediate retry succeeds 95% of the time, and success is 100% after two retries.
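Since an immediate retry nearly always succeeds, a small retry wrapper is one possible stopgap while the root cause is investigated. This is only a sketch: `call_with_retries` and its parameters are hypothetical names, and it retries on `OSError` because both errors reported in this thread (`FileNotFoundError` and `OSError: [Errno 0]`) are `OSError` subclasses or instances.

```python
import time


def call_with_retries(func, attempts=3, delay_seconds=1.0, retry_on=(OSError,)):
    """Call func(), retrying on the given exception types (hypothetical helper)."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as error:
            last_error = error
            # Pause before the next try, but not after the final attempt.
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error
```

The driver creation could then be wrapped as `call_with_retries(lambda: GraphDatabase.driver(...))`, with the same arguments as before. This masks the symptom rather than fixing the cause.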
FWIW, I can confirm that I am seeing this from time to time, even after having switched to a "stock" installation of ARAX from the demo branch (no funky sqlite stuff).
on your mac?
I am also seeing this problem more and more frequently on the running instances. It happened a lot this afternoon. I have an hourly health monitoring agent and here is a list of all the errors for the last week. Good thing it was stable before noon today, cuz it sure wasn't after noon.
On a whim, tried this on arax.rtx.ai:
pip3 install --upgrade neo4j-driver
Successfully installed neo4j-driver-1.7.6
pip3 install --upgrade neobolt
Successfully installed neobolt-1.7.16
and restarted /beta endpoint.
fingers crossed.
Wanted to document that I also get this error on my macbook multiple times a day - agree it seems to be most frequent when connecting to neo4j after a hiatus, though that's not always the case. It always works fine if I immediately retry.
(rtx-env-7) amys-macbook:ARAXQuery aglen$ python ARAX_query.py 1212
Traceback (most recent call last):
File "ARAX_query.py", line 36, in <module>
from ParseQuestion import ParseQuestion
File "/Users/aglen/translator/RTX/RTX/code/ARAX/ARAXQuery/../../reasoningtool/QuestionAnswering/ParseQuestion.py", line 3, in <module>
import Question
File "/Users/aglen/translator/RTX/RTX/code/ARAX/ARAXQuery/../../reasoningtool/QuestionAnswering/Question.py", line 14, in <module>
from ReasoningUtilities import ReasoningUtilities as RU
File "/Users/aglen/translator/RTX/RTX/code/ARAX/ARAXQuery/../../reasoningtool/QuestionAnswering/ReasoningUtilities.py", line 61, in <module>
driver = GraphDatabase.driver(rtxConfig.neo4j_bolt, auth=basic_auth(rtxConfig.neo4j_username, rtxConfig.neo4j_password))
File "/Users/aglen/translator/RTX/rtx-env-7/lib/python3.7/site-packages/neo4j/__init__.py", line 120, in driver
return Driver(uri, **config)
File "/Users/aglen/translator/RTX/rtx-env-7/lib/python3.7/site-packages/neo4j/__init__.py", line 161, in __new__
return subclass(uri, **config)
File "/Users/aglen/translator/RTX/rtx-env-7/lib/python3.7/site-packages/neo4j/__init__.py", line 235, in __new__
pool.release(pool.acquire())
File "/Users/aglen/translator/RTX/rtx-env-7/lib/python3.7/site-packages/neobolt/direct.py", line 715, in acquire
return self.acquire_direct(self.address)
File "/Users/aglen/translator/RTX/rtx-env-7/lib/python3.7/site-packages/neobolt/direct.py", line 608, in acquire_direct
connection = self.connector(address, error_handler=self.connection_error_handler)
File "/Users/aglen/translator/RTX/rtx-env-7/lib/python3.7/site-packages/neo4j/__init__.py", line 232, in connector
return connect(address, **dict(config, **kwargs))
File "/Users/aglen/translator/RTX/rtx-env-7/lib/python3.7/site-packages/neobolt/direct.py", line 972, in connect
raise last_error
File "/Users/aglen/translator/RTX/rtx-env-7/lib/python3.7/site-packages/neobolt/direct.py", line 963, in connect
s, der_encoded_server_certificate = _secure(s, host, security_plan.ssl_context, **config)
File "/Users/aglen/translator/RTX/rtx-env-7/lib/python3.7/site-packages/neobolt/direct.py", line 854, in _secure
s = ssl_context.wrap_socket(s, server_hostname=host if HAS_SNI and host else None)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 423, in wrap_socket
session=session
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 870, in _create
self.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1139, in do_handshake
self._sslobj.do_handshake()
OSError: [Errno 0] Error
Here's my environment's neo4j-driver info:
(rtx-env-7) amys-macbook:ARAXQuery aglen$ pip show neo4j-driver
Name: neo4j-driver
Version: 1.7.6
Summary: Neo4j Bolt driver for Python
Home-page: https://github.com/neo4j/neo4j-python-driver
Author: Neo Technology
Author-email: drivers@neo4j.com
License: Apache License, Version 2.0
Location: /Users/aglen/translator/RTX/rtx-env-7/lib/python3.7/site-packages
Requires: neobolt, neotime
Required-by:
And neobolt:
(rtx-env-7) amys-macbook:ARAXQuery aglen$ pip show neobolt
Name: neobolt
Version: 1.7.16
Summary: Neo4j Bolt connector for Python
Home-page: https://github.com/neo4j-drivers/neobolt
Author: Neo4j Sweden AB
Author-email: drivers@neo4j.com
License: Apache License, Version 2.0
Location: /Users/aglen/translator/RTX/rtx-env-7/lib/python3.7/site-packages
Requires:
Required-by: neo4j-driver
It seems that everyone is seeing this on multiple client platforms. I'm thinking it's server. Is there a possibility that there is a newer version of the neo4j server available, perhaps with some bugfixes?
@edeutsch looks like some sort of encryption issue as noted in these threads/issues observing the same thing: https://github.com/neo4j/neo4j/issues/12392 https://community.neo4j.com/t/connect-to-neo4j-hosted-on-a-remote-ec2-instance-via-python-running-on-my-current-ec2-instance/13774/2 https://community.neo4j.com/t/connecting-to-local-db-via-python-bolt-episode-ii/16969
So looks like the fix is driver = GraphDatabase.driver(... , encrypted=False)
Or set up SSL properly with https://github.com/neo4j/neo4j/issues/12392#issuecomment-589730390
Seems like most of these complaints are about not being able to connect at all? Rather than just intermittent failures? But, we can certainly try it. I assume SSL is set up properly on neo4j, otherwise we wouldn't be able to connect at all?
So the two options seem to be: 1) See if we are running the very latest neo4j code and upgrade if not 2) turn off encrypted connections as suggested above
Which approach shall we try first?
@edeutsch Option 1 seems best (if there's a way to roll back if it breaks things), then 2 if that doesn't work
I agree. Can we ask @finnagin or @saramsey to try option 1?
Next steps to try from the meeting today:
Add encrypted=False to line 61 in ReasoningUtilities.
OK @finnagin please try:
@amykglen is something like this what you are using to test the error?
from neo4j.v1 import GraphDatabase, basic_auth
from RTXConfiguration import RTXConfiguration
rtxConfig = RTXConfiguration()
for i in range(40):
    driver = GraphDatabase.driver(rtxConfig.neo4j_bolt, auth=basic_auth(rtxConfig.neo4j_username, rtxConfig.neo4j_password))
yes, close, although I avoided using RTXConfiguration (to totally eliminate all RTX/ARAX code) and just (locally) directly plugged in the auth info:
from neo4j import GraphDatabase
# BOLT_URL, USER_NAME, PASSWORD and cypher_query are filled in locally
for num in range(80):
    driver = GraphDatabase.driver(BOLT_URL, auth=(USER_NAME, PASSWORD))
    with driver.session() as session:
        results = session.run(cypher_query).data()
    driver.close()
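The loop above stops at the first exception, so it shows whether a failure occurs but not how often. A sketch of a variant that tallies failures over a fixed number of attempts could look like the following; `count_failures` and `connect` are hypothetical names, and `connect` stands in for whatever callable opens and closes one connection.

```python
def count_failures(connect, attempts=80):
    """Run connect() repeatedly and collect (attempt index, error) pairs."""
    failures = []
    for attempt in range(attempts):
        try:
            connect()
        except OSError as error:
            # Record the failure and keep going instead of aborting the loop.
            failures.append((attempt, repr(error)))
    return failures
```

Comparing `len(failures)` before and after a change (driver upgrade, `encrypted=False`, server config) gives a rough failure rate rather than a single pass/fail observation.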
Unfortunately, adding encrypted=False does not seem to work.
Here are the results from the diff between arax and kg1endpoint. Things that jump out at me:
> #dbms.active_database=graph.db
12c13,14
< dbms.directories.data=/var/lib/neo4j/data
---
> #dbms.directories.data=/var/lib/neo4j/data
> dbms.directories.data=/mnt/data/RTX1
35,36c37,38
< dbms.memory.heap.initial_size=512m
< dbms.memory.heap.max_size=12G
---
> #dbms.memory.heap.initial_size=512m
> #dbms.memory.heap.max_size=512m
124c126
< #dbms.ssl.policy.default.base_directory=certificates/default
---
> #dbms.ssl.policy.default.base_directory=/etc/letsencrypt/live/ncats.saramsey.org
147c149
< #dbms.ssl.policy.default.private_key=
---
> #dbms.ssl.policy.default.private_key=privkey.pem
153c155
< #dbms.ssl.policy.default.public_certificate=
---
> #dbms.ssl.policy.default.public_certificate=fullchain.pem
216a219,225
> # Query logging options
> dbms.logs.query.enabled=true
> dbms.logs.query.rotation.keep_number=1000
> dbms.logs.query.rotation.size=1G
> dbms.logs.query.threshold=0
> dbms.logs.query.time_logging_enabled=true
>
319c328
< dbms.security.procedures.unrestricted=apoc.*,algo.*
---
> dbms.security.procedures.unrestricted=apoc.*
at java.lang.Thread.run(Thread.java:748)
2020-05-19 20:18:22.695+0000 ERROR [io.netty.util.concurrent.DefaultPromise.rejectedExecution] Failed to submit a listener notification task. Event loop shut down? event executor terminated
java.util.concurrent.RejectedExecutionException: event executor terminated
from a quick google search I found this: https://github.com/netty/netty/issues/7289
And it is mentioned that this error has to do with memory management so this makes me think that changing that config option to allow for a larger max heap size might work.
I think the next step would be to try changing this config option and then restarting neo4j to see if this would work though I would like to wait for a low use time to try restarting. Maybe late at night?
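In the diff above, one side has the `dbms.memory.heap.*` lines active and the other has them commented out, which is easy to miss by eye. As a sketch, a few lines of Python can report which heap settings are actually in effect in a given neo4j.conf. The function name `active_settings` is hypothetical, and the sketch assumes the plain `key=value` format with `#` comments seen in the diff.

```python
def active_settings(conf_text, prefix="dbms.memory.heap."):
    """Return uncommented key=value settings whose key starts with prefix."""
    settings = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith(prefix):
            settings[key] = value.strip()
    return settings
```

An empty result means no heap setting is active in that file, so the server is not using an explicitly configured heap size.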
OK sure, but please save a copy of the previous config file so you can revert.
Just updated neo4j.conf (after saving a backup) and restarted and this seems to have fixed it. I just tried a loop of 80 and then another loop of 300 and no errors on my end.
Awesome! @finnagin can you please
Solid teamwork on this issue, folks.
Now that I've uploaded the conf file I'm going to go ahead and close this issue.
Quite often (but not consistently every time), I get the following error when importing ARAXQuery in the demo branch:
This is on a 2018 MBP running macOS 10.14.6 and Python 3.7.6. The output of pip freeze is shown below: