Open divincode opened 1 year ago
@divincode Thanks a lot for the detailed report and suggested solution. I've added the "Good first issue" label but will also assign it to one of our core devs according to our new GitHub issue process https://gethue.com/blog/hue-community-2023/
We also hit this error. It leaks py4j JDBC connections, and Hue's memory grows quickly.
Which engine are you using?
We are also using the Trino 433 JDBC driver with Hue. You mentioned that we can use this code to fix the problem:
# lines to be added
try:
    self.gateway.shutdown()
except Exception as e:
    LOG.error(e)
Thanks a lot, we will try to modify the source code like this~
I don't think this will work completely, though.
Do try it, but I think we should use https://github.com/trinodb/trino-python-client instead.
Change this code to use it:
def query_and_fetch(db, statement, n=None):
    data = None
    try:
        db.connect()
        curs = db.cursor()
        try:
            if curs.execute(statement):
                data = curs.fetchmany(n)
            meta = curs.description
            return data, meta
        finally:
            curs.close()
    except Exception as e:
        message = force_unicode(smart_str(e))
        if 'Access denied' in message:
            raise AuthenticationRequired()
        raise
    finally:
        db.close()

from trino.dbapi import connect

conn = connect(
    host="<host>",
    port=<port>,
    user="<username>",
    catalog="<catalog>",
    schema="<schema>",
)
cur = conn.cursor()
cur.execute("SELECT * FROM system.runtime.nodes")
rows = cur.fetchall()
Use this; it should remove py4j completely.
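A minimal sketch of how `query_and_fetch` might look against a plain DB-API 2.0 connection, such as the one `trino.dbapi.connect` returns. This is not Hue's actual code: the `connect` factory argument and the `fetchall` fallback for `n=None` are assumptions made for illustration, and `sqlite3` stands in for Trino only so the sketch runs without a cluster.

```python
import sqlite3  # stand-in DB-API driver for the usage example only
from contextlib import closing


def query_and_fetch(connect, statement, n=None):
    """Run one statement and return (rows, description).

    `connect` is a hypothetical zero-argument factory that returns a
    DB-API 2.0 connection, e.g.
    lambda: trino.dbapi.connect(host=..., port=..., user=...).
    """
    # closing() calls close() on the cursor and then on the connection,
    # even if execute() or the fetch raises.
    with closing(connect()) as conn, closing(conn.cursor()) as cur:
        cur.execute(statement)
        # Assumption: n=None means "fetch every row".
        rows = cur.fetchall() if n is None else cur.fetchmany(n)
        return rows, cur.description


if __name__ == "__main__":
    rows, meta = query_and_fetch(
        lambda: sqlite3.connect(":memory:"), "SELECT 1 AS x, 'a' AS y"
    )
    print(rows, [column[0] for column in meta])
```

Because no py4j gateway or JVM is involved, the only resources to release are the cursor and the connection, and both are closed on every code path.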
@divincode we hit this problem: File "/usr/share/hue/build/env/lib/python3.8/site-packages/py4j/java_gateway.py", line 980, in _get_connection: raise Py4JNetworkError("Gateway is not connected."). I will try the Trino Python client again.
If it works, can you please contribute it back to the community? @hackeryang @zhuzhengjun01
I tried calling the shutdown interface in the close function, but it doesn't work well: the py4j gateway did not shut down correctly.
No, I mean the Trino Python client would solve the use case. Please try that.
Oh, understood. We extended Hue's jdbc_trino but still use the Trino Java JDBC client, so switching to the Python client will be a big change. I will try again.
The main issue is that the JDBC connector does not support SET SESSION, and some of our Trino syntax depends on it.
I have tried using the Python client to connect to Trino. RSS memory is fine and there are no more memory leaks. I will contribute this connector to the community if needed.
Nice to hear! Do contribute it back, or at least paste the diff, so that others can use it.
I added a new PR: https://github.com/cloudera/hue/pull/3831
Description
To my knowledge, the default connector for Trino is JDBC, because the SQLAlchemy community offers no Trino support. When Hue runs with multiple clients, the JDBC class calls the py4j server, which keeps running and is not closed after the results are fetched. Memory usage grows, queries get slower as time passes, and after a certain time they hang. For example:
To execute a Trino query, the code flow goes like this:
I added several debugging points to find the bottleneck.
The bottleneck is usually this line:
data = curs.fetchmany(n)
The issue should be solved by adding the following lines to the close function of the Jdbc class in desktop/libs/librdbms/src/librdbms/jdbc.py:
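The lines in question are the `self.gateway.shutdown()` call wrapped in `try`/`except`, as quoted in the comments. Below is a minimal sketch of how they might sit inside `close`. The trimmed-down `Jdbc` class and its constructor arguments are stand-ins for illustration, not Hue's actual class; the sketch assumes `self.gateway` is a py4j `JavaGateway` and `self.conn` is the JDBC connection.

```python
import logging

LOG = logging.getLogger(__name__)


class Jdbc:
    """Trimmed-down stand-in for the Jdbc class; only close() is sketched."""

    def __init__(self, gateway, conn=None):
        # Assumption: gateway is a py4j JavaGateway, conn a JDBC connection.
        self.gateway = gateway
        self.conn = conn

    def close(self):
        try:
            if self.conn is not None:
                self.conn.close()
                self.conn = None
        finally:
            # Lines to be added: also stop the py4j gateway, and log
            # rather than raise if the shutdown itself fails.
            try:
                self.gateway.shutdown()
            except Exception as e:
                LOG.error(e)
```

Putting the shutdown in a `finally` block is a design choice of this sketch, so that the gateway is still stopped when closing the connection raises. One commenter in this thread reports that the gateway did not shut down correctly with this approach, so treat it as a mitigation to test rather than a confirmed fix.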
After adding these lines, the py4j child process gets killed. I verified this with the ps aux and pstree commands.
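The same check can be scripted. The sketch below lists the direct children of a given process by parsing `ps -A -o pid=,ppid=,comm=` output; the function names are hypothetical, and it assumes the py4j JVM appears as a direct child of the Hue worker process whose PID you pass in.

```python
import subprocess


def child_processes(ps_output, parent_pid):
    """Return (pid, command) pairs whose parent is parent_pid.

    ps_output is the text printed by `ps -A -o pid=,ppid=,comm=`.
    """
    children = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1] == str(parent_pid):
            children.append((int(parts[0]), parts[2].strip()))
    return children


def list_children(parent_pid):
    """Run ps and return the direct children of parent_pid."""
    out = subprocess.run(
        ["ps", "-A", "-o", "pid=,ppid=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return child_processes(out, parent_pid)
```

Calling `list_children(hue_pid)` before and after a batch of queries should show whether `java` children accumulate or go away once `close` has run.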
This could be a good first issue for someone to raise a PR. I work on Hadoop these days, so I don't have the bandwidth to raise one myself; I'm just trying to contribute to OSS. The same flow applies to Presto.
@Harshg999 @bjornalm
Regards Vinay Devadiga
Steps To Reproduce
As stated in the description: use Trino with Hue, create multiple Hue clients, and fire large Trino queries. After some time the py4j servers consume the memory and the queries hang.
Logs
Attached above.
Hue version
Open Source 4.10