exasol / pyexasol

Exasol Python driver with low overhead, fast HTTP transport and compression
MIT License

profiling a script with pyexasol with scalene #94

Closed by ghost 2 years ago

ghost commented 2 years ago

Hello,

I'm trying to profile a script that uses pyexasol with scalene. conn.execute() seems to work (I didn't let it run to completion, but there was debug output), but the whole script freezes when calling conn.export_to_pandas() or conn.export_to_file(); there isn't even any debug output about the query. Do you have any idea what could be causing this?

Best regards

littleK0i commented 2 years ago

HTTP transport runs 2 additional threads:

1) Thread to manage pseudo HTTP-server, waiting for "incoming" connection from Exasol.
2) Thread to run SQL query and check its status.
3) Main thread is busy processing data in callback function.

These are normal Python threads, no extra magic.
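The three-thread layout described above can be sketched with the standard library alone. This is a simplified illustration, not pyexasol's actual code: the names `server_thread`, `query_thread` and `run_demo` are hypothetical, a plain loopback socket stands in for the pseudo HTTP server, and the "query" thread simply plays the role of Exasol connecting back and sending data.

```python
import os
import socket
import threading

CHUNK = 4096  # arbitrary chunk size chosen for this sketch


def server_thread(server_sock, write_fd):
    # Stand-in for the pseudo HTTP server: wait for one incoming
    # connection and forward everything it sends into a pipe.
    conn, _ = server_sock.accept()
    with conn, os.fdopen(write_fd, "wb") as pipe_out:
        while True:
            chunk = conn.recv(CHUNK)
            if not chunk:
                break
            pipe_out.write(chunk)


def query_thread(port, payload):
    # Stand-in for the SQL thread: here it only simulates the database
    # connecting back to the server and streaming the result.
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(payload)


def run_demo(payload):
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    read_fd, write_fd = os.pipe()
    t_server = threading.Thread(target=server_thread, args=(server_sock, write_fd))
    t_query = threading.Thread(target=query_thread, args=(port, payload))
    t_server.start()
    t_query.start()

    # Main thread: the "callback" reads the pipe chunk by chunk.
    received = bytearray()
    with os.fdopen(read_fd, "rb") as pipe_in:
        while True:
            chunk = pipe_in.read(CHUNK)
            if not chunk:
                break
            received.extend(chunk)

    t_server.join()
    t_query.join()
    server_sock.close()
    return bytes(received)
```

Calling `run_demo(b"row\n" * 5000)` returns the same bytes that the simulated query thread sent. A profiler attached to such a script has to cope with two worker threads plus blocking socket and pipe reads in the main thread.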

ghost commented 2 years ago

That's strange... Did you profile pyexasol's memory usage in the past? If so, which tools did you use?

littleK0i commented 2 years ago

I've used the basic built-in profiler: https://docs.python.org/3/library/profile.html
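For reference, a minimal sketch of using the built-in profiler from code. `fake_export` and `profile_call` are hypothetical stand-ins so the example runs without a database; in a real script the profiled callable would be whatever function wraps the export call. Note that `cProfile` reports call counts and timings, not memory.

```python
import cProfile
import io
import pstats


def fake_export(n_rows):
    # Hypothetical workload standing in for an export call.
    return sum(len(str(i)) for i in range(n_rows))


def profile_call(func, *args):
    # Run func under cProfile and return its result together with
    # a text report sorted by cumulative time.
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()
```

Usage would look like `result, report = profile_call(fake_export, 1000)` followed by `print(report)`. A whole script can also be profiled from the command line with `python -m cProfile -s cumulative script.py`.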

Pyexasol does not accumulate data at any point. Everything is processed in chunks.

Data might be accumulated in the output of the callback function, for example when building a pandas DataFrame. But there is no accumulation in the core, so memory should not be an issue.
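The difference between chunked processing and accumulation in a callback can be illustrated roughly as follows. Both functions are hypothetical and take any binary file-like object; they are not pyexasol callbacks, only a sketch of the two memory patterns.

```python
import io


def count_rows_streaming(pipe, chunk_size=65536):
    # Constant memory: each chunk is inspected and then discarded.
    total = 0
    while True:
        chunk = pipe.read(chunk_size)
        if not chunk:
            break
        total += chunk.count(b"\n")
    return total


def collect_rows(pipe):
    # Accumulating: every row stays in the returned list, so memory
    # grows with the size of the result set.
    return [line.rstrip(b"\n").split(b",") for line in pipe]
```

With `data = b"1,a\n2,b\n3,c\n"`, the first function only ever holds one chunk of `io.BytesIO(data)` at a time, while the second keeps all three rows in memory, which is the kind of accumulation that happens when a result is turned into a data frame.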

ghost commented 2 years ago

Is writing to file output also done in chunks? Or is it more like the pandas output?

littleK0i commented 2 years ago

Of course. Writing to file uses shutil.copyfileobj, which operates in chunks. https://docs.python.org/3/library/shutil.html#shutil.copyfileobj

This is the place in code: https://github.com/exasol/pyexasol/blob/master/pyexasol/callback.py#L52
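A minimal sketch of how shutil.copyfileobj copies a stream to a file in fixed-size chunks. The helper names `write_stream_to_file` and `demo`, the sample data and the chunk size are assumptions for illustration only; the linked callback code may differ in its details.

```python
import io
import os
import shutil
import tempfile


def write_stream_to_file(pipe, dst_path, chunk_size=64 * 1024):
    # copyfileobj reads at most chunk_size bytes at a time, so the
    # whole stream is never held in memory at once.
    with open(dst_path, "wb") as dst:
        shutil.copyfileobj(pipe, dst, chunk_size)


def demo():
    # An in-memory stream stands in for the data pipe.
    source = io.BytesIO(b"id,name\n1,alice\n2,bob\n")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "export.csv")
        write_stream_to_file(source, path, chunk_size=8)
        with open(path, "rb") as written:
            return written.read()
```

`demo()` returns the same bytes that were in the source stream, copied eight bytes at a time.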

ghost commented 2 years ago

Thanks. In that case, while I can't profile with scalene, at least I can be pretty sure pyexasol isn't the cause. I'll close this ticket.