Closed tkilias closed 3 years ago
Can you set multiple directories in PYTHONPATH, like this: https://stackoverflow.com/questions/39682688/how-to-set-pythonpath-to-multiple-folders ?
Include standard directory AND custom directory at the same time.
In theory, it is possible to rewrite it from subprocess to multiprocessing (fork), but it would lead to loss of compatibility with Windows OS.
Windows is a bit lame for big data, but a lot of Exasol users still rely on it.
I am actually not sure what AWS Lambda does to the environment, so I first need to check this and maybe try to reproduce the error locally.
I have the feeling that subprocess is fine, but we might need to start it with the right environment and Python interpreter command-line options. I will come back as soon as I have further information.
subprocess.Popen() should inherit the environment variables from the parent process. Try dumping os.environ and see how it goes.
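A minimal sketch of that check, assuming nothing about pyexasol itself: it starts a fresh interpreter with subprocess, lets the child dump its os.environ as JSON, and compares it with the parent. The helper name child_environ and the variable DEMO_MARKER are hypothetical and exist only for this illustration.

```python
import json
import os
import subprocess
import sys


def child_environ():
    """Start a fresh interpreter and return its os.environ as a dict."""
    code = "import json, os; print(json.dumps(dict(os.environ)))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# DEMO_MARKER is a hypothetical variable, set only to make inheritance visible.
os.environ["DEMO_MARKER"] = "set-in-parent"

child = child_environ()
print("DEMO_MARKER in child:", child.get("DEMO_MARKER"))
print("PYTHONPATH in child: ", child.get("PYTHONPATH"))
```

If PYTHONPATH shows up as None in the child, the parent never exported it, and any extra search directories must have been added some other way.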
OK, yes, PYTHONPATH would work; I tested it locally. I didn't think it would be this, but better to be sure. However, there are a few other ways to modify the module search path. For example, modifying sys.path is not uncommon. This is the next thing I will try.
@wildraid OK, I also tried modifying sys.path before importing pyexasol, and this reproduces the error. I packaged my small test and attached it here: pyexasol_subprocess.tar.gz. Extract the tar, fix the DSN in the Python files, and run the following:
docker build -t pyexasol_test .
docker run --net=db_network_test pyexasol_test bash run_tests.sh
The first test uses PYTHONPATH and succeeds; the second uses sys.path and fails.
Tomorrow I am going to check what other mechanisms might modify the module search path and which one AWS Lambda uses.
Have a good evening.
Well, it is easy to explain such results.
Changing sys.path affects the current Python interpreter only. But PYTHONPATH is an environment variable, which is automatically inherited by the subprocess and applied to the newly started Python interpreter.
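The difference can be reproduced without a database. The sketch below is only an illustration under stated assumptions: demo_helper_mod is a hypothetical throwaway module written to a temporary directory, and child_can_import is a hypothetical helper that starts a new interpreter the same way any subprocess-based code would.

```python
import importlib.util
import os
import subprocess
import sys
import tempfile


def child_can_import(module_name):
    """Return True if a freshly started interpreter can import module_name."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
    )
    return result.returncode == 0


# Create a hypothetical module in a directory outside the default search path.
extra_dir = tempfile.mkdtemp()
with open(os.path.join(extra_dir, "demo_helper_mod.py"), "w") as f:
    f.write("VALUE = 1\n")

# Step 1: extend sys.path only. The parent can find the module, the child cannot.
sys.path.append(extra_dir)
parent_finds_module = importlib.util.find_spec("demo_helper_mod") is not None
child_via_sys_path = child_can_import("demo_helper_mod")

# Step 2: also export the directory through PYTHONPATH. Now the child inherits it.
existing = os.environ.get("PYTHONPATH")
os.environ["PYTHONPATH"] = (
    extra_dir if not existing else existing + os.pathsep + extra_dir
)
child_via_pythonpath = child_can_import("demo_helper_mod")

print("parent finds module:      ", parent_finds_module)
print("child, sys.path only:     ", child_via_sys_path)
print("child, PYTHONPATH exported:", child_via_pythonpath)
```

The child in step 1 should fail in the same way as the second test in the attached archive, because the sys.path change never leaves the parent process.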
@wildraid Yep, you are completely right, and I think AWS Lambda does that for some reason to allow the usage of additional Python packages. If I read the Python documentation correctly, sys.path is the only other way to manipulate the module search path besides PYTHONPATH. So a workaround could be to append sys.path to the PYTHONPATH of the current process before calling export_to_pandas, as in the following example.
import os
import sys

# Expose the parent's module search path to subprocesses via PYTHONPATH.
additional_python_path = os.pathsep.join(sys.path)
if "PYTHONPATH" not in os.environ:
    new_python_path = additional_python_path
else:
    current_python_path = os.environ["PYTHONPATH"]
    # os.pathsep is ":" on POSIX and ";" on Windows.
    new_python_path = current_python_path + os.pathsep + additional_python_path
os.environ["PYTHONPATH"] = new_python_path

import pyexasol

c = pyexasol.connect(dsn="172.18.0.2:8888", user="sys", password="exasol")
df = c.export_to_pandas("select * from test.comp1")
A cleaner solution could be to add sys.path to the PYTHONPATH variable only in the environment of the subprocess, using the env argument. What do you think?
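A rough sketch of what that could look like, assuming a plain subprocess.Popen call. This is not how pyexasol starts its subprocess; env_with_sys_path is a hypothetical helper, and the child command here only prints its own sys.path.

```python
import json
import os
import subprocess
import sys


def env_with_sys_path(base_env=None):
    """Return a copy of the environment whose PYTHONPATH also lists sys.path."""
    env = dict(os.environ if base_env is None else base_env)
    # Skip '' entries, which stand for the current working directory.
    entries = [p for p in sys.path if p]
    existing = env.get("PYTHONPATH")
    if existing:
        entries = existing.split(os.pathsep) + entries
    env["PYTHONPATH"] = os.pathsep.join(entries)
    return env


pythonpath_before = os.environ.get("PYTHONPATH")
child_env = env_with_sys_path()

# Only the child sees the extended PYTHONPATH; os.environ stays untouched.
proc = subprocess.Popen(
    [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
    env=child_env,
    stdout=subprocess.PIPE,
    text=True,
)
stdout, _ = proc.communicate()
child_sys_path = json.loads(stdout)
print(len(child_sys_path), "entries in the child's sys.path")
```

Copying os.environ rather than building the mapping from scratch keeps variables the child may need on some platforms, and leaves the calling application's own environment unchanged.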
Can we just set a global PYTHONPATH for the specific use case involving AWS Lambda? And call it a day.
We definitely cannot touch sys.path or PYTHONPATH inside the library code, since it can mess up the higher level applications using pyexasol.
Maybe, not sure. At the moment, I can't test it with AWS Lambda. One problem I can think of is that the path to the module code might not be static in AWS Lambda. In that case, the Lambda code would need to change the PYTHONPATH variable, as in my example.
@tkilias , I suspect you've managed to resolve this issue using PYTHONPATH.
Is it the case? :)
@wildraid I think so. The AWS Lambda environment is quite specific, so a general fix in the code is probably not productive and might cause more problems than it solves. Adding sys.path to the PYTHONPATH environment variable within the same Python process is at least a workaround. For that reason, I am going to close the ticket. Thanks for your help.
Hi @wildraid,
We saw somewhat strange behavior of pyexasol in AWS Lambda. Everything except the http_transport works. We saw the following error message:
With the following stacktrace:
My guess is that starting the subprocess for the http_transport fails (see source reference below), because the pyexasol_utils module is not in the default module search path and the search path of the parent process was modified.
https://github.com/badoo/pyexasol/blob/3b5211fa78e4d83ea16e11532048f6cdcaeab43d/pyexasol/http_transport.py#L244
I would try next to get additional information about the environment with
Any thoughts?