astronomer / telescope

Other
33 stars 4 forks source link

Support for Airflow deployments when not using system Python #183

Closed cdabella closed 9 months ago

cdabella commented 1 year ago

The current AIRFLOW_REPORT_CMD uses python to run the report package.

https://github.com/astronomer/telescope/blob/b837a2c7730bfabbf03aa51665db8202b4a6b1b3/astronomer_telescope/getter_util.py#L31-L37

For many Airflow installations, especially single-host Airflow deployments, it is more common for Airflow to be installed within a virtualenv of some flavor. Using logic similar to the following, we can make this support targeting Airflow under whatever python binary it might be running without manual user intervention.

from subprocess import Popen, PIPE

python_bin = [
    p.decode()
    for p in Popen(
        ["ps", "-eo", "command"], shell=False, stdout=PIPE
    ).stdout.readlines()
    if b"/bin/airflow scheduler" in p
][0].split()[0]
print(
    "".join(
        [
            p.decode()
            for p in Popen(
                [
                    python_bin,
                    "-W",
                    "ignore",
                    "-c",
                    "import runpy,os;from urllib.request import urlretrieve as u;a='airflow_report.pyz';u('https://github.com/astronomer/telescope/releases/download/v3.1.0/airflow_report.pyz', a);runpy.run_path(a);os.remove(a)",
                ],
                shell=False,
                stdout=PIPE,
            ).stdout.readlines()
        ]
    )
)

If we don't want to add/test baking the functionality into the package itself, we can atleast at the following to the documentation, which takes the above, minifies and escapes the code, and sets it as the TELESCOPE_AIRFLOW_REPORT_CMD envvar.

TELESCOPE_AIRFLOW_REPORT_CMD="python -W ignore -c \"D=False;from subprocess import Popen as A,PIPE as B;C=[C.decode()for C in A(['ps','-eo','command'],shell=D,stdout=B).stdout.readlines()if b'/bin/airflow scheduler'in C][0].split()[0];print(''.join([E.decode()for E in A([C,'-W','ignore','-c',\\\"import runpy,os;from urllib.request import urlretrieve as u;a='airflow_report.pyz';u('https://github.com/astronomer/telescope/releases/download/v3.1.0/airflow_report.pyz', a);runpy.run_path(a);os.remove(a)\\\"],shell=D,stdout=B).stdout.readlines()]))\"" telescope --local

NOTE

There is probably a cleaner way of determining the python binary path, something like the following, but I was unable to get the interplay of shlex.split and shlex.join to run the subshell, probably because of the the security features of shlex.quote https://docs.python.org/3/library/shlex.html#shlex.quote. I tried using shlex.shlex with punctuation_chars=True and punctuation_chars="();<>|&$" with no success in getting the appropriate tokenization.

CMD = (
    "$(ps -eo command | grep '/bin/airflow scheduler' | grep -v grep | awk '{print $1}') -W ignore -c \""
    "import runpy,os;from urllib.request import urlretrieve as u;"
    f"a='{REPORT_PACKAGE}';"
    f"u('{REPORT_PACKAGE_URL}', a);"
    f'runpy.run_path(a);os.remove(a)"'
)
fritz-astronomer commented 1 year ago

Good stuff - I guess my questions are if ps, grep, or /bin/airflow would ever be reasonably expected to not be there?

We could feature flag this like the airgap mode, as a fallback if python doesn't work out of the box?

Wanna make a PR?

fritz-astronomer commented 9 months ago

There's now docs/ if you want to add a snippet for that AIRFLOW_REPORT_CMD