apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Set `http_client.HTTPConnection.debuglevel` back to `0` before emitting `OpenLineage` events #39795

Closed: wslulciuc closed this issue 3 months ago

wslulciuc commented 3 months ago

Apache Airflow Provider(s)

openlineage

Versions of Apache Airflow Providers

No response

Apache Airflow version

N/A

Operating System

N/A

Deployment

Astronomer

Deployment details

No response

What happened

No response

What you think should happen instead

No response

How to reproduce

When a DAG author enables DEBUG logging for HTTPConnection with:

http_client.HTTPConnection.debuglevel = 1

the OpenLineage provider's own HTTP requests are logged as well, including headers and body. We don't want these requests logged!

Here is an example of a reproduction:

import requests
import http.client as http_client

http_client.HTTPConnection.debuglevel = 1
resp = requests.get("http://localhost:8080/login")
send: b'GET /login/ HTTP/1.1\r\nHost: localhost:8080\r\nUser-Agent: python-requests/2.31.0\r\nAccept-Encoding: gzip, deflate, zstd\r\nAccept: */*\r\nConnection: keep-alive\r\nCookie: session=e9194549-d5d1-440e-883c-c0f4608f0def.fDuMSL2im-gbcnJP03WRbJc0KqI\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: Werkzeug/2.2.3 Python/3.11.9
header: Date: Thu, 23 May 2024 17:20:48 GMT
header: Content-Type: text/html; charset=utf-8
header: Content-Length: 18079
header: Cache-Control: no-store
header: X-Robots-Tag: noindex, nofollow
header: Set-Cookie: session=e9194549-d5d1-440e-883c-c0f4608f0def.fDuMSL2im-gbcnJP03WRbJc0KqI; Expires=Sat, 22 Jun 2024 17:20:48 GMT; HttpOnly; Path=/; SameSite=Lax
header: Connection: close

Expected behavior:

No debug logging of HTTP requests when the provider emits OpenLineage events, even if the DAG author has turned it on for their task.
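One way the expected behavior could be approached is a small context manager that forces `http_client.HTTPConnection.debuglevel` to `0` while an event is being sent and restores the author's setting afterwards. This is only a sketch, not the provider's or the OpenLineage client's actual fix; the name `http_debug_suppressed` is hypothetical, and the demo at the bottom stands in for the real emit call.

```python
import contextlib
import http.client as http_client


@contextlib.contextmanager
def http_debug_suppressed():
    """Temporarily set HTTPConnection.debuglevel to 0, then restore it.

    Hypothetical helper, not part of Airflow or OpenLineage.
    """
    previous = http_client.HTTPConnection.debuglevel
    http_client.HTTPConnection.debuglevel = 0
    try:
        yield
    finally:
        # Restore whatever the DAG author had configured, even on error.
        http_client.HTTPConnection.debuglevel = previous


# Demo: the DAG author has turned debug output on for their task.
http_client.HTTPConnection.debuglevel = 1

with http_debug_suppressed():
    # The event upload would happen here (assumption: it goes through
    # http.client, as requests/urllib3 do).
    level_during_emit = http_client.HTTPConnection.debuglevel

level_after_emit = http_client.HTTPConnection.debuglevel
```

Note that `debuglevel` is a class attribute shared by the whole process, so this approach is not safe if other threads make HTTP calls at the same time, and it does not affect connections on which `set_debuglevel()` was already called directly.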

Anything else

No response

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 3 months ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.

Taragolis commented 3 months ago

Are you sure that it is related to the Apache Airflow OpenLineage provider and not https://github.com/OpenLineage/OpenLineage ?

cc: @mobuchowski @kacpermuda

mobuchowski commented 3 months ago

Yeah, it's related to our HttpTransport https://github.com/OpenLineage/OpenLineage/blob/main/client/python/openlineage/client/transport/http.py