Closed k-lyda closed 2 years ago
Thanks for opening your first issue here! Be sure to follow the issue template!
I have the same issue with remote logging. Apache Airflow version: 2.0.0
Environment:
AIRFLOW__CORE__REMOTE_LOGGING: "True"
AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER: "s3://bucket/airflow/logs"
AIRFLOW__CORE__REMOTE_LOG_CONN_ID: "custom_s3_id"
airflow.cfg
[logging]
remote_logging = True
remote_log_conn_id = custom_s3_id
remote_base_log_folder = s3://bucket/airflow/logs
What happened:
With the example pipeline definition, remote logging works fine, but after adding import mlflow
to the DAG code, Airflow doesn't send logs to S3 storage.
What you expected to happen:
Remote logging keeps working after import mlflow is added.
How to reproduce it:
Add import mlflow
to the DAG code.
Anything else we need to know: Tested on 2 PCs with different OS and hardware.
This sounds like mlflow is doing something to the python loggers on import that it shouldn't be doing, and is a bug in that library.
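A rough way to confirm that (just an illustrative sketch, not anything from Airflow or mlflow internals) is to look at the airflow.task logger right before and right after the import, for example in the DAG file while the problem is reproducible:

import logging

task_logger = logging.getLogger("airflow.task")
# State of Airflow's task logger before the import.
print("before:", task_logger.handlers, "disabled:", task_logger.disabled)

import mlflow  # suspected to touch the logging configuration as a side effect

# If the handlers vanish or the logger is now disabled, the import is the culprit.
print("after:", task_logger.handlers, "disabled:", task_logger.disabled)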
There also might be another problem:
Do you also use/import the snowflake provider? We have an ongoing problem with #12881 and that might be the root cause of the problem.
If that's the case, for now you might remove the provider (and snowflake-connector-python). We are working with the snowflake team (they just merged https://github.com/snowflakedb/snowflake-connector-python/pull/591 and https://github.com/snowflakedb/snowflake-connector-python/pull/592) and as soon as they release a new version of the connector, this problem should be gone.
If you have it, can you remove the snowflake import/library and let us know if it fixes the problem?
But if you do not import snowflake anywhere:
Solving this 'snowflake' problem will also unblock upgrading to newer versions of the requests and urllib3 libraries, which might be another reason why mlflow does not work.
But you should be able to manually upgrade the requests and urllib3 libraries to their latest versions (even if pip reports a 'requests/urllib3 conflict'). In the upcoming 1.10.15 bugfix release this limitation will be gone, regardless of whether snowflake manages to release a new connector or not, but for now you would have to upgrade those manually.
Could you try it and let us know?
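If it helps to check what is actually installed, here is a small sketch (assumes Python 3.8+ for importlib.metadata; the names are just the PyPI distribution names):

from importlib.metadata import PackageNotFoundError, version

# Print the installed versions of the libraries discussed above.
for pkg in ("snowflake-connector-python", "requests", "urllib3", "mlflow", "apache-airflow"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")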
I can confirm it doesn't work if you add from mlflow.tracking import MlflowClient
to any DAG.
mlflow version: 1.19.0
airflow: 1.10.15
Just ran into this issue as our team is starting to use MLFlow:
airflow: 2.1.3 (kubernetes executor)
mlflow: 1.20.0
All of our DAGs are able to send logs up to S3, but any DAGs that import MLFlow silently fail to upload their logs. Tasks in those DAGs behave normally and can even sync other data to S3 just fine, but the logging code does not appear to be running.
It feels like the MLFlow code is overriding the task log handler that we use to write the logs to S3. MLFlow's init file does load a logging config (init file + logging config). Could that be related? I'll be filing an issue with MLFlow's project.
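For anyone following along, this is the kind of side effect that would explain it: a library calling logging.config.dictConfig at import time without disable_existing_loggers=False silently disables every logger configured before the import, including airflow.task. A minimal sketch of that failure mode (a generic illustration, not MLFlow's actual code):

import logging.config

# Pretend this logger was already configured by Airflow at startup.
task_logger = logging.getLogger("airflow.task")

# A library doing this at import time disables all previously configured loggers
# unless it explicitly sets "disable_existing_loggers": False in the config dict.
logging.config.dictConfig({
    "version": 1,
    "formatters": {"plain": {"format": "%(message)s"}},
    "handlers": {"console": {"class": "logging.StreamHandler", "formatter": "plain"}},
    "loggers": {"some_library": {"handlers": ["console"], "level": "INFO"}},
})

# True -- any handlers previously attached to this logger (e.g. Airflow's S3
# task handler) stop receiving records.
print(task_logger.disabled)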
I opened an issue with the mlflow project, as it is likely an issue with their logging config. I did, however, find a workaround, which is to import mlflow inside a function so that the import doesn't happen until the task's run time.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def train_model():
    # Lazy import: mlflow is only loaded at task run time, not when the DAG file is parsed.
    import mlflow
    # ...

with DAG("my_dag", start_date=datetime(2021, 1, 1), schedule_interval="@daily") as dag:
    PythonOperator(
        task_id="training_model",
        python_callable=train_model,
    )
Apache Airflow version: 1.10.12
Kubernetes version (if you are using kubernetes) (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.12", GitCommit:"a8b52209ee172232b6db7a6e0ce2adc77458829f", GitTreeState:"clean", BuildDate:"2019-10-15T12:04:30Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}
Environment (uname -a):
What happened: I want to save logs on S3 storage. I've added the proper configuration. Logging works fine, unless I add an import of the MLFlow library in any of the files.
I don't even have to use this tool, just
from mlflow.tracking import MlflowClient
is enough to break the logging to S3.
What you expected to happen: There is probably some mismatch in the S3 credentials, but I don't see any specific error message in the logs.
How to reproduce it:
Add from mlflow.tracking import MlflowClient
to a DAG file - logging to S3 stops working.
Anything else we need to know: This problem occurs every time MLFlow is imported in any file processed by Airflow.
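For anyone trying to reproduce it, the broken pattern is simply a module-level mlflow import in the DAG file, in contrast with the task-time import workaround shown above. A minimal sketch (the DAG id, dates and the bash task are placeholders):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from mlflow.tracking import MlflowClient  # module-level import: this alone breaks S3 remote logging

with DAG("mlflow_repro", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    # The task itself does nothing with mlflow; the import at the top is enough to trigger the problem.
    BashOperator(task_id="noop", bash_command="echo hello")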