getsentry / airflow-metrics

Metrics for airflow
Apache License 2.0
14 stars 6 forks source link

google.api_core error #45

Open will-misslin opened 5 years ago

will-misslin commented 5 years ago

Upon installing airflow-metrics via poetry, we began receiving this error in Sentry.

ModuleNotFoundError
No module named 'google.api_core'

We believe the cause to be that the airflow_metrics_bq_enabled option is enabled by default.

Here is the traceback:

ModuleNotFoundError: No module named 'google.api_core'
  File "airflow_metrics/utils/fn_utils.py", line 52, in wrapped
    return func(*args, **kwargs)
  File "airflow_metrics/airflow_metrics/__init__.py", line 47, in monkey_patch_bq
    from airflow_metrics.airflow_metrics.patch_bq import patch_bq
  File "airflow_metrics/airflow_metrics/patch_bq.py", line 1, in <module>
    from airflow.contrib.operators.bigquery_operator import BigQueryOperator
  File "airflow/contrib/operators/bigquery_operator.py", line 26, in <module>
    from airflow.contrib.hooks.bigquery_hook import BigQueryHook
  File "airflow/contrib/hooks/bigquery_hook.py", line 34, in <module>
    from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
  File "airflow/contrib/hooks/gcp_api_base_hook.py", line 30, in <module>
    from google.api_core.exceptions import GoogleAPICallError, AlreadyExists, RetryError

Our docker-compose airflow instance intentionally does not use the airflow.cfg config file in our code base, relying instead on the default provided by airflow. Ideally, we would like to set airflow-metric configuration options through environment variables.

will-misslin commented 5 years ago

Looks like Airflow provides this functionality already. We are experiencing issue #44 and had to revert as our scheduler was crashing so I cannot test the env vars, but if you can confirm that this is expected behavior when gcp is not installed and airflow-metrics is installed, that would help me confirm what I am seeing.

Zylphrex commented 5 years ago

airflow-metrics does try to enable all the metrics by default, and currently only gives the option to disable the metrics through the airflow.cfg. One of the default metrics requires the use of gcp, so I believe this to be the reason your scheduler is crashing.

Seems like the solution here should be to disable the metrics that require additional dependencies by default, and allow them to be toggled the same way as other airflow configurations.

maxcountryman commented 5 years ago

airflow-metrics does try to enable all the metrics by default, and currently only gives the option to disable the metrics through the airflow.cfg.

This doesn't seem great. Especially the fact that everything is enabled by default. In my experience with Airflow, the preference these days is to use environment variables (this is the order of precedence when Airflow resolves configuration as well).

Are there plans to support configuration a la Airflow via environment variables?

As an aside, I don't suspect this is related to the scheduler crashing: the scheduler runs for hours before dying inexplicably.