cal-itp / reports

GTFS data quality reports for California transit providers
https://reports.calitp.org
GNU Affero General Public License v3.0

Fix plotnine error during report generation #206

Closed acouch closed 1 year ago

acouch commented 1 year ago

Description

When running the report locally, I get the following error:


        import matplotlib._contour as _contour
    ModuleNotFoundError: No module named 'matplotlib._contour'

The resolution is described in this Stack Overflow Q&A.

The issue was fixed in plotnine v0.10.0. However, upgrading to that version also requires upgrading several other packages because of pandas and google dependencies:

Update plotnine diff

```diff
diff --git a/requirements.txt b/requirements.txt
index e509b845a..8c77b4ea6 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,11 +1,15 @@
 aiohttp==3.7.4.post0
 ansiwrap==0.8.4
+anyio==3.6.2
 appdirs==1.4.4
 appnope==0.1.2
 argon2-cffi==21.1.0
 async-timeout==3.0.1
 attrs==21.2.0
+Babel==2.11.0
 backcall==0.2.0
+backports.zoneinfo==0.2.1
+beautifulsoup4==4.11.1
 bleach==4.1.0
 cachetools==4.2.2
 calitp==0.0.15
@@ -14,26 +18,31 @@ cffi==1.14.5
 cfgv==3.3.0
 chardet==4.0.0
 click==8.0.1
+contourpy==1.0.6
+cycler==0.11.0
 debugpy==1.4.3
 decorator==5.0.9
 defusedxml==0.7.1
 distlib==0.3.2
 entrypoints==0.3
+fastjsonschema==2.16.2
 filelock==3.0.12
+fonttools==4.38.0
 fsspec==2021.6.0
 future==0.18.2
 gcsfs==0.8.0
-google-api-core==2.7.1
-google-auth==1.31.0
-google-auth-oauthlib==0.4.4
+google-api-core==2.8.0
+google-auth==2.15.0
+google-auth-oauthlib==0.8.0
 google-cloud-bigquery==2.34.3
-google-cloud-bigquery-storage==2.7.0
-google-cloud-core==2.2.3
+google-cloud-bigquery-storage==2.13.1
+google-cloud-core==2.3.2
 google-crc32c==1.1.2
 google-resumable-media==1.3.0
 googleapis-common-protos==1.53.0
 grpcio==1.44.0
 grpcio-status==1.44.0
+gtfs-realtime-bindings==0.0.7
 identify==2.2.10
 idna==2.10
 iniconfig==1.1.1
@@ -44,39 +53,51 @@ ipython-genutils==0.2.0
 ipywidgets==7.6.4
 jedi==0.18.0
 Jinja2==3.0.1
+json5==0.9.11
 jsonschema==3.2.0
 jupyter==1.0.0
-jupyterlab==3.4.4
 jupyter-client==7.0.2
 jupyter-console==6.4.0
 jupyter-core==4.7.1
+jupyter-server==1.23.4
+jupyterlab==3.4.4
 jupyterlab-pygments==0.1.2
+jupyterlab-server==2.10.3
 jupyterlab-widgets==1.0.1
+kiwisolver==1.4.4
 libcst==0.3.20
+lxml==4.9.2
 MarkupSafe==2.0.1
+matplotlib==3.6.2
 matplotlib-inline==0.1.2
 mistune==0.8.4
+mizani==0.8.1
 multidict==5.1.0
 mypy-extensions==0.4.3
+nbclassic==0.4.8
 nbclient==0.5.4
 nbconvert==6.5.1
 nbformat==5.4.0
 nest-asyncio==1.5.1
 nodeenv==1.6.0
 notebook==6.4.12
+notebook_shim==0.2.2
 numpy==1.22.0
 oauthlib==3.2.1
-packaging==20.9
-pandas==1.1.4
+packaging==22.0
+palettable==3.3.0
+pandas==1.5.2
 pandas-gbq==0.14.1
 pandocfilters==1.4.3
 papermill==2.3.4
 parso==0.8.2
 pathspec==0.9.0
+patsy==0.5.3
 pexpect==4.8.0
 pickleshare==0.7.5
+Pillow==9.4.0
 platformdirs==2.3.0
-plotnine==0.8.0
+plotnine==0.10.1
 pluggy==0.13.1
 postmarker==0.18.2
 prometheus-client==0.11.0
@@ -104,14 +125,20 @@ regex==2021.8.28
 requests==2.25.1
 requests-oauthlib==1.3.0
 rsa==4.7.2
+scipy==1.10.0
 Send2Trash==1.8.0
-git+https://github.com/machow/siuba.git@stable
+siuba==0.0.25
 six==1.16.0
+sniffio==1.3.0
+soupsieve==2.3.2.post1
 SQLAlchemy==1.3.24
+sqlalchemy-bigquery==1.5.0
+statsmodels==0.13.5
 tenacity==8.0.1
 terminado==0.12.1
 testpath==0.5.0
 textwrap3==0.9.2
+tinycss2==1.2.1
 toml==0.10.2
 tomli==1.2.1
 tornado==6.1
@@ -124,5 +151,6 @@ urllib3==1.26.5
 virtualenv==20.4.7
 wcwidth==0.2.5
 webencodings==0.5.1
+websocket-client==1.4.2
 widgetsnbextension==3.5.1
 yarl==1.6.3
```
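The version constraint behind this upgrade can be sketched as a small check. This is only an illustration: the helper names `parse_version` and `likely_hits_contour_error` are hypothetical, and the matplotlib threshold of 3.6 is an assumption inferred from this issue (matplotlib 3.5.3 works, 3.6.2 arrives together with `contourpy`). The plotnine threshold of 0.10.0 comes from the description above.

```python
def parse_version(text):
    """Turn '3.6.2' into (3, 6, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def likely_hits_contour_error(matplotlib_version, plotnine_version):
    """Guess whether `import matplotlib._contour` will fail inside plotnine.

    Assumptions (not verified against either project's changelog here):
    - matplotlib no longer ships `matplotlib._contour` from 3.6 onward
    - plotnine stops importing it from 0.10.0 onward
    """
    old_plotnine = parse_version(plotnine_version) < (0, 10, 0)
    new_matplotlib = parse_version(matplotlib_version) >= (3, 6)
    return old_plotnine and new_matplotlib


# The three combinations that appear in this issue.
print(likely_hits_contour_error("3.6.2", "0.8.0"))   # unpinned matplotlib, old plotnine
print(likely_hits_contour_error("3.5.3", "0.8.0"))   # downgraded matplotlib
print(likely_hits_contour_error("3.6.2", "0.10.1"))  # upgraded plotnine
```

Under these assumptions there are two ways out of the original error: pin matplotlib below 3.6, or upgrade plotnine as in the diff above.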

With these updates, I get the following error when running `make MONTH= all`:

ModuleNotFoundError: No module named 'siuba.sql.dialects.bigquery'

```
Exception encountered at "In [7]":
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/tmp/ipykernel_20/2135866344.py in
      4 ## start/end dates now dt.date, need to format downstream?...
      5 ## collect here for 1 less query, small table after all
----> 6 tbl_dim_feeds = (tbl.views.gtfs_schedule_dim_feeds()
      7     >> filter_end
      8     >> filter_itp

~/venv/lib/python3.8/site-packages/calitp/tables.py in __call__(self)
     79
     80     def __call__(self):
---> 81         return self._create_table()
     82
     83     def _create_table(self):

~/venv/lib/python3.8/site-packages/calitp/tables.py in _create_table(self)
     82
     83     def _create_table(self):
---> 84         return LazyTbl(self.engine, self.table_name)
     85
     86     def _row_html(self, col):

~/venv/lib/python3.8/site-packages/siuba/sql/verbs.py in __init__(self, source, tbl, columns, ops, group_by, order_by, funcs, rm_attr, call_sub_attr, dispatch_cls, result_cls)
    213
    214         dialect = self.source.dialect.name
--> 215         self.funcs = get_dialect_funcs(dialect) if funcs is None else funcs
    216         self.dispatch_cls = get_sql_classes(dialect) if dispatch_cls is None else dispatch_cls
    217         self.result_cls = result_cls

~/venv/lib/python3.8/site-packages/siuba/sql/utils.py in get_dialect_funcs(name)
      3 def get_dialect_funcs(name):
      4     #dialect = engine.dialect.name
----> 5     mod = importlib.import_module('siuba.sql.dialects.{}'.format(name))
      6     return mod.funcs
      7

/usr/local/lib/python3.8/importlib/__init__.py in import_module(name, package)
    125                 break
    126             level += 1
--> 127     return _bootstrap._gcd_import(name[level:], package, level)
    128
    129

/usr/local/lib/python3.8/importlib/_bootstrap.py in _gcd_import(name, package, level)

/usr/local/lib/python3.8/importlib/_bootstrap.py in _find_and_load(name, import_)

/usr/local/lib/python3.8/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'siuba.sql.dialects.bigquery'
```
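The traceback shows siuba building a module name from the SQLAlchemy dialect name and importing it. A minimal sketch of a pre-flight check for that lookup, using only the standard library, might look like the following. The helper name `dialect_module_available` is hypothetical and not part of siuba; the default package path is taken from the traceback.

```python
import importlib.util


def dialect_module_available(dialect_name, package="siuba.sql.dialects"):
    """Return True if `<package>.<dialect_name>` can be located.

    Mirrors the name construction seen in the traceback
    ('siuba.sql.dialects.{}'.format(name)) without importing the
    dialect module itself. Note that find_spec does import the
    parent package in order to search it.
    """
    module_name = "{}.{}".format(package, dialect_name)
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # The parent package is not installed at all.
        return False


if __name__ == "__main__":
    # Prints False both when siuba is missing and when the installed
    # siuba has no bigquery dialect module.
    print(dialect_module_available("bigquery"))
```

Running this in the working and the broken environments would show whether the installed siuba ships a `bigquery` dialect module, or whether the SQLAlchemy engine is now reporting a dialect name that siuba has no module for.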

I'm not sure why these package updates would cause this error. The `pip list` diff between downgrading matplotlib (working) and upgrading plotnine with minimal package updates (not working) is:

working < not working diff

```diff
diff working-matlab-3.5.3.txt broken-plotnine-10-minimal-ups.txt
12a13
> backports.zoneinfo 0.2.1
21a23
> contourpy 1.0.6
26d27
< descartes 1.1.0
35,37c36,38
< google-api-core 2.7.1
< google-auth 1.31.0
< google-auth-oauthlib 0.4.4
---
> google-api-core 2.8.0
> google-auth 2.15.0
> google-auth-oauthlib 0.8.0
39,40c40,41
< google-cloud-bigquery-storage 2.7.0
< google-cloud-core 2.2.3
---
> google-cloud-bigquery-storage 2.13.1
> google-cloud-core 2.3.2
72c73
< matplotlib 3.5.3
---
> matplotlib 3.6.2
75c76
< mizani 0.7.3
---
> mizani 0.8.1
88c89
< packaging 20.9
---
> packaging 22.0
90c91
< pandas 1.1.4
---
> pandas 1.5.2
102c103
< plotnine 0.8.0
---
> plotnine 0.10.1
139c140
< statsmodels 0.13.1
---
> statsmodels 0.13.5
```
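The same comparison can be produced by package name rather than by line position, which is easier to read when packages are added or removed. The following is a rough sketch; `parse_pip_list` and `diff_packages` are hypothetical helpers, and the input is assumed to be the plain two-column output of `pip list` saved to text.

```python
def parse_pip_list(text):
    """Parse two-column `pip list` output into {name: version}.

    The header row and the dashed separator row are skipped, as is any
    row that does not have exactly two columns (for example editable
    installs, which carry a third column).
    """
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        name, version = fields
        if name == "Package" or set(name) == {"-"}:
            continue
        packages[name] = version
    return packages


def diff_packages(working, broken):
    """Return (name, working_version, broken_version) for every difference.

    A version of None means the package is absent on that side.
    """
    changes = []
    for name in sorted(set(working) | set(broken), key=str.lower):
        before, after = working.get(name), broken.get(name)
        if before != after:
            changes.append((name, before, after))
    return changes


# A few rows taken from the diff above, for illustration.
working = parse_pip_list("descartes 1.1.0\nmatplotlib 3.5.3\nnumpy 1.22.0\n")
broken = parse_pip_list("contourpy 1.0.6\nmatplotlib 3.6.2\nnumpy 1.22.0\n")
for name, before, after in diff_packages(working, broken):
    print(name, before, "->", after)
```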

It is also worth noting that if I try to upgrade to a more recent version of siuba and calitp, I get the following when running `make MONTH=12 all`:

unnest errors

```sql
DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 No matching signature for operator IN UNNEST for argument types: DATE, ARRAY at [6:38]

Location: us-west2
Job ID: 5d14922f-68aa-4838-8091-9ad5fa2ad28f

[SQL: SELECT `anon_1`.`name`, `anon_1`.`calitp_extracted_at`
FROM (SELECT `anon_2`.`calitp_itp_id` AS `calitp_itp_id`, `anon_2`.`calitp_url_number` AS `calitp_url_number`, `anon_2`.`calitp_extracted_at` AS `calitp_extracted_at`, `anon_2`.`full_path` AS `full_path`, `anon_2`.`name` AS `name`, `anon_2`.`size` AS `size`, `anon_2`.`md5_hash` AS `md5_hash`, `anon_2`.`is_loadable_file` AS `is_loadable_file`, `anon_2`.`prev_md5_hash` AS `prev_md5_hash`, `anon_2`.`is_changed` AS `is_changed`, `anon_2`.`is_first_extraction` AS `is_first_extraction`, `anon_2`.`is_validation` AS `is_validation`, `anon_2`.`is_agency_changed` AS `is_agency_changed`
FROM (SELECT `gtfs_schedule_history.calitp_files_updates`.`calitp_itp_id` AS `calitp_itp_id`, `gtfs_schedule_history.calitp_files_updates`.`calitp_url_number` AS `calitp_url_number`, `gtfs_schedule_history.calitp_files_updates`.`calitp_extracted_at` AS `calitp_extracted_at`, `gtfs_schedule_history.calitp_files_updates`.`full_path` AS `full_path`, `gtfs_schedule_history.calitp_files_updates`.`name` AS `name`, `gtfs_schedule_history.calitp_files_updates`.`size` AS `size`, `gtfs_schedule_history.calitp_files_updates`.`md5_hash` AS `md5_hash`, `gtfs_schedule_history.calitp_files_updates`.`is_loadable_file` AS `is_loadable_file`, `gtfs_schedule_history.calitp_files_updates`.`prev_md5_hash` AS `prev_md5_hash`, `gtfs_schedule_history.calitp_files_updates`.`is_changed` AS `is_changed`, `gtfs_schedule_history.calitp_files_updates`.`is_first_extraction` AS `is_first_extraction`, `gtfs_schedule_history.calitp_files_updates`.`is_validation` AS `is_validation`, `gtfs_schedule_history.calitp_files_updates`.`is_agency_changed` AS `is_agency_changed`
FROM `gtfs_schedule_history.calitp_files_updates`) AS `anon_2`
WHERE `anon_2`.`calitp_itp_id` = %(calitp_itp_id_1:INT64)s AND `anon_2`.`calitp_url_number` = %(calitp_url_number_1:INT64)s) AS `anon_1`
WHERE `anon_1`.`calitp_extracted_at` IN UNNEST(%(calitp_extracted_at_1:STRING)s)]
[parameters: {'calitp_itp_id_1': 10, 'calitp_url_number_1': 0, 'calitp_extracted_at_1': ['2022-12-04', '2022-12-18']}]
(Background on this error at: https://sqlalche.me/e/14/4xp6)
make: *** [Makefile:22: outputs/2022/12/10/index.ipynb] Error 1
```
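The error reports a `DATE` column being compared against an array parameter that was bound as `STRING` (`%(calitp_extracted_at_1:STRING)s`, with values such as `'2022-12-04'`). One possible direction, offered as an untested assumption rather than a confirmed fix, is to convert the ISO date strings to `datetime.date` objects before they reach the filter, so the driver has a chance to bind them as dates. The helper name `to_dates` is hypothetical, and whether siuba or the BigQuery dialect then emits a `DATE` array has not been verified here.

```python
from datetime import date


def to_dates(values):
    """Convert ISO 'YYYY-MM-DD' strings to datetime.date objects.

    Values that are already date objects are passed through unchanged.
    """
    return [v if isinstance(v, date) else date.fromisoformat(v) for v in values]


# The parameter values shown in the error message above.
extracted_at = to_dates(["2022-12-04", "2022-12-18"])
print(extracted_at)
```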

This PR works; however, it would be good to identify an upgrade path for the report generation packages. I created an issue, #205, to test PRs for future changes.

Type of change

How has this been tested?

Tested locally through the docker container.

acouch commented 1 year ago

This was resolved by dropping plotnine and the notebook for the updated v2 app.