data-for-change / anyway-etl

research and mitigate error in waze get-data: KeyError: 'pubMillis' #15

Open OriHoch opened 2 years ago

OriHoch commented 2 years ago

Most of the time it works, but occasionally we get an email alert for this error.

Since Nov 17, 17:50, it has happened twice: on Nov 17 at 05:25 and at 05:55 (UTC).

/srv/pip_install_deps.sh && /usr/local/lib/anyway-etl/bin/anyway-etl waze get-data
[2021-11-18 06:00:01,281] {bash.py:169} INFO - Output:
[2021-11-18 06:00:01,283] {bash.py:173} INFO - {}
[2021-11-18 06:00:01,770] {bash.py:173} INFO - 1
[2021-11-18 06:00:01,773] {bash.py:173} INFO - 1
[2021-11-18 06:00:03,515] {bash.py:173} INFO - 2021-11-18 06:00:03 DEBUG    Starting new HTTPS connection (1): il-georss.waze.com:443
[2021-11-18 06:00:04,999] {bash.py:173} INFO - /usr/local/lib/anyway-etl/lib/python3.8/site-packages/urllib3/connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host 'il-georss.waze.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
[2021-11-18 06:00:05,000] {bash.py:173} INFO -   warnings.warn(
[2021-11-18 06:00:05,000] {bash.py:173} INFO - 2021-11-18 06:00:04 DEBUG    https://il-georss.waze.com:443 "GET /rtserver/web/TGeoRSS?format=JSON&tk=ccp_partner&ccp_partner_name=The+Public+Knowledge+Workshop&types=traffic%2Calerts%2Cirregularities&polygon=34.123%2C31.4%3B34.722%2C33.004%3B35.793%2C33.37%3B35.914%2C32.953%3B35.765%2C32.733%3B35.6%2C32.628%3B35.473%2C31.073%3B35.23%2C30.29%3B34.985%2C29.513%3B34.898%2C29.483%3B34.123%2C31.4 HTTP/1.1" 200 None
[2021-11-18 06:00:06,279] {bash.py:173} INFO - Traceback (most recent call last):
[2021-11-18 06:00:06,279] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
[2021-11-18 06:00:06,279] {bash.py:173} INFO -     return self._engine.get_loc(casted_key)
[2021-11-18 06:00:06,279] {bash.py:173} INFO -   File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
[2021-11-18 06:00:06,280] {bash.py:173} INFO -   File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
[2021-11-18 06:00:06,280] {bash.py:173} INFO -   File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
[2021-11-18 06:00:06,280] {bash.py:173} INFO -   File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
[2021-11-18 06:00:06,280] {bash.py:173} INFO - KeyError: 'pubMillis'
[2021-11-18 06:00:06,280] {bash.py:173} INFO - 
[2021-11-18 06:00:06,280] {bash.py:173} INFO - The above exception was the direct cause of the following exception:
[2021-11-18 06:00:06,280] {bash.py:173} INFO - 
[2021-11-18 06:00:06,280] {bash.py:173} INFO - Traceback (most recent call last):
[2021-11-18 06:00:06,280] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/bin/anyway-etl", line 33, in <module>
[2021-11-18 06:00:06,280] {bash.py:173} INFO -     sys.exit(load_entry_point('anyway-etl', 'console_scripts', 'anyway-etl')())
[2021-11-18 06:00:06,280] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
[2021-11-18 06:00:06,280] {bash.py:173} INFO -     return self.main(*args, **kwargs)
[2021-11-18 06:00:06,280] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/lib/python3.8/site-packages/click/core.py", line 1062, in main
[2021-11-18 06:00:06,280] {bash.py:173} INFO -     rv = self.invoke(ctx)
[2021-11-18 06:00:06,281] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
[2021-11-18 06:00:06,281] {bash.py:173} INFO -     return _process_result(sub_ctx.command.invoke(sub_ctx))
[2021-11-18 06:00:06,281] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/lib/python3.8/site-packages/click/core.py", line 1668, in invoke
[2021-11-18 06:00:06,281] {bash.py:173} INFO -     return _process_result(sub_ctx.command.invoke(sub_ctx))
[2021-11-18 06:00:06,281] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
[2021-11-18 06:00:06,281] {bash.py:173} INFO -     return ctx.invoke(self.callback, **ctx.params)
[2021-11-18 06:00:06,281] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/lib/python3.8/site-packages/click/core.py", line 763, in invoke
[2021-11-18 06:00:06,281] {bash.py:173} INFO -     return __callback(*args, **kwargs)
[2021-11-18 06:00:06,281] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/src/anyway-etl/anyway_etl/waze/cli.py", line 14, in get_data
[2021-11-18 06:00:06,281] {bash.py:173} INFO -     get_waze_data()
[2021-11-18 06:00:06,281] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/src/anyway-etl/anyway_etl/waze/get_data.py", line 15, in get_waze_data
[2021-11-18 06:00:06,281] {bash.py:173} INFO -     dataflows = dataflows_handler.get_dataflows(waze_data)
[2021-11-18 06:00:06,281] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/src/anyway-etl/anyway_etl/waze/utils/dataflows_handler.py", line 12, in get_dataflows
[2021-11-18 06:00:06,281] {bash.py:173} INFO -     return [build_dataflow(waze_data, field) for field in FIELDS]
[2021-11-18 06:00:06,281] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/src/anyway-etl/anyway_etl/waze/utils/dataflows_handler.py", line 12, in <listcomp>
[2021-11-18 06:00:06,281] {bash.py:173} INFO -     return [build_dataflow(waze_data, field) for field in FIELDS]
[2021-11-18 06:00:06,282] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/src/anyway-etl/anyway_etl/waze/utils/dataflow_builder.py", line 22, in build_dataflow
[2021-11-18 06:00:06,282] {bash.py:173} INFO -     items = self.get_items(waze_data, field)
[2021-11-18 06:00:06,282] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/src/anyway-etl/anyway_etl/waze/utils/dataflow_builder.py", line 17, in get_items
[2021-11-18 06:00:06,282] {bash.py:173} INFO -     parsed_data = parser(raw_data)
[2021-11-18 06:00:06,282] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/src/anyway-etl/anyway_etl/waze/utils/parser_retriever.py", line 67, in _parse_jams
[2021-11-18 06:00:06,282] {bash.py:173} INFO -     jams_df["created_at"] = pd.to_datetime(jams_df["pubMillis"], unit="ms")
[2021-11-18 06:00:06,282] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/lib/python3.8/site-packages/pandas/core/frame.py", line 3455, in __getitem__
[2021-11-18 06:00:06,282] {bash.py:173} INFO -     indexer = self.columns.get_loc(key)
[2021-11-18 06:00:06,282] {bash.py:173} INFO -   File "/usr/local/lib/anyway-etl/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
[2021-11-18 06:00:06,282] {bash.py:173} INFO -     raise KeyError(key) from err
[2021-11-18 06:00:06,282] {bash.py:173} INFO - KeyError: 'pubMillis'
[2021-11-18 06:00:06,511] {bash.py:177} INFO - Command exited with return code 1
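
One possible explanation, not confirmed anywhere in this log, is that the Waze response occasionally contains no jams, or jams without a `pubMillis` field. In that case the DataFrame built from them has no `pubMillis` column, and `jams_df["pubMillis"]` raises the `KeyError` seen above. Below is a minimal sketch of a guard for that case. The function name `parse_jams_safely` and its return shape are hypothetical and do not reflect the actual code in `parser_retriever.py`; only the `pd.to_datetime(..., unit="ms")` call is taken from the traceback.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def parse_jams_safely(raw_jams):
    """Hypothetical stand-in for _parse_jams that tolerates a missing column.

    raw_jams is assumed to be a list of dicts, as decoded from the Waze JSON.
    """
    jams_df = pd.DataFrame(raw_jams)

    # An empty list (or records that all lack the key) yields a DataFrame
    # with no "pubMillis" column, which is what triggers the KeyError.
    if "pubMillis" not in jams_df.columns:
        logger.warning(
            "Waze jams payload has no 'pubMillis' column (%d records); skipping",
            len(jams_df),
        )
        return []

    # Same conversion as in the traceback: epoch milliseconds to timestamps.
    jams_df["created_at"] = pd.to_datetime(jams_df["pubMillis"], unit="ms")
    return jams_df.to_dict("records")
```

Logging the record count when the column is missing should help tell an empty response apart from a changed payload format the next time the alert fires.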