aerosense-ai / data-gateway

Data influx for Aerosense.
https://www.aerosense.ai/

google.api_core.exceptions.GoogleAPICallError: 413 POST in the cloud Function #45

Open time-trader opened 2 years ago

time-trader commented 2 years ago

Bug report

What is the current behavior?

Executing the cloud function with windows of window_size=60 (seconds) and with Diff_Baros or Baros sensor data fails with the following error:

  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/functions_framework/__init__.py", line 171, in view_func
    function(data, context)
  File "/workspace/main.py", line 41, in upload_window
    window_handler.persist_window(unix_timestamped_window["sensor_data"], window_metadata)
  File "/workspace/window_handler.py", line 99, in persist_window
    self.dataset.add_sensor_data(
  File "/workspace/big_query.py", line 80, in add_sensor_data
    errors = self.client.insert_rows(table=self.client.get_table(self.table_names["sensor_data"]), rows=rows)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 3411, in insert_rows
    return self.insert_rows_json(table, json_rows, **kwargs)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 3590, in insert_rows_json
    response = self._call_api(
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 760, in _call_api
    return call()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/google/api_core/retry.py", line 283, in retry_wrapped_func
    return retry_target(
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/google/api_core/retry.py", line 190, in retry_target
    return target()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/google/cloud/_http/__init__.py", line 480, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.GoogleAPICallError: 413 POST https://bigquery.googleapis.com/bigquery/v2/projects/aerosense-twined/datasets/greta/tables/sensor_data/insertAll?prettyPrint=false: <!DOCTYPE html>

This might be related to the JSON file being too big? However, checking the ingress bucket shows that all files are below 20 MB.

thclark commented 2 years ago

60 minutes is quite long for a window. Can you stick to 10 minutes? If so then problem solved ;)

time-trader commented 2 years ago

It's 60 seconds.

cortadocodes commented 2 years ago

> This might be related to JSON file being too big? However checking the ingress bucket shows that all files are below 20 MB

This refers to Dataflow rather than BigQuery but a 413 does suggest a payload that's too big. Do you have an idea of how big each batch is?

time-trader commented 2 years ago

The JSON window files in the ingress bucket (aerosense-ingress-eu/data_gateway/790791) are anywhere between 1 kB and 12 MB. I don't know if the payload is the same size...
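One way to check whether the payload matches the file size is to serialise the rows the same way a streaming insert would and count the bytes. The sketch below is only an approximation: it assumes the `insertAll` body wraps each row as `{"json": row}` and ignores optional fields such as insert IDs. The window layout, the row field names and the `estimate_insert_payload_bytes` helper are all hypothetical, not taken from data-gateway.

```python
import json


def estimate_insert_payload_bytes(rows):
    """Approximate the size in bytes of an insertAll request body for a list of row dicts."""
    # Assumption: each row is wrapped as {"json": row}; optional fields (e.g. insertId) are ignored.
    body = {"rows": [{"json": row} for row in rows]}
    return len(json.dumps(body, separators=(",", ":")).encode("utf-8"))


# Hypothetical compact window file: one sensor name, many samples of [timestamp, value, value].
window = {"Baros": [[0.0, 1.0, 2.0], [0.1, 1.1, 2.1], [0.2, 1.2, 2.2]]}

# Hypothetical row layout: the sensor name and installation metadata are repeated for every sample.
rows = [
    {
        "sensor_type": name,
        "datetime": sample[0],
        "sensor_value": sample[1:],
        "installation_reference": "hypothetical-installation",
    }
    for name, samples in window.items()
    for sample in samples
]

file_bytes = len(json.dumps(window).encode("utf-8"))
payload_bytes = estimate_insert_payload_bytes(rows)
print(f"window file: {file_bytes} bytes, estimated insert payload: {payload_bytes} bytes")
```

If rows repeat metadata per sample like this, the request body can be several times larger than the window file it came from, so a 12 MB file does not imply a 12 MB payload.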

time-trader commented 2 years ago

With more testing:

- window_size = 60s: the error occurs for both diffBaros and Baros sensor data.
- window_size = 30s: the error happens only with diffBaros sensor data, but NOT with Baros.
- window_size = 20s: the error does not occur.

NOTE: Batches with diffBaros data are actually smaller than ones with Baros. So I assume the error has more to do with frequency (diffBaros are sampled much more frequently) and how the batch is processed and inserted into BigQuery. I noticed that when I export the BigQuery request I get JSON lines with lots of repeated data, like the sensor name, for every sample...
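If the 413 does come from the size or row count of a single insert request, one possible workaround is to split the rows into smaller chunks before inserting them, instead of shortening the window. This is a minimal sketch, not the data-gateway implementation: `chunk_rows` is a hypothetical helper, and the `max_bytes` and `max_rows` defaults are conservative assumptions that should be checked against the current BigQuery streaming insert quotas.

```python
import json


def chunk_rows(rows, max_bytes=5_000_000, max_rows=500):
    """Yield lists of rows, each under an approximate serialised size and a row count limit."""
    chunk, chunk_bytes = [], 0

    for row in rows:
        # Approximate the row's contribution to the request body by its JSON size.
        row_bytes = len(json.dumps(row).encode("utf-8"))

        # Start a new chunk if adding this row would exceed either limit.
        if chunk and (chunk_bytes + row_bytes > max_bytes or len(chunk) >= max_rows):
            yield chunk
            chunk, chunk_bytes = [], 0

        chunk.append(row)
        chunk_bytes += row_bytes

    if chunk:
        yield chunk


# Hypothetical high-frequency rows, with the sensor name repeated for every sample.
rows = [{"sensor_type": "Diff_Baros", "sensor_value": [float(i)] * 5} for i in range(1200)]
chunks = list(chunk_rows(rows))
print([len(chunk) for chunk in chunks])
```

Each chunk could then be passed to the existing `insert_rows` call in `add_sensor_data` in place of the full list of rows, collecting the returned errors across chunks. A single row larger than `max_bytes` is still yielded on its own rather than dropped.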