GoogleCloudPlatform / analytics-componentized-patterns

Apache License 2.0
174 stars 98 forks source link

"SyntaxError: --params is not a correctly formatted JSON string or a JSON serializable dictionary" When running LTV lookalike notebook #71

Open rain-ml opened 2 years ago

rain-ml commented 2 years ago

Hi all,

I'm running the LTV lookalike notebook: https://github.com/GoogleCloudPlatform/analytics-componentized-patterns/blob/master/retail/ltv/bqml/notebooks/bqml_automl_ltv_activate_lookalike.ipynb on a Vertex AI user-managed notebook, but I got the error as written in the title.

I got the error when running the first cell of the Aggregate per day per customer chapter, i.e. the cell starting with this code:

%%bigquery --params $LTV_PARAMS --project $PROJECT_ID

DECLARE MAX_STDV_MONETARY INT64 DEFAULT @MAX_STDV_MONETARY;
DECLARE MAX_STDV_QTY INT64 DEFAULT @MAX_STDV_QTY;

CREATE OR REPLACE TABLE `ltv_ecommerce.20_aggred` AS
SELECT
  customer_id,
  order_day,
  ROUND(day_value_after_returns, 2) AS value,
  day_qty_after_returns as qty_articles,
  day_num_returns AS num_returns,
  CEIL(avg_time_to_return) AS time_to_return
FROM (

.....

Does anyone know how to fix this?

Here is the full error message I got:


Traceback (most recent call last):

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/magics.py", line 515, in _cell_magic
    params_option_value, rest_of_args = _split_args_line(line)

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/magics.py", line 720, in _split_args_line
    tree = scanner.input_line()

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 203, in input_line
    options = self.option_list()

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 264, in option_list
    option = self.params_option()

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 319, in params_option
    opt_value = self.py_dict()

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 333, in py_dict
    dict_items = self.dict_items()

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 347, in dict_items
    item = self.dict_item()

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 371, in dict_item
    value = self.py_value()

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 423, in py_value
    self.error(msg, exc_type=QueryParamsParseError)

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/line_arg_parser/parser.py", line 193, in error
    raise exc_type(message)

QueryParamsParseError: Unexpected token type UNKNOWN at position 27.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3457, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "/tmp/ipykernel_17444/3535280929.py", line 1, in <module>
    get_ipython().run_cell_magic('bigquery', '--params $LTV_PARAMS --project $PROJECT_ID', '\nDECLARE MAX_STDV_MONETARY INT64 DEFAULT @MAX_STDV_MONETARY;\nDECLARE MAX_STDV_QTY INT64 DEFAULT @MAX_STDV_QTY;\n\nCREATE OR REPLACE TABLE `ltv_ecommerce.20_aggred` AS\nSELECT\n  customer_id,\n  order_day,\n  ROUND(day_value_after_returns, 2) AS value,\n  day_qty_after_returns as qty_articles,\n  day_num_returns AS num_returns,\n  CEIL(avg_time_to_return) AS time_to_return\nFROM (\n  SELECT\n    customer_id,\n    order_day,\n    SUM(order_value_after_returns) AS day_value_after_returns,\n    STDDEV(SUM(order_value_after_returns)) OVER(PARTITION BY customer_id ORDER BY SUM(order_value_after_returns)) AS stdv_value,\n    SUM(order_qty_after_returns) AS day_qty_after_returns,\n    STDDEV(SUM(order_qty_after_returns)) OVER(PARTITION BY customer_id ORDER BY SUM(order_qty_after_returns)) AS stdv_qty,\n    CASE\n      WHEN MIN(order_min_qty) < 0 THEN count(1)\n      ELSE 0\n    END AS day_num_returns,\n    CASE\n      WHEN MIN(order_min_qty) < 0 THEN AVG(time_to_return)\n      ELSE NULL\n    END AS avg_time_to_return\n  FROM (\n    SELECT \n      customer_id,\n      order_id,\n      -- Gives the order date vs return(s) dates.\n      MIN(transaction_date) AS order_day,\n      MAX(transaction_date) AS return_final_day,\n      DATE_DIFF(MAX(transaction_date), MIN(transaction_date), DAY) AS time_to_return,\n      -- Aggregates all products in the order \n      -- and all products returned later.\n      SUM(qty * unit_price) AS order_value_after_returns,\n      SUM(qty) AS order_qty_after_returns,\n      -- If negative, order has qty return(s).\n      MIN(qty) order_min_qty\n    FROM \n      `ltv_ecommerce.10_orders`\n    GROUP BY\n      customer_id,\n      order_id)\n  GROUP BY\n    customer_id,\n    order_day)\nWHERE\n  -- [Optional] Remove dates with outliers per a customer.\n  (stdv_value < MAX_STDV_MONETARY\n    OR stdv_value IS NULL) AND\n  (stdv_qty < MAX_STDV_QTY\n    OR stdv_qty IS NULL);\n\n\nSELECT * FROM `ltv_ecommerce.20_aggred` LIMIT 5;\n')

  File "/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2419, in run_cell_magic
    result = fn(*args, **kwargs)

  File "/opt/conda/lib/python3.7/site-packages/google/cloud/bigquery/magics/magics.py", line 521, in _cell_magic
    raise rebranded_error from exc

  File "<string>", line unknown
SyntaxError: --params is not a correctly formatted JSON string or a JSON serializable dictionary
polong-lin commented 2 years ago

Hi rain-ml, It looks like you may not have run one of the prior cells that creates the LTV_PARAMS dictionary:

LTV_PARAMS = {
    'WINDOW_LENGTH': 0,
    'WINDOW_STEP': 30,
    'WINDOW_STEP_INITIAL': 90,
    'LENGTH_FUTURE': 30,
    'MAX_STDV_MONETARY': 500,
    'MAX_STDV_QTY': 100, 
    'TOP_LTV_RATIO': 0.2
}
LTV_PARAMS

In other words, in the first line of your code, %%bigquery is expecting to import a dictionary which passes in variables that can then be used in the SQL code below it. This is why your output says:

--params is not a correctly formatted JSON string or a JSON serializable dictionary

Once you create the LTV_PARAMS dictionary and use it as part of %%bigquery, it should then allow you to reference the two variables @MAX_STDV_MONETARY @MAX_STDV_QTY which were set in LTV_PARAMS above:

DECLARE MAX_STDV_MONETARY INT64 DEFAULT @MAX_STDV_MONETARY;
DECLARE MAX_STDV_QTY INT64 DEFAULT @MAX_STDV_QTY; 

If you don't like passing in LTV_PARAMS, you can also remove it and replace it with values directly, such as:

%%bigquery --project $PROJECT_ID

DECLARE MAX_STDV_MONETARY INT64 DEFAULT 500;
DECLARE MAX_STDV_QTY INT64 DEFAULT 100;

CREATE OR REPLACE TABLE `ltv_ecommerce.20_aggred` AS
SELECT
  customer_id,
  order_day,
  ROUND(day_value_after_returns, 2) AS value,
  day_qty_after_returns as qty_articles,
  day_num_returns AS num_returns,
  CEIL(avg_time_to_return) AS time_to_return
FROM (

.....

Hope this helps

rain-ml commented 2 years ago

Hi polong-lin,

I did run the LTV_PARAMS cell before referencing it and that's how I got the original error.

In fact, If I didn't run the LTV_PARAMS cell, here's the error message: NameError: Parameter expansion failed, undefined variable "LTV_PARAMS".

Anyways, I just hard code it for now like you suggested without using params. Thanks!