GoogleCloudPlatform / vertex-ai-samples

Notebooks, code samples, sample apps, and other resources that demonstrate how to use, develop and manage machine learning and generative AI workflows using Google Cloud Vertex AI.
https://cloud.google.com/vertex-ai
Apache License 2.0

Online prediction with BigQuery ML: FailedPrecondition: 400 Model <something> is not exportable from BigQueryML. #1282

Closed NelsonFrancisco closed 8 months ago

NelsonFrancisco commented 1 year ago

Expected Behavior

The tutorial runs successfully, and the step "model.deploy(endpoint=endpoint)" actually deploys the model to the newly created endpoint.

Actual Behavior

The following error occurs:

---------------------------------------------------------------------------
_InactiveRpcError                         Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
     56         try:
---> 57             return callable_(*args, **kwargs)
     58         except grpc.RpcError as exc:

/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
    945                                       wait_for_ready, compression)
--> 946         return _end_unary_response_blocking(state, call, False, None)
    947 

/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
    848     else:
--> 849         raise _InactiveRpcError(state)
    850 

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.FAILED_PRECONDITION
    details = "Model projects/<some_number>/locations/europe-west2/models/plans_copy_very_dumb_model@2 is not exportable from BigQueryML."
    debug_error_string = "UNKNOWN:Error received from peer ipv4:74.125.140.95:443 {created_time:"2022-11-22T17:20:11.595548603+00:00", grpc_status:9, grpc_message:"Model projects/<some_number>/locations/europe-west2/models/plans_copy_very_dumb_model@2 is not exportable from BigQueryML."}"
>

The above exception was the direct cause of the following exception:

FailedPrecondition                        Traceback (most recent call last)
/tmp/ipykernel_1/3081084994.py in <module>
      1 #deploying the model to the endpoint may take 10-15 minutes
----> 2 model.deploy(endpoint=endpoint)

~/.local/lib/python3.7/site-packages/google/cloud/aiplatform/models.py in deploy(self, endpoint, deployed_model_display_name, traffic_percentage, traffic_split, machine_type, min_replica_count, max_replica_count, accelerator_type, accelerator_count, service_account, explanation_metadata, explanation_parameters, metadata, encryption_spec_key_name, network, sync, deploy_request_timeout, autoscaling_target_cpu_utilization, autoscaling_target_accelerator_duty_cycle)
   3316             deploy_request_timeout=deploy_request_timeout,
   3317             autoscaling_target_cpu_utilization=autoscaling_target_cpu_utilization,
-> 3318             autoscaling_target_accelerator_duty_cycle=autoscaling_target_accelerator_duty_cycle,
   3319         )
   3320 

~/.local/lib/python3.7/site-packages/google/cloud/aiplatform/base.py in wrapper(*args, **kwargs)
    808                 if self:
    809                     VertexAiResourceNounWithFutureManager.wait(self)
--> 810                 return method(*args, **kwargs)
    811 
    812             # callbacks to call within the Future (in same Thread)

~/.local/lib/python3.7/site-packages/google/cloud/aiplatform/models.py in _deploy(self, endpoint, deployed_model_display_name, traffic_percentage, traffic_split, machine_type, min_replica_count, max_replica_count, accelerator_type, accelerator_count, service_account, explanation_metadata, explanation_parameters, metadata, encryption_spec_key_name, network, sync, deploy_request_timeout, autoscaling_target_cpu_utilization, autoscaling_target_accelerator_duty_cycle)
   3489             deploy_request_timeout=deploy_request_timeout,
   3490             autoscaling_target_cpu_utilization=autoscaling_target_cpu_utilization,
-> 3491             autoscaling_target_accelerator_duty_cycle=autoscaling_target_accelerator_duty_cycle,
   3492         )
   3493 

~/.local/lib/python3.7/site-packages/google/cloud/aiplatform/models.py in _deploy_call(cls, api_client, endpoint_resource_name, model, endpoint_resource_traffic_split, network, deployed_model_display_name, traffic_percentage, traffic_split, machine_type, min_replica_count, max_replica_count, accelerator_type, accelerator_count, service_account, explanation_metadata, explanation_parameters, metadata, deploy_request_timeout, autoscaling_target_cpu_utilization, autoscaling_target_accelerator_duty_cycle)
   1232             traffic_split=traffic_split,
   1233             metadata=metadata,
-> 1234             timeout=deploy_request_timeout,
   1235         )
   1236 

~/.local/lib/python3.7/site-packages/google/cloud/aiplatform_v1/services/endpoint_service/client.py in deploy_model(self, request, endpoint, deployed_model, traffic_split, retry, timeout, metadata)
   1261             retry=retry,
   1262             timeout=timeout,
-> 1263             metadata=metadata,
   1264         )
   1265 

/opt/conda/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py in __call__(self, timeout, retry, *args, **kwargs)
    152             kwargs["metadata"] = metadata
    153 
--> 154         return wrapped_func(*args, **kwargs)
    155 
    156 

/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
     57             return callable_(*args, **kwargs)
     58         except grpc.RpcError as exc:
---> 59             raise exceptions.from_grpc_error(exc) from exc
     60 
     61     return error_remapped_callable

FailedPrecondition: 400 Model projects/<some_number>/locations/europe-west2/models/plans_copy_very_dumb_model@2 is not exportable from BigQueryML.

The region of the Workbench machine is europe-west1.

The BigQuery dataset, the Vertex AI model that is created and registered during the tutorial, and the Vertex AI endpoint are all in europe-west2.

Could this be caused by a region/location mismatch?
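
If this were purely a region mismatch, one sanity check is to pin the SDK to the model's region before deploying. A minimal sketch, assuming the resource names from the traceback above and an SDK version that understands the @2 version suffix (the project number and endpoint display name are placeholders):

from google.cloud import aiplatform

# Pin the SDK to the region where the BigQuery dataset, the registered
# model and the endpoint all live (europe-west2), regardless of the
# Workbench machine's own region (europe-west1).
aiplatform.init(project="<some_number>", location="europe-west2")

# Full resource name copied from the error message above.
model = aiplatform.Model(
    "projects/<some_number>/locations/europe-west2/models/plans_copy_very_dumb_model@2"
)

endpoint = aiplatform.Endpoint.create(display_name="bqml-endpoint")
model.deploy(endpoint=endpoint)

Note that if the model itself is not exportable (see the TIMESTAMP discussion further down), pinning the region alone will not clear the FAILED_PRECONDITION.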

Steps to Reproduce the Problem

  1. Create a new Vertex AI Workbench instance.
  2. Follow the tutorial https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/bigquery_ml/bqml-online-prediction.ipynb
  3. Deploy the model to the endpoint; the error above occurs.

Very important note: I cannot follow the original tutorial, because I get an error when creating the model:

NotFound: 404 POST https://bigquery.googleapis.com/bigquery/v2/projects/<some_name>/jobs?prettyPrint=false: Not found: Dataset <some_name>:ga4_churnprediction was not found in location US
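
That 404 usually means the CREATE MODEL job ran in the default US multi-region while the dataset lives in europe-west2. A minimal sketch of a run_bq_query helper that pins the query job to the dataset's region (the helper name mirrors the notebook's; the project ID is a placeholder):

from google.cloud import bigquery

client = bigquery.Client(project="<some_name>")

def run_bq_query(sql: str, location: str = "europe-west2"):
    # Pin the query job to the dataset's region instead of the
    # default US multi-region.
    job = client.query(sql, location=location)
    return job.result()  # block until the job finishes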

Because of that 404, my model is a custom and very simple one:

BQML_MODEL_NAME = "plans_copy_very_dumb_model"

sql_train_model_bqml = f"""
CREATE OR REPLACE MODEL {BQ_DATASET_NAME}.{BQML_MODEL_NAME}    
OPTIONS(
  MODEL_TYPE="LOGISTIC_REG",
  input_label_cols=["id"], -- instead of setting the proper labels as in the tutorial
  model_registry="vertex_ai",
  vertex_ai_model_version_aliases=['logistic_reg', 'experimental']
) AS

SELECT
  *
FROM
  `<some_name>.mongo_atlas_dev.plans_copy` -- instead of fetching data from "bqmlpublic.demo_ga4churnprediction.training_data"
"""

print(sql_train_model_bqml)

run_bq_query(sql_train_model_bqml)

Specifications

I'm running exclusively on the Google Cloud Console, but here is the information I can gather:
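
A minimal snippet for collecting the relevant environment details from the Workbench kernel:

import sys

import google.cloud.aiplatform as aiplatform
import google.cloud.bigquery as bigquery

# Versions of the libraries involved in this issue.
print("python:", sys.version)
print("google-cloud-aiplatform:", aiplatform.__version__)
print("google-cloud-bigquery:", bigquery.__version__)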

amandafbri commented 1 year ago

I am having the same issue here. The model's region is "us-central1" and the Workbench's is "us-west1".

WolakT commented 11 months ago

Recently I experienced a similar issue that ended with the same error. In my case it was not connected to the region. After a long investigation we discovered that if you train a BigQuery ML model and pass a column of TIMESTAMP type during training, you will have problems deploying the model to a Vertex AI endpoint and will receive the 'is not exportable' error. @polong-lin, are you sure the TIMESTAMP data type is not the real issue here?

The issue is easy to reproduce: just change the query from the notebook and add the timestamp column 'user_first_engagement' (instead of excluding it) for the model training. Even if you find the model in the Vertex AI registry, it will not be deployable to an endpoint, and if you try to deploy from the SDK you will receive the 'is not exportable' error. Therefore I think the notebook is fine, but @NelsonFrancisco is probably providing some timestamp columns during training, which is causing this issue.
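
A minimal sketch of the failing case described above, in the notebook's style, assuming the tutorial's public dataset and its user_first_engagement TIMESTAMP column (the "churned" label name is also assumed from that dataset):

# Keeping the TIMESTAMP column in the feature set reproduces the
# "is not exportable" error when the model is later deployed.
sql_train_with_timestamp = f"""
CREATE OR REPLACE MODEL {BQ_DATASET_NAME}.{BQML_MODEL_NAME}
OPTIONS(
  MODEL_TYPE="LOGISTIC_REG",
  input_label_cols=["churned"],
  model_registry="vertex_ai"
) AS
SELECT
  *  -- includes user_first_engagement (TIMESTAMP) instead of excluding it
FROM
  `bqmlpublic.demo_ga4churnprediction.training_data`
"""

run_bq_query(sql_train_with_timestamp)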

polong-lin commented 11 months ago

Hm, it's not clear from the original post whether TIMESTAMP was used and whether the initial goal was model export, but you are absolutely correct that if TIMESTAMP is used as a column, then the model cannot be exported: https://cloud.google.com/bigquery/docs/exporting-models#limitations

If this is a blocker for you, @WolakT, you could try using a TRANSFORM clause to transform a TIMESTAMP column before the model uses it for training (and inference), so long as the resulting data type is no longer TIMESTAMP.
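
For example, a minimal sketch of such a TRANSFORM clause, again assuming the user_first_engagement column and "churned" label from the tutorial's dataset (adjust to your own schema):

sql_train_with_transform = f"""
CREATE OR REPLACE MODEL {BQ_DATASET_NAME}.{BQML_MODEL_NAME}
TRANSFORM(
  -- Replace the TIMESTAMP with an exportable INT64 feature.
  UNIX_SECONDS(user_first_engagement) AS first_engagement_sec,
  * EXCEPT (user_first_engagement)
)
OPTIONS(
  MODEL_TYPE="LOGISTIC_REG",
  input_label_cols=["churned"],
  model_registry="vertex_ai"
) AS
SELECT * FROM `bqmlpublic.demo_ga4churnprediction.training_data`
"""

run_bq_query(sql_train_with_transform)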

If you still run into issues or blockers that can't be resolved regarding model export, could you share more details directly with the product team at bqml-feedback[at]google.com?

andrewferlitsch commented 8 months ago

This notebook is known to be flaky.

andrewferlitsch commented 8 months ago

The notebook was deprecated (migrated to community) and is no longer supported.