feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0

Bug: Regression in BigQuery offline store caused by newer pydantic versions #4280

Open galen-ft opened 2 months ago

galen-ft commented 2 months ago

Expected Behavior

Calling store = FeatureStore(repo_path=repo_path) for a BigQuery offline store should succeed without raising an error.

Current Behavior

Running store = FeatureStore(repo_path=repo_path) for a BigQuery offline store causes:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: 'pydantic_core._pydantic_core.ValidationInfo' object is not subscriptable

The error is raised at bigquery.py#L108.

Likely cause

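In pydantic>=2.0.0, the second argument passed to a field validator is a ValidationInfo object rather than the plain dict of previously validated values that pydantic v1 passed, so subscripting it with values["project_id"] raises the TypeError above.

The setup.py for feast >= v0.36.0 requires pydantic>=2.0.0.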
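
The failure mode can be illustrated without feast or pydantic installed. The sketch below uses a hypothetical stand-in class, StandInValidationInfo, that mimics only one trait of pydantic v2's ValidationInfo: the previously validated fields sit behind a .data attribute and the object itself is not subscriptable. It illustrates the mechanism and is not pydantic's actual class.

```python
class StandInValidationInfo:
    """Hypothetical stand-in for pydantic v2's ValidationInfo (not the real class)."""

    def __init__(self, data):
        # pydantic v2 exposes already-validated fields through `.data`
        self.data = data


def v1_style_lookup(values):
    # pydantic v1 validators received a plain dict, so subscripting worked
    return values["project_id"]


def v2_style_lookup(info):
    # with a ValidationInfo-like object, the dict sits behind `.data`
    return info.data["project_id"]


fields = {"project_id": "some-project-id"}

# v1 behaviour: the validator is handed the dict directly
assert v1_style_lookup(fields) == "some-project-id"

info = StandInValidationInfo(fields)
try:
    # v1-style access on a v2-style object fails with a TypeError
    v1_style_lookup(info)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(v2_style_lookup(info))
```

The first print shows a "not subscriptable" TypeError message of the same kind as in the traceback, while the .data lookup returns the project id.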

Last known successful configuration:

Steps to reproduce

Example feature_store.yaml:

project: my_feature_repo
registry: gs://.../registry.db
# The AWS provider is used for the online store.
# Mixing AWS and GCP should be okay provided that the offline store type is specified.
provider: aws
offline_store:
    type: bigquery
    billing_project_id: some-project-id
    dataset: latest
    gcs_staging_location: gs://...
    location: EU
    project_id: some-project-id
online_store:
  type: redis
  connection_string: ...

entity_key_serialization_version: 2

Example script:

import os
# Credentials for GCP
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "GCP_keyfile.json"
from feast import FeatureStore

if __name__ == "__main__":
    # repo_path must contain the GCP_keyfile.json and feature_store.yaml files
    repo_path = "./"
    assert os.path.exists(repo_path)
    # Create a feature store object
    store = FeatureStore(repo_path=repo_path)  # <--- Error occurs here

Specifications

Possible Solution

As a first attempt, using values.data["project_id"] instead of values["project_id"] in the pydantic.field_validator for billing_project_id at bigquery.py#L108 should work:

@field_validator("billing_project_id")
def project_id_exists(cls, v, values, **kwargs):
    # if v and not values["project_id"]:
    if v and not values.data["project_id"]:
        raise ValueError(
            "please specify project_id if billing_project_id is specified"
        )
    return v

More testing is needed before I can make a PR.
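
Until that testing is done, one way to keep the check tolerant of both argument shapes is to normalise the second validator argument before reading from it. The sketch below is only an illustration: validated_fields and check_billing_project are hypothetical helper names, not part of feast or pydantic, and SimpleNamespace stands in for a ValidationInfo-like object. Using .get() so that a missing project_id key is treated like an empty one is an assumption about the desired behaviour.

```python
from types import SimpleNamespace


def validated_fields(values):
    """Return the dict of previously validated fields.

    Hypothetical helper: accepts either a plain dict (pydantic v1 style)
    or an object exposing the dict as `.data` (pydantic v2 style).
    """
    if isinstance(values, dict):
        return values
    return values.data


def check_billing_project(v, values):
    # Same rule as the validator above: billing_project_id needs project_id.
    if v and not validated_fields(values).get("project_id"):
        raise ValueError(
            "please specify project_id if billing_project_id is specified"
        )
    return v


# A plain dict and a `.data`-carrying object are both accepted.
ok_dict = check_billing_project("billing", {"project_id": "some-project-id"})
ok_info = check_billing_project(
    "billing", SimpleNamespace(data={"project_id": "some-project-id"})
)

# A billing project without a project_id is rejected.
try:
    check_billing_project("billing", SimpleNamespace(data={"project_id": None}))
    rejected = False
except ValueError:
    rejected = True
```

Since feast >= v0.36.0 already requires pydantic>=2.0.0, the dict branch matters only if the same check has to run under both major versions; otherwise reading values.data directly, as proposed above, is simpler.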

charlieviettq commented 2 months ago

I'm seeing the same issue with feast v0.39.0.

tokoko commented 2 months ago

@galen-ft Let me know if you're planning to open a PR for this.