feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0

Feature view does not exist error in on-demand feature view #4363

Status: Open. ndenStanford opened this issue 2 months ago.

ndenStanford commented 2 months ago

I am trying to run an on-demand feature view but cannot get it to execute correctly. In addition, the example section of the on-demand feature view documentation page points to a link that is not valid:

See https://github.com/feast-dev/on-demand-feature-views-demo for an example on how to use on-demand feature views.

Any solution or suggestion is greatly appreciated.

Expected Behavior

The on-demand feature view successfully generates the new feature.

Current Behavior

The on-demand feature view fails when running "get_historical_features", raising the following error:

Exception has occurred: FeatureViewNotFoundException
Feature view generate_label_feature_view does not exist

Steps to reproduce

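import pandas as pd
from feast import Field, on_demand_feature_view
from feast.types import String

# `feature_view` (a FeatureView exposing "title" and "content") and
# `generate_label` are defined elsewhere in the feature repo.
@on_demand_feature_view(
    sources=[feature_view],
    schema=[
        Field(name="topic", dtype=String),
    ],
)
def generate_label_feature_view(features_df: pd.DataFrame) -> pd.DataFrame:
    # Build one "topic" label per input row from its title and content.
    df = pd.DataFrame()
    df["topic"] = [
        generate_label(title, content)
        for title, content in zip(
            features_df["title"].values, features_df["content"].values
        )
    ]
    return df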
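Because the body of a pandas-mode on-demand feature view is an ordinary function from DataFrame to DataFrame, its logic can be checked with plain pandas before it goes through Feast at all. The sketch below assumes a stand-in `generate_label` (hypothetical; the real function is not shown in this issue) and copies the transformation body from the decorated function above into an undecorated function so it can be called directly.

```python
import pandas as pd


def generate_label(title: str, content: str) -> str:
    # Hypothetical stand-in for the real labeling function (not shown in the issue).
    text = f"{title} {content}".lower()
    return "sports" if "match" in text else "other"


def generate_label_transform(features_df: pd.DataFrame) -> pd.DataFrame:
    # Same body as the on-demand feature view above, minus the Feast decorator.
    df = pd.DataFrame()
    df["topic"] = [
        generate_label(title, content)
        for title, content in zip(
            features_df["title"].values, features_df["content"].values
        )
    ]
    return df


# Hypothetical sample rows shaped like the source feature view's output.
sample = pd.DataFrame(
    {
        "title": ["Cup final", "Quarterly earnings"],
        "content": ["The match ended 2-1", "Revenue grew"],
    }
)
result = generate_label_transform(sample)
print(result["topic"].tolist())  # ['sports', 'other']
```

If this check passes, a remaining failure is more likely to come from registration or retrieval than from the transformation itself.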

Specifications

Possible Solution

tokoko commented 2 months ago

@ndenStanford hey, seems like generate_label_feature_view is not in the registry for some reason. How are you applying it to the registry? Did you run feast apply?

ndenStanford commented 1 month ago

Hello @tokoko.

I appreciate your response, and I apologize for the delay in getting back to you. You are absolutely right about that. While the initial issue has been resolved, I'm still having trouble getting the single feature store to work.

Regarding the example script for on-demand feature views (below), what do driver_hourly_stats and transformed_conv_rate need to have in common? Do they need to share the same entity or entity join key, or do they need to have common columns?

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
        "transformed_conv_rate:conv_rate_plus_val1",
        "transformed_conv_rate:conv_rate_plus_val2",
    ],
).to_df()
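For the question above: assuming `transformed_conv_rate` is defined as in Feast's quickstart template (sources `driver_hourly_stats` plus a request source providing `val_to_add` and `val_to_add_2`), the on-demand view does not appear to need an entity of its own. It reads the columns produced by the source view's point-in-time join, plus request-time columns that must be present in `entity_df`. The pandas-only sketch below emulates that two-step flow conceptually. It is not Feast's implementation: Feast performs the join in the offline store (Redshift here) and applies the transformation in Python afterwards, which is consistent with the traceback below failing inside `_to_df_internal`. All data values are made up for illustration.

```python
import pandas as pd

# Entity rows: join key, event timestamp, and request-time values.
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": pd.to_datetime(["2024-07-01 12:00", "2024-07-01 12:30"]),
        "val_to_add": [1, 2],
        "val_to_add_2": [10, 20],
    }
)

# Made-up rows standing in for the driver_hourly_stats source data.
driver_hourly_stats = pd.DataFrame(
    {
        "driver_id": [1001, 1001, 1002],
        "event_timestamp": pd.to_datetime(
            ["2024-07-01 10:00", "2024-07-01 11:00", "2024-07-01 09:00"]
        ),
        "conv_rate": [0.5, 0.6, 0.3],
    }
)

# Step 1: point-in-time join on the join key, taking the latest
# feature row at or before each entity timestamp.
joined = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    driver_hourly_stats.sort_values("event_timestamp"),
    on="event_timestamp",
    by="driver_id",
    direction="backward",
)


# Step 2: the on-demand transformation runs over the joined columns.
def transformed_conv_rate(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
    return df


training_df = pd.concat([joined, transformed_conv_rate(joined)], axis=1)
print(training_df[["driver_id", "conv_rate", "conv_rate_plus_val1"]])
```

In this reading, the only thing the two views "share" is that the on-demand view lists the regular view as a source; `entity_df` must carry the source view's join key and timestamp plus the request columns.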

When I run the script from my original comment, I get the error ERROR: column reference "topic" is ambiguous. The full log is below.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/feast/infra/offline_stores/offline_store.py", line 79, in to_df
    features_df = self._to_df_internal(timeout=timeout)
  File "/usr/local/lib/python3.8/site-packages/feast/usage.py", line 299, in wrapper
    raise exc.with_traceback(traceback)
  File "/usr/local/lib/python3.8/site-packages/feast/usage.py", line 288, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/feast/infra/offline_stores/redshift.py", line 432, in _to_df_internal
    return aws_utils.unload_redshift_query_to_df(
  File "/usr/local/lib/python3.8/site-packages/feast/infra/utils/aws_utils.py", line 591, in unload_redshift_query_to_df
    table = unload_redshift_query_to_pa(
  File "/usr/local/lib/python3.8/site-packages/feast/infra/utils/aws_utils.py", line 562, in unload_redshift_query_to_pa
    execute_redshift_query_and_unload_to_s3(
  File "/usr/local/lib/python3.8/site-packages/feast/infra/utils/aws_utils.py", line 543, in execute_redshift_query_and_unload_to_s3
    execute_redshift_statement(
  File "/usr/local/lib/python3.8/site-packages/feast/infra/utils/aws_utils.py", line 174, in execute_redshift_statement
    wait_for_redshift_statement(redshift_data_client, statement)
  File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 336, in wrapped_f
    return copy(f, *args, **kw)
  File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 475, in __call__
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 376, in iter
    result = action(retry_state)
  File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.8/site-packages/tenacity/__init__.py", line 478, in __call__
    result = fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/feast/infra/utils/aws_utils.py", line 143, in wait_for_redshift_statement
    raise RedshiftQueryError(desc)  # Don't retry. Raise exception.
feast.errors.RedshiftQueryError: Redshift SQL Query failed to finish. Details: {'ClusterIdentifier': 'redshift-dev', 'CreatedAt': datetime.datetime(2024, 7, 29, 17, 28, 45, 64000, tzinfo=tzlocal()), 'Database': 'sources_dev', 'DbUser': 'admin', 'Duration': -1, 'Error': 'ERROR: column reference "topic" is ambiguous', 'HasResultSet': False, 'Id': '3a8cd4e9-3fec-4d60-a614-f8cbcb3b2798', 
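In SQL, "column reference is ambiguous" generally means a query names a column that exists in more than one of the joined tables without qualifying it. Since the failure occurs while the Redshift query runs, and that query joins the entity dataframe with the feature view tables, one thing worth ruling out (an assumption, not a confirmed diagnosis) is whether the name topic is used in more than one place, such as a column of the entity dataframe, a field of the source feature view, or the on-demand view's output. The helper below is hypothetical and not part of Feast; it simply lists names that appear in more than one of the given places. The entity dataframe in the usage example (including `doc_id` and its topic column) is made up.

```python
from typing import Dict, Iterable, List

import pandas as pd


def find_name_collisions(
    entity_df: pd.DataFrame, views: Dict[str, Iterable[str]]
) -> Dict[str, List[str]]:
    """Hypothetical helper: map each name used in more than one place
    (entity dataframe columns or view field names) to where it appears."""
    locations: Dict[str, List[str]] = {}
    for col in entity_df.columns:
        locations.setdefault(col, []).append("entity_df")
    for view_name, field_names in views.items():
        for name in field_names:
            locations.setdefault(name, []).append(view_name)
    # Keep only names that occur in two or more places.
    return {name: where for name, where in locations.items() if len(where) > 1}


# Made-up entity dataframe that happens to carry a "topic" column.
entity_df = pd.DataFrame(
    {
        "doc_id": [1],
        "event_timestamp": pd.to_datetime(["2024-07-29"]),
        "topic": ["unknown"],
    }
)
collisions = find_name_collisions(
    entity_df,
    {
        "feature_view": ["title", "content"],
        "generate_label_feature_view": ["topic"],
    },
)
print(collisions)  # {'topic': ['entity_df', 'generate_label_feature_view']}
```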

Is there a way to find a complete example script of the on-demand feature view?

What's the rationale for needing to register an additional feature view? I initially assumed the on-demand feature would be appended to the original feature view in place.

Thank you in advance.