feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0

Remote apply #4529

Open dmartinol opened 3 days ago

dmartinol commented 3 days ago

Expected Behavior

Running feast apply from a client connected through remote proxies should apply the Python definitions.

Current Behavior

The apply fails. With a project using the postgres template and a deployment on kind using #4528 it throws the following error:

% feast feature-views list
NAME                         ENTITIES    TYPE
driver_hourly_stats_fresh    {'driver'}  FeatureView
driver_hourly_stats          {'driver'}  FeatureView
transformed_conv_rate_fresh  {'driver'}  OnDemandFeatureView
transformed_conv_rate        {'driver'}  OnDemandFeatureView

% feast apply
...
  File "/Users/dmartino/.pyenv/versions/3.11.9/bin/feast", line 8, in <module>
    sys.exit(cli())
             ^^^^^
...
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/repo_operations.py", line 355, in apply_total
    apply_total_with_repo_instance(
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/repo_operations.py", line 313, in apply_total_with_repo_instance
    store.apply(all_to_apply, objects_to_delete=all_to_delete, partial=False)
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/feature_store.py", line 903, in apply
    self._make_inferences(
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/feature_store.py", line 627, in _make_inferences
    update_feature_views_with_inferred_features_and_entities(
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/inference.py", line 174, in update_feature_views_with_inferred_features_and_entities
    _infer_features_and_entities(
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/inference.py", line 212, in _infer_features_and_entities
    table_column_names_and_types = fv.batch_source.get_table_column_names_and_types(
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/infra/offline_stores/contrib/postgres_offline_store/postgres_source.py", line 113, in get_table_column_names_and_types
    with _get_conn(config.offline_store) as conn, conn.cursor() as cur:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/infra/utils/postgres/connection_utils.py", line 18, in _get_conn
    conninfo=_get_conninfo(config),
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/feast/infra/utils/postgres/connection_utils.py", line 60, in _get_conninfo
    "user": config.user,
            ^^^^^^^^^^^
  File "/Users/dmartino/.pyenv/versions/3.11.9/lib/python3.11/site-packages/pydantic/main.py", line 811, in __getattr__
    raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
AttributeError: 'RemoteOfflineStoreConfig' object has no attribute 'user'

Reference to local feature_store.yaml:

project: sample
registry:
  path: localhost:8001
  registry_type: remote
offline_store:
  host: localhost
  port: 8002
  type: remote
online_store:
  path: http://localhost:8003
  type: remote
entity_key_serialization_version: 2
auth:
  type: no_auth

The ports of all remote services are forwarded to local ports 8001-8003 (see the working feast feature-views list example above).
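The failure mode can be reproduced outside Feast. The sketch below uses plain dataclasses as stand-ins for the two config types (the real classes are pydantic models, and the fields of the Postgres stand-in are assumptions), and `get_conninfo` is a hypothetical simplification of the `_get_conninfo` helper seen in the traceback. It only illustrates why a Postgres source asking a remote config for `user` fails.

```python
from dataclasses import dataclass


@dataclass
class PostgresOfflineStoreConfig:
    # Stand-in for a Postgres-style config; the exact fields are assumptions.
    host: str
    port: int
    user: str


@dataclass
class RemoteOfflineStoreConfig:
    # Stand-in for the remote config: only the fields set in feature_store.yaml.
    host: str
    port: int


def get_conninfo(config) -> dict:
    # Hypothetical simplification of the connection helper: it assumes that
    # Postgres-style fields exist on whatever config object it receives.
    return {"host": config.host, "port": config.port, "user": config.user}


remote = RemoteOfflineStoreConfig(host="localhost", port=8002)
try:
    get_conninfo(remote)
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

The Postgres data source is handed the client's offline store config, which on a remote client describes the proxy rather than the database.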

Steps to reproduce

Follow the steps of PR #4528, then copy example_repo.py into the client folder and run feast apply from there.

tokoko commented 3 days ago

right, makes sense. this is similar to #4186. Basically, we need to move all DataSource methods that actually need to touch the datasets from DataSource to OfflineStore. I did this only for validate before; now we need to do the same for get_table_column_names_and_types.
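A minimal sketch of that refactoring, using stand-in classes rather than the real Feast ones. The method name `get_table_column_names_and_types_from_datasource`, the in-memory `tables` dict, and the column names are hypothetical. The point is only that the data source becomes a plain description and every dataset access goes through the offline store, which a remote store can forward to the server.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Tuple

Schema = List[Tuple[str, str]]


@dataclass
class DataSource:
    # Pure description of where the data lives; no connection logic here.
    table: str


class OfflineStore(ABC):
    @abstractmethod
    def get_table_column_names_and_types_from_datasource(
        self, source: DataSource
    ) -> Schema:
        """Return (column name, type name) pairs for the given source."""


class LocalOfflineStore(OfflineStore):
    # Stand-in for a store that can actually reach the dataset.
    def __init__(self, tables: Dict[str, Schema]) -> None:
        self.tables = tables

    def get_table_column_names_and_types_from_datasource(
        self, source: DataSource
    ) -> Schema:
        return list(self.tables[source.table])


class RemoteOfflineStore(OfflineStore):
    # Stand-in for the client side: it never touches the dataset itself.
    # A direct method call replaces the network transport in this sketch.
    def __init__(self, server_store: OfflineStore) -> None:
        self.server_store = server_store

    def get_table_column_names_and_types_from_datasource(
        self, source: DataSource
    ) -> Schema:
        return self.server_store.get_table_column_names_and_types_from_datasource(
            source
        )


server_store = LocalOfflineStore(
    {"driver_hourly_stats": [("driver_id", "int64"), ("conv_rate", "float32")]}
)
client_store = RemoteOfflineStore(server_store)
columns = client_store.get_table_column_names_and_types_from_datasource(
    DataSource(table="driver_hourly_stats")
)
print(columns)
```

With this shape, inference code would ask the configured offline store for the schema instead of calling the data source directly, so the client never needs database credentials.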

dmartinol commented 2 days ago

I see that validate_data_source is a static method in the OfflineStore interface. I assume we want to make it an instance method and re-implement it in the OfflineServer class, right? together with the other new method get_table_column_names_and_types, of course.

This should not require a big effort, even if the Arrow Flight server was not exactly designed to serve such purposes but is rather suited for "efficient data transport". Shouldn't we make it a multi-protocol application instead and delegate some endpoints unrelated to data transport to a traditional REST or gRPC server?

tokoko commented 2 days ago

> This should not require a big effort, even if the Arrow Flight server was not exactly designed to serve such purposes but is rather suited for "efficient data transport". Shouldn't we make it a multi-protocol application instead and delegate some endpoints unrelated to data transport to a traditional REST or gRPC server?

imho that would end up being more complicated than adding a couple of non-data "endpoints" to a flight server.
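One way to picture this: Arrow Flight has a DoAction RPC for application-defined requests, so non-data calls can be dispatched by action name on the same server. The sketch below models only that dispatch in plain Python, without pyarrow. The action names, the JSON body format, and the handler functions are assumptions for illustration, not Feast's actual protocol.

```python
import json
from typing import Callable, Dict


def validate_data_source(payload: dict) -> dict:
    # Hypothetical handler; a real server would delegate to its offline store.
    return {"valid": bool(payload.get("table"))}


def get_table_column_names_and_types(payload: dict) -> dict:
    # Hypothetical handler backed by a hard-coded schema for the sketch.
    schemas = {"driver_hourly_stats": [["driver_id", "int64"]]}
    return {"columns": schemas.get(payload["table"], [])}


# Non-data "endpoints", keyed by action name.
ACTIONS: Dict[str, Callable[[dict], dict]] = {
    "validate_data_source": validate_data_source,
    "get_table_column_names_and_types": get_table_column_names_and_types,
}


def do_action(action_type: str, body: bytes) -> bytes:
    # Mirrors the shape of a Flight action: a type string plus an opaque body.
    handler = ACTIONS.get(action_type)
    if handler is None:
        raise NotImplementedError(f"Unknown action: {action_type}")
    return json.dumps(handler(json.loads(body))).encode("utf-8")


request_body = json.dumps({"table": "driver_hourly_stats"}).encode("utf-8")
response = do_action("get_table_column_names_and_types", request_body)
print(json.loads(response))
```

Adding another non-data call then means registering one more handler, rather than running and securing a second server next to the Flight one.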

dmartinol commented 2 days ago

I will start evaluating this one.

tokoko commented 2 days ago

I think we can start by requiring data source read permissions for both.
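As a rough illustration of that starting point, both calls could share a single read-permission check. Everything here is hypothetical: the `READ` label, the `granted` mapping, the user names, and the `require_read` decorator are made up for the sketch, since Feast's real permission API is not shown in this thread.

```python
from functools import wraps

# Hypothetical grants: user name -> set of allowed actions on data sources.
granted = {"alice": {"READ"}, "bob": set()}


def require_read(func):
    # Reject the call unless the user holds READ on data sources.
    @wraps(func)
    def wrapper(user: str, source_name: str):
        if "READ" not in granted.get(user, set()):
            raise PermissionError(
                f"{user} lacks READ permission on data source {source_name}"
            )
        return func(user, source_name)

    return wrapper


@require_read
def validate_data_source(user: str, source_name: str) -> bool:
    # Placeholder body; the real check would inspect the dataset.
    return True


@require_read
def get_table_column_names_and_types(user: str, source_name: str):
    # Placeholder schema for the sketch.
    return [("driver_id", "int64")]


print(validate_data_source("alice", "driver_hourly_stats"))
```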