feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0

`feast apply` does not register all features in feature repository #2579

Closed KarolisKont closed 2 years ago

KarolisKont commented 2 years ago

Expected Behavior

When I sync the feature repository using `feast apply`, all metadata in the feature repository should match what I defined in my definition files.

Current Behavior

This is the Feast objects definition file:

from datetime import timedelta

from feast import BigQuerySource, FeatureView, Field
from feast.types import Float32, Int32, Int64, String

from marketplace_sandbox.config import PROJECT_ID
from marketplace_sandbox.examples.entities import item, portal

entities = [item, portal]

fields = [
    Field(
        name="listing_price_local_currency",
        dtype=Float32,
    ),
    Field(
        name="listing_currency",
        dtype=String,
    ),
    Field(
        name="item_color1_id",
        dtype=Int32,
    ),
    Field(
        name="item_color2_id",
        dtype=Int32,
    ),
    Field(
        name="first_visible_at",
        dtype=Int64,
    ),
    Field(
        name="promoted_until",
        dtype=Int64,
    ),
    Field(
        name="item_catalog_parent_id",
        dtype=Int64,
    ),
    Field(
        name="item_catalog_parent_id_1",
        dtype=Int64,
    ),
    Field(
        name="item_catalog_parent_id_2",
        dtype=Int64,
    ),
    Field(
        name="item_catalog_parent_id_3",
        dtype=Int64,
    ),
    Field(
        name="item_catalog_parent_id_4",
        dtype=Int64,
    ),
    Field(
        name="item_photo_count",
        dtype=Int32,
    ),
    Field(
        name="description_word_count",
        dtype=Int32,
    ),
    Field(
        name="item_id_7d_impressions_3600srt",
        dtype=Int32,
    ),
    Field(
        name="item_id_7d_clicks_3600srt",
        dtype=Int32,
    ),
    Field(
        name="item_id_7d_ctr_3600srt",
        dtype=Float32,
    ),
    Field(
        name="item_views_14d_hourly",
        dtype=Int32,
    ),
    Field(
        name="item_brand_id",
        dtype=Int32,
    ),
    Field(
        name="item_catalog_id",
        dtype=Int32,
    ),
    Field(
        name="item_country_id",
        dtype=Int32,
    ),
    Field(
        name="item_user_id",
        dtype=Int64,
    ),
    Field(
        name="item_language_id",
        dtype=Int32,
    ),
    Field(
        name="item_size_id",
        dtype=Int32,
    ),
    Field(
        name="item_status_id",
        dtype=Int32,
    ),
    Field(
        name="delay_international_visibility_by",
        dtype=Int32,
    ),
]

BQ_DATASET_NAME = "example_item_stats_per_portal"
BQ_TABLE_NAME = "v1"
BQ_TABLE_REFERENCE = f"{PROJECT_ID}.{BQ_DATASET_NAME}.{BQ_TABLE_NAME}"

batch_source = BigQuerySource(
    table=BQ_TABLE_REFERENCE,
    created_timestamp_column="created_timestamp",
    timestamp_field="event_timestamp",
    description=(
        "Example of BigQuery table that contains item features per portal."
    ),
    owner="VMIP",
    tags={},
)

example_item_stats_per_portal_v1_fv = FeatureView(
    name=f"{BQ_DATASET_NAME}_{BQ_TABLE_NAME}",
    entities=[entity.name for entity in entities],
    ttl=timedelta(weeks=52),
    online=True,
    batch_source=batch_source,
    schema=fields,
    description=(
        "Example of feature view that contains item features per portal."
    ),
    owner="VMIP",
    tags={},
)

After running `feast apply` and retrieving the feature view definition with `feast feature-views describe example_item_stats_per_portal_v1`, I get:

spec:
  name: example_item_stats_per_portal_v1
  entities:
  - example_item
  - example_portal
  features:
  - name: listing_price_local_currency
    valueType: FLOAT
  - name: listing_currency
    valueType: STRING
  - name: item_color1_id
    valueType: INT32
  - name: item_color2_id
    valueType: INT32
  - name: first_visible_at
    valueType: INT64
  - name: promoted_until
    valueType: INT64
  - name: item_catalog_parent_id
    valueType: INT64
  - name: item_catalog_parent_id_2
    valueType: INT64
  - name: item_catalog_parent_id_3
    valueType: INT64
  - name: item_catalog_parent_id_4
    valueType: INT64
  - name: item_photo_count
    valueType: INT32
  - name: description_word_count
    valueType: INT32
  - name: item_id_7d_impressions_3600srt
    valueType: INT32
  - name: item_id_7d_clicks_3600srt
    valueType: INT32
  - name: item_id_7d_ctr_3600srt
    valueType: FLOAT
  - name: item_views_14d_hourly
    valueType: INT32
  - name: delay_international_visibility_by
    valueType: INT32
  ttl: 31449600s
...

The definition file has 25 features, but the feature repository returns only 17. The 8 missing features are: ['item_catalog_parent_id_1', 'item_brand_id', 'item_catalog_id', 'item_country_id', 'item_user_id', 'item_language_id', 'item_size_id', 'item_status_id']
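The gap between the definition and the registry can be double-checked by diffing the two lists of names. The snippet below is a minimal sketch in plain Python, not part of Feast: `missing_features` is a hypothetical helper, and the two lists are copied by hand from the definition file and the `describe` output above.

```python
def missing_features(defined, registered):
    """Return the names in `defined` that are absent from `registered`,
    preserving the order of the definition file."""
    registered_set = set(registered)
    return [name for name in defined if name not in registered_set]


# Names copied from the `fields` list in the definition file (25 entries).
defined = [
    "listing_price_local_currency", "listing_currency",
    "item_color1_id", "item_color2_id",
    "first_visible_at", "promoted_until",
    "item_catalog_parent_id", "item_catalog_parent_id_1",
    "item_catalog_parent_id_2", "item_catalog_parent_id_3",
    "item_catalog_parent_id_4",
    "item_photo_count", "description_word_count",
    "item_id_7d_impressions_3600srt", "item_id_7d_clicks_3600srt",
    "item_id_7d_ctr_3600srt", "item_views_14d_hourly",
    "item_brand_id", "item_catalog_id", "item_country_id",
    "item_user_id", "item_language_id", "item_size_id",
    "item_status_id", "delay_international_visibility_by",
]

# Names copied from the `feast feature-views describe` output (17 entries).
registered = [
    "listing_price_local_currency", "listing_currency",
    "item_color1_id", "item_color2_id",
    "first_visible_at", "promoted_until",
    "item_catalog_parent_id", "item_catalog_parent_id_2",
    "item_catalog_parent_id_3", "item_catalog_parent_id_4",
    "item_photo_count", "description_word_count",
    "item_id_7d_impressions_3600srt", "item_id_7d_clicks_3600srt",
    "item_id_7d_ctr_3600srt", "item_views_14d_hourly",
    "delay_international_visibility_by",
]

missing = missing_features(defined, registered)
print(len(defined), len(registered), len(missing))
print(missing)
```

Instead of copying names by hand, `registered` could presumably be read from the registry through the Feast Python SDK (for example from the feature view object returned by `FeatureStore.get_feature_view`); that variant is an assumption and was not run against the versions discussed here.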

I want to mention that I have the same situation (identical registered and missing features) with a similar feature view that is defined using Feature instead of Field.

Definition here:

from datetime import timedelta

from feast import BigQuerySource, Feature, FeatureView, ValueType

from marketplace_sandbox.config import BQ_POC_RERANKER_TABLE_REFERENCE
from marketplace_sandbox.entities import item, portal
from marketplace_sandbox.utils import generate_data_bq_query

FEATURE_VIEW_NAME = "item_stats_per_portal_v1"

entities = [item, portal]

features = [
    Feature(
        name="listing_price_local_currency",
        dtype=ValueType.FLOAT,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="listing_currency",
        dtype=ValueType.STRING,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_color1_id",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_color2_id",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="first_visible_at",
        dtype=ValueType.INT64,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="promoted_until",
        dtype=ValueType.INT64,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_catalog_parent_id",
        dtype=ValueType.INT64,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_catalog_parent_id_1",
        dtype=ValueType.INT64,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_catalog_parent_id_2",
        dtype=ValueType.INT64,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_catalog_parent_id_3",
        dtype=ValueType.INT64,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_catalog_parent_id_4",
        dtype=ValueType.INT64,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_photo_count",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="description_word_count",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_id_7d_impressions_3600srt",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_id_7d_clicks_3600srt",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_id_7d_ctr_3600srt",
        dtype=ValueType.FLOAT,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_views_14d_hourly",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_brand_id",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_catalog_id",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_country_id",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_user_id",
        dtype=ValueType.INT64,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_language_id",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_size_id",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="item_status_id",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
    Feature(
        name="delay_international_visibility_by",
        dtype=ValueType.INT32,
        labels={"owner": "VMIP"},
    ),
]

batch_source = BigQuerySource(
    name=f"{BQ_POC_RERANKER_TABLE_REFERENCE}.{FEATURE_VIEW_NAME}",
    query=generate_data_bq_query(
        features=features,
        entities=entities,
        tables=[BQ_POC_RERANKER_TABLE_REFERENCE],
    ),
    event_timestamp_column="event_time",
)

item_stats_per_portal_fv = FeatureView(
    name=FEATURE_VIEW_NAME,
    entities=[entity.name for entity in entities],
    ttl=timedelta(weeks=52),
    features=features,
    batch_source=batch_source,
    owner="VMIP",
)

Steps to reproduce

  1. Create a definition file. It can be similar to the one given in Current Behavior.
  2. Run `feast apply`.
  3. Describe the feature view using `feast feature-views describe <FEATURE VIEW NAME>`.

Specifications

Possible Solution

KarolisKont commented 2 years ago

Just tested with a Feast version that previously worked:

feast version
Feast SDK Version: "feast 0.19.1.dev52+ge638f106"

It detects all features from the feature view.

felixwang9817 commented 2 years ago

@KarolisKont thanks for reporting this. I'll take a look ASAP!

felixwang9817 commented 2 years ago

@KarolisKont I'm unable to reproduce this error. Would you mind sharing your feature_store.yaml file, the schema for your BigQuerySource, and your entity definitions? Also, can you confirm that you're using v0.20.0 installed from PyPI, and not installing from source? Thanks!

KarolisKont commented 2 years ago

OK, perhaps I forgot to point out that I use a forked repo 😅, this one (commit 2c2cf79e8b2ec4a78db3087db5882dc22cf56d15). Our fork has only one change, which is this one.

I have the bug with this version:

❯ feast version
/Users/karolis/Vinted/projects/vmip-feast-feature-repo/.venv/lib/python3.7/site-packages/feast/entity.py:116: DeprecationWarning: The `join_key` parameter is being deprecated in favor of the `join_keys` parameter. Please switch from using `join_key` to `join_keys`. Feast 0.22 and onwards will not support the `join_key` parameter.
  DeprecationWarning,
Feast SDK Version: "feast 0.20.1.dev20+g2c2cf79e"

After that, I tried v0.20.0 from PyPI:

❯ feast version
Feast SDK Version: "feast 0.20.0"

And it was working properly 🤦 .

Then I tried v0.20.1 from PyPI:

❯ feast version
Feast SDK Version: "feast 0.20.1"

Also, everything was OK.

After this, I fetched the latest commits from upstream into our repo:

feast version
Feast SDK Version: "feast 0.20.1.dev30+g219dc34b"

Having the same problem.
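The version strings in this comment fall into two shapes: plain releases such as `0.20.0`, and source builds such as `0.20.1.dev20+g2c2cf79e`, whose suffix carries the abbreviated commit hash (here matching commit 2c2cf79e... of the fork). A minimal sketch for telling them apart follows; `describe_version` is a hypothetical helper, and the pattern is inferred only from the strings shown in this thread, so it is an assumption rather than a general version parser.

```python
import re

# Shape inferred from the strings in this thread: "<release>.devN+g<commit>".
# Assumption: any version that does not match is treated as a plain release.
DEV_BUILD = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)\.dev(?P<dev_number>\d+)\+g(?P<commit>[0-9a-f]+)$"
)


def describe_version(version):
    """Classify a Feast SDK version string as a release or a source build."""
    match = DEV_BUILD.match(version)
    if match is None:
        return {"kind": "release", "release": version}
    return {
        "kind": "source-build",
        "release": match.group("release"),
        "dev_number": int(match.group("dev_number")),
        "commit": match.group("commit"),
    }


# Version strings reported in this issue.
for reported in [
    "0.19.1.dev52+ge638f106",
    "0.20.1.dev20+g2c2cf79e",
    "0.20.0",
    "0.20.1",
    "0.20.1.dev30+g219dc34b",
]:
    print(reported, describe_version(reported))
```

In this thread, the two plain releases worked while the two 0.20.1 source builds showed the bug, so recording the `kind` and `commit` alongside a bug report makes it easier to see which code was actually installed.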

KarolisKont commented 2 years ago

I noticed that some changes are only on v0.20-branch.

So I made a new branch, `vmip`, from master of the forked repo. Then I merged it with the `v0.20.1` tag and used that `vmip` branch in my project, and it still doesn't work 🤔 .

What I want to say is that Feast from PyPI works but my forked one doesn't, and I don't understand why 😅.

KarolisKont commented 2 years ago

Closing this issue, since it apparently only occurs on our forked repo. Sorry for the false alarm.

felixwang9817 commented 2 years ago

@KarolisKont good to hear! Just curious, do you know what the root cause was?

KarolisKont commented 2 years ago

I didn't look into why it behaves like that (not pushing all of the feature view's features).

@felixwang9817 but I tried installing the feast-dev/feast master branch HEAD, and it behaves the same way (not selecting all features).

I also noticed that there is a git history difference and I can't properly merge the master branch with v0.20-branch: merging shows weird conflicts, and some files aren't modified at all, keeping what is on master.

I tested with the tags v0.20.0 and v0.20.1, and both seem to work properly.

So I am a bit confused about why some commits are pushed directly to the master branch and others to the dedicated minor release branch.

Is it normal that master HEAD doesn't work properly until you add a commit that is tagged?