feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0

feast duplicates sources #3312

Open i-am-lock opened 1 year ago

i-am-lock commented 1 year ago

I use Feast with local storage. My project structure:

data/
data/features.parquet
feature_repo/
feature_repo/entities.py
feature_repo/sources.py
feature_repo/views.py
feature_repo/feature_store.yaml
...

When I run "feast -c feature_repo plan", it throws a feast.errors.DataSourceRepeatNamesException: "Multiple data sources share the same case-insensitive name my_awesome_source."

I store the source definitions in a separate file, feature_repo/sources.py, and import them from that file in feature_repo/views.py. For example:

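import datetime as dt

from feast import FeatureView, Field

from feature_repo import sources, entities

my_awesome_view = FeatureView(
    name="awesome_view",
    entities=[entities.id],
    ttl=dt.timedelta(days=3),
    schema=[
        Field(...),
    ],
    online=True,
    # The source is defined in feature_repo/sources.py and imported here.
    source=sources.my_awesome_source,
    tags={},
)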

Feast adds this source twice: first when it parses 'sources.py', and again when it parses 'views.py'.

Expected Behavior

Feast should not add the same source twice.

Current Behavior

Feast adds the source twice.

Steps to reproduce

Recreate the project structure described above (the source and view definitions should be in the repo folder, but in different files), then run "feast plan".

Specifications

Possible Solution

The problem is in the function parse_repo (https://github.com/feast-dev/feast/blob/master/sdk/python/feast/repo_operations.py#L99). It first parses 'sources.py' and gets 'my_awesome_source' from there. It then parses 'views.py' and gets 'my_awesome_source' again, this time as an attribute of 'my_awesome_view'. That way it adds the same source twice, and the parse function treats the two as different objects because the source in the views file is imported. My guess is that the check "elif isinstance(obj, BatchFeatureView) and not any(" (https://github.com/feast-dev/feast/blob/master/sdk/python/feast/repo_operations.py#L167) is wrong. Comparing with "batch_source == ds" works better. Or maybe a special comparison function is needed.
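
To make the difference between the two comparisons concrete, here is a minimal sketch. FakeSource and collect are hypothetical stand-ins invented for illustration; they are not Feast classes and this is not the actual parse_repo code. It only shows why an identity check lets a second, equal copy of the same definition through, while an equality check does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeSource:
    """Hypothetical stand-in for a data source; not a Feast class."""
    name: str
    path: str


def collect(found, use_equality):
    """Keep each object once, comparing by equality or by identity."""
    result = []
    for obj in found:
        if use_equality:
            seen = any(obj == existing for existing in result)
        else:
            seen = any(obj is existing for existing in result)
        if not seen:
            result.append(obj)
    return result


# Two objects with the same field values but different identities.
a = FakeSource("my_awesome_source", "data/features.parquet")
b = FakeSource("my_awesome_source", "data/features.parquet")

by_identity = collect([a, b], use_equality=False)
by_equality = collect([a, b], use_equality=True)
print(len(by_identity), len(by_equality))  # 2 1
```

With the identity check both copies are kept, which is the situation in which a later name-uniqueness check would complain.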

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

srausser commented 1 year ago

I have the same issue, but with feature views. It looks like the problem is that the comparison uses "is" when it should use "==" (relevant line).

When imported into another file, the Feast objects get a new object id.
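
One way an import can end up with a "new" object is when the same file is executed twice under two different module names. The sketch below only demonstrates that general Python behaviour; whether this is what happens inside Feast is an assumption, not something confirmed in this thread. The helper load_as and both module names are hypothetical.

```python
import importlib.util
import os
import tempfile


def load_as(module_name, file_path):
    """Execute a file as a fresh module object under the given name."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sources.py")
    with open(path, "w") as f:
        f.write("my_awesome_source = {'name': 'my_awesome_source'}\n")

    # The same file, executed twice under two different module names.
    first = load_as("sources_first_load", path)
    second = load_as("sources_second_load", path)

same_object = first.my_awesome_source is second.my_awesome_source
equal_value = first.my_awesome_source == second.my_awesome_source
print(same_object, equal_value)  # False True
```

Each execution builds its own objects, so an "is" comparison fails even though the definitions are identical.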

srausser commented 1 month ago

bump

tokoko commented 1 month ago

I think one problem with simply using == is that when Feast runs inference after collecting the repo contents, it might update one data source object and leave an identical data source set on some other feature view unchanged, since the two might be completely different objects.
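
A rough sketch of that concern, using hypothetical FakeSource and FakeView classes (not Feast classes) and a made-up timestamp_field update standing in for inference: if two equal but distinct objects are deduplicated with ==, only the registered copy is updated later and the copy held by the view goes stale.

```python
from dataclasses import dataclass


@dataclass
class FakeSource:
    """Hypothetical mutable stand-in for a data source."""
    name: str
    timestamp_field: str = ""


@dataclass
class FakeView:
    """Hypothetical stand-in for a feature view holding its own source copy."""
    name: str
    source: FakeSource


registered = FakeSource("my_awesome_source")
view = FakeView("awesome_view", FakeSource("my_awesome_source"))

# Equal at collection time, so an == check would treat them as one source.
equal_before = registered == view.source

# Pretend inference fills in a field on the registered copy only.
registered.timestamp_field = "event_timestamp"

equal_after = registered == view.source
print(equal_before, equal_after, repr(view.source.timestamp_field))
# True False ''
```

So switching to == would probably also require either sharing one object between the registry and the views, or propagating inferred fields to every equal copy.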