feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0
5.59k stars 998 forks

Supporting remote and local data source provider (e.g.: Local parquet and BigQuery) #2079

Closed ylokhande82 closed 2 years ago

ylokhande82 commented 2 years ago

Is your feature request related to a problem? Please describe. We are trying to consume two different sources, one remote (e.g. BigQuery) and one local (a Parquet file), which is currently not possible.

IMO, there are scenarios where multiple data sources need to be combined to extract meaningful features and save them in the feature store. Currently we are not able to do that unless we create a custom connector or bring the data into a common data source.

Describe the solution you'd like: Supporting multiple data sources (e.g. BigQuery and a Parquet file), with common attributes to query and join on, would make debugging easier and avoid the need to create custom connectors.

Additional details were filed as a defect report: https://github.com/feast-dev/feast/issues/2036

woop commented 2 years ago

I think this issue requires a bit more discussion. Adding support for joining any two arbitrary data sources would certainly be an incredible amount of work, but it may be possible to support a single offline store together with local file sources. We already need to maintain support for uploading entity dataframes into offline stores, so adding support for also uploading local files should not be a very tall order. So the limitations that I have in mind would be:

  1. All joins happen in a single offline store (not as part of some external tool like Spark or Dask or locally).
  2. The only external sources that are supported are FileSources (you can't join from Redshift to BigQuery).

Would that work for your use case?
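To make the two limitations above concrete, here is a minimal sketch of the "upload, then join inside one store" idea. It is not Feast code: an in-memory SQLite database stands in for the offline store (e.g. BigQuery), a plain list of tuples stands in for rows read from a local Parquet file, and `upload_and_join`, `warehouse_features`, and `local_features` are hypothetical names chosen for illustration.

```python
import sqlite3


def upload_and_join(store, local_rows):
    """Upload rows from a local file into the store, then join inside it."""
    # Limitation 2: the only external source is a local file. Its rows are
    # uploaded into a temporary table inside the offline store.
    store.execute(
        "CREATE TEMP TABLE local_features (driver_id INTEGER, rating REAL)"
    )
    store.executemany("INSERT INTO local_features VALUES (?, ?)", local_rows)

    # Limitation 1: the join runs as SQL inside the single offline store,
    # not in an external tool and not on the client.
    return store.execute(
        """
        SELECT w.driver_id, w.trips, l.rating
        FROM warehouse_features AS w
        JOIN local_features AS l ON l.driver_id = w.driver_id
        ORDER BY w.driver_id
        """
    ).fetchall()


# Assumption: in-memory SQLite stands in for the remote offline store.
store = sqlite3.connect(":memory:")
store.execute(
    "CREATE TABLE warehouse_features (driver_id INTEGER, trips INTEGER)"
)
store.executemany(
    "INSERT INTO warehouse_features VALUES (?, ?)", [(1, 10), (2, 25)]
)

# Assumption: hypothetical rows as they might be read from a local Parquet file.
local_rows = [(1, 4.5), (2, 3.9), (3, 5.0)]
joined = upload_and_join(store, local_rows)
```

Because the local rows are copied into the store before the join, the store never has to reach out to a second system. This is the same pattern already used for uploading entity dataframes.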

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.