feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0

Feast Roadmap for 0.11+ #1527

Closed: woop closed this issue 2 years ago

woop commented 3 years ago

Our current proposed roadmap for 0.11 and onward is as follows:

Backlog

Scheduled for development (next 3 months)

We're open to feedback. Either new roadmap items or reprioritization!

YikSanChan commented 3 years ago

@woop

Add Push/Ingestion API support

Is "ingestion" == "consuming some streams"?

YikSanChan commented 3 years ago

@woop Also, I hope there will be Hive support for the offline store.

woop commented 3 years ago

Is "ingestion" == "consuming some streams"?

No. It's simply allowing teams to push events to the online store. We aren't starting consumption jobs, but in theory you could launch those jobs with a custom provider.
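To make the distinction concrete, here is a minimal sketch of what "pushing events to the online store" could look like, as opposed to running a consumption job. This is a toy in-memory store, not Feast's API: the class `InMemoryOnlineStore`, its `push` and `get_online_features` methods, and the entity key format are all hypothetical names chosen for illustration.

```python
from datetime import datetime, timezone


class InMemoryOnlineStore:
    """Toy online store (hypothetical): keeps only the newest row per entity key."""

    def __init__(self):
        self._rows = {}  # entity_key -> (event_timestamp, feature dict)

    def push(self, entity_key, features, event_timestamp):
        """Write one event; skip it if a newer event is already stored."""
        current = self._rows.get(entity_key)
        if current is None or event_timestamp >= current[0]:
            self._rows[entity_key] = (event_timestamp, dict(features))

    def get_online_features(self, entity_key):
        """Return the latest feature values, or None if the key is unknown."""
        row = self._rows.get(entity_key)
        return None if row is None else row[1]


store = InMemoryOnlineStore()
t1 = datetime(2021, 5, 1, 12, 0, tzinfo=timezone.utc)
t2 = datetime(2021, 5, 1, 12, 5, tzinfo=timezone.utc)

# The caller pushes events itself; nothing in the store consumes a stream.
store.push("driver:1001", {"trips_today": 3}, t2)
store.push("driver:1001", {"trips_today": 2}, t1)  # late event, skipped
latest = store.get_online_features("driver:1001")
```

In this sketch the team's own service decides when to call `push`; any stream consumer that feeds it would live outside the store.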

@woop Also I hope there will be Hive support for offline store

@YikSanChan the current plan does not include developing Hive support. What if we work together to make it easy to add support for Hive? We can add a simple plugin interface and you can extend it to support Hive.
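As a rough illustration of the idea, a plugin interface of this kind might look like the sketch below. It is an assumption about the general shape, not the interface Feast ships: `OfflineStore`, `register_offline_store`, `load_offline_store`, `HiveOfflineStore`, and the host and database values are all hypothetical.

```python
from abc import ABC, abstractmethod


class OfflineStore(ABC):
    """Hypothetical extension point that every offline store plugin implements."""

    @abstractmethod
    def get_historical_features(self, entity_rows, feature_names):
        """Return historical feature values for the given entity rows."""


_REGISTRY = {}  # store type name -> plugin class


def register_offline_store(name):
    """Class decorator that makes a plugin discoverable by name."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


def load_offline_store(name, **config):
    """Instantiate the plugin registered under `name` with its config."""
    try:
        cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown offline store type: {name!r}") from None
    return cls(**config)


@register_offline_store("hive")
class HiveOfflineStore(OfflineStore):
    """Community-provided plugin; the query logic is left out of this sketch."""

    def __init__(self, host, database):
        self.host = host
        self.database = database

    def get_historical_features(self, entity_rows, feature_names):
        raise NotImplementedError("Hive query execution is not part of this sketch")


store = load_offline_store("hive", host="hive.example.internal", database="features")
```

The point of the registry is that the core package never needs to import Hive code; a third-party package only has to register its class under a name.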

YikSanChan commented 3 years ago

@woop

We can add a simple plugin interface and you can extend it to support Hive.

That sounds good! Will Hive support be added in a similar way to Dynamo / Redshift support, or not?

jianshen92 commented 3 years ago

Looking forward to Redshift support!

cloudbow commented 3 years ago

Looking forward to Redshift, Dynamo, and feature view support.

oleg-savko commented 3 years ago

Looking forward to ClickHouse support!

singh-b commented 3 years ago

Is there a plan to add AWS as a provider?

woop commented 3 years ago

Is there a plan to add AWS as a provider?

Yes, development is in progress.

bennfocus commented 3 years ago

@YikSanChan @woop Any progress re the Hive support? I'd like to talk/contribute to it as well.

YikSanChan commented 3 years ago

@YikSanChan @woop Any progress re the Hive support? I'd like to talk/contribute to it as well.

FYI, I am not working on Hive support.

bennfocus commented 3 years ago

#1686 FYI, I created a new issue for Hive support and will work on it soon.

woop commented 3 years ago

Thanks @baineng

rakshithvsk commented 3 years ago

Hey, Hi @woop,

I have recently moved from "feast 0.9.3" to "feast 0.11.0" and have a few questions after using FeatureViews. In particular, I see a few gaps in FeatureView:

  1. As far as I can see, FeatureView revolves around the FeatureRepo directory. With the introduction of FeatureView, are we planning to remove the dependency on Feast Core, Serving, Postgres, and Feast Spark?

  2. Also, I don't see start_stream_to_online_ingestion in FeatureView, which was available in Client. So should I still depend on FeatureTable, and hence Feast 0.9, for online ingestion?

  3. Feast 0.9.3 had the client.ingest() API, which wrote data into datetime partitions and made historical retrieval faster. With Feast 0.11 and the local provider, we have to deal with a single large file, which might not scale to larger datasets. This is an issue particularly for on-prem deployments where I do not want to depend on GCP.

I believe points 2 and 3 should definitely be addressed as part of the roadmap if we are going with FeatureView.

Thanks

Let me know your thoughts @woop

rightx2 commented 3 years ago

@rakshithvsk, @woop I am also interested in your Q1 and Q2. I'd like to add some points on Q2:

  1. If we should depend on FeatureTable, how can we materialize from a stream source to the online store? In feast >= v0.10, Client no longer has start_stream_to_online_ingestion().
  2. If I want to use FeatureView, is there no way to materialize from an online source to online storage?

Thanks

woop commented 3 years ago

Hey, Hi @woop,

I have recently moved from "feast 0.9.3" to "feast 0.11.0" and have a few questions after using FeatureViews. In particular, I see a few gaps in FeatureView:

  1. As far as I can see, FeatureView revolves around the FeatureRepo directory. With the introduction of FeatureView, are we planning to remove the dependency on Feast Core, Serving, Postgres, and Feast Spark?

Yes, we are removing those dependencies, but we are not precluding the use of Spark or an API-centric registry. We just think that the base installation of Feast should be lighter weight. We are building extension points for Feast so that teams can plug in their own storage or compute systems.

  2. Also, I don't see start_stream_to_online_ingestion in FeatureView, which was available in Client. So should I still depend on FeatureTable, and hence Feast 0.9, for online ingestion?

Feast 0.9 has streaming ingestion. We don't have streaming support in 0.10+ yet, since we've removed the Spark dependency. Streaming jobs will be launched through the apply method.

  3. Feast 0.9.3 had the client.ingest() API, which wrote data into datetime partitions and made historical retrieval faster. With Feast 0.11 and the local provider, we have to deal with a single large file, which might not scale to larger datasets. This is an issue particularly for on-prem deployments where I do not want to depend on GCP.

Our File sources are meant for convenience today, and won't scale to production loads. It's just using Pandas under the hood, not Spark. I don't think the key value here was ingest(), but more the compute layer that did the retrieval, right?
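To illustrate what "the compute layer that did the retrieval" has to do, here is a conceptual sketch of a point-in-time lookup in plain Python. It assumes that historical retrieval means "for each entity row, find the newest feature value at or before its timestamp"; it is not Feast's implementation, and the function name `point_in_time_join` and the sample data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows):
    """Attach to each entity row the newest feature value at or before its timestamp.

    entity_rows:  list of (entity_id, event_timestamp)
    feature_rows: list of (entity_id, event_timestamp, value)
    """
    # Group feature rows per entity and sort each group by timestamp once.
    by_entity = defaultdict(list)
    for entity_id, ts, value in feature_rows:
        by_entity[entity_id].append((ts, value))
    for rows in by_entity.values():
        rows.sort(key=lambda r: r[0])

    result = []
    for entity_id, ts in entity_rows:
        rows = by_entity.get(entity_id, [])
        timestamps = [r[0] for r in rows]
        idx = bisect_right(timestamps, ts)  # count of feature rows with timestamp <= ts
        value = rows[idx - 1][1] if idx > 0 else None
        result.append((entity_id, ts, value))
    return result


features = [
    ("driver:1001", datetime(2021, 5, 1), 0.71),
    ("driver:1001", datetime(2021, 5, 3), 0.84),
]
entities = [
    ("driver:1001", datetime(2021, 5, 2)),   # should see the May 1 value
    ("driver:1001", datetime(2021, 4, 30)),  # no feature value exists yet
]
joined = point_in_time_join(entities, features)
```

Everything here runs in one process over in-memory rows, which is why this style of retrieval is convenient for small files but does not scale the way a warehouse or a distributed engine would.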

One of our design goals is to double down on storage technologies that offload a lot of the complexity of reading, writing, and transforming data. We can't support all technologies. We'd rather support BigQuery, Redshift, and other data warehouses instead of having to rewrite the same queries as ETL pipelines in Spark. In your case it seems like Hive might be a good idea, but we don't support that today. It should be a pretty straightforward addition though.

I believe points 2 and 3 should definitely be addressed as part of the roadmap if we are going with FeatureView.

Thanks

Let me know your thoughts @woop

woop commented 3 years ago

@rakshithvsk, @woop I am also interested in your Q1 and Q2. I'd like to add some points on Q2:

  1. If we should depend on FeatureTable, how can we materialize from a stream source to the online store? In feast >= v0.10, Client no longer has start_stream_to_online_ingestion().

Streaming jobs can be launched by apply if you use a custom provider. Other than that you will need to wait for us to add streaming support.
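A sketch of how a custom provider could launch streaming jobs from apply is shown below. It is only an illustration under assumptions: the class `StreamingProvider`, the `stream_topic` key, the dict-based feature view definitions, and the `fake_launcher` callback are hypothetical and do not reflect Feast's actual provider interface.

```python
class StreamingProvider:
    """Hypothetical custom provider: starts one streaming job per stream-backed view."""

    def __init__(self, launch_job):
        self._launch_job = launch_job  # callable(view_name, stream_topic) -> job id
        self.running_jobs = {}         # view_name -> job id

    def apply(self, feature_views):
        """Launch a job for each view with a stream source that has no job yet."""
        for view in feature_views:
            topic = view.get("stream_topic")
            if topic is None or view["name"] in self.running_jobs:
                continue  # batch-only view, or its job is already running
            self.running_jobs[view["name"]] = self._launch_job(view["name"], topic)


launched = []


def fake_launcher(view_name, topic):
    """Stand-in for submitting a real streaming job (for example to Spark or Flink)."""
    launched.append((view_name, topic))
    return f"job-{len(launched)}"


provider = StreamingProvider(fake_launcher)
views = [
    {"name": "driver_stats", "stream_topic": "driver-events"},
    {"name": "customer_profile"},  # no stream source, so no job is launched
]
provider.apply(views)
provider.apply(views)  # applying again does not start a duplicate job
```

Tracking running jobs by view name keeps repeated apply calls from starting duplicates, which matters if apply is run on every deployment.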

  2. If I want to use FeatureView, is there no way to materialize from an online source to online storage?

Materialize()

Thanks

rightx2 commented 3 years ago

Materialize()

Isn't it for loading data from 'offline' source to 'online' storage? I asked from 'online(streaming)' source to 'online' storage... But according to your first answer, I guess I need to wait for you guys to add streaming support
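For reference, the offline-to-online direction described here can be sketched as follows. This is a simplified model, not Feast's materialization code: the function signature, the half-open time window, and the dict-based online store are assumptions made for illustration.

```python
from datetime import datetime


def materialize(offline_rows, online_store, start, end):
    """Copy the newest row per entity with start <= timestamp < end into online_store.

    offline_rows: iterable of (entity_id, event_timestamp, features dict)
    online_store: dict mapping entity_id -> (event_timestamp, features dict)
    """
    for entity_id, ts, features in offline_rows:
        if not (start <= ts < end):
            continue  # row falls outside the requested window
        current = online_store.get(entity_id)
        if current is None or ts > current[0]:
            online_store[entity_id] = (ts, dict(features))
    return online_store


offline = [
    ("driver:1001", datetime(2021, 5, 1), {"rating": 4.6}),
    ("driver:1001", datetime(2021, 5, 2), {"rating": 4.7}),
    ("driver:1002", datetime(2021, 6, 1), {"rating": 4.1}),  # outside the window
]
online = materialize(offline, {}, datetime(2021, 5, 1), datetime(2021, 5, 31))
```

The input is a finite batch read from offline storage, so a continuously arriving stream has no place in this flow without a separate ingestion path.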

woop commented 3 years ago

Materialize()

Isn't it for loading data from 'offline' source to 'online' storage? I asked from 'online(streaming)' source to 'online' storage... But according to your first answer, I guess I need to wait for you guys to add streaming support

Ah, I see. I was thrown off by "online source". Yeah, there is no solution right now.