Closed by @woop 2 years ago
@woop
> Add Push/Ingestion API support

Is "ingestion" == "consuming some streams"?

@woop Also, I hope there will be Hive support for the offline store.
> Is "ingestion" == "consuming some streams"?

No. It's simply allowing teams to push events to the online store. We aren't starting consumption jobs, but in theory you could launch those jobs with a custom provider.
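To make the distinction concrete: with a push model, a team-owned stream job writes precomputed feature values directly into the online store, rather than Feast consuming the stream itself. A minimal conceptual sketch (the function and in-memory store below are hypothetical illustrations, not Feast's actual API):

```python
from datetime import datetime, timezone

# Stand-in for an online store: entity key -> latest feature values.
# Real deployments would use Redis, Datastore, DynamoDB, etc.
online_store = {}

def push_features(store, entity_key, features):
    """Push precomputed feature values for one entity directly into the
    online store. The caller (e.g. a stream processor the team runs
    themselves) decides when to push; Feast does not consume the stream."""
    store[entity_key] = {**features, "event_timestamp": datetime.now(timezone.utc)}

# A team-owned streaming job would call this once per processed event.
push_features(online_store, "driver_1001", {"trips_today": 13, "rating": 4.8})
```

The key point is that the consumption job (Kafka consumer, Spark Streaming, etc.) stays outside Feast; only the final write goes through the push interface.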
> Also, I hope there will be Hive support for the offline store.

@YikSanChan the current plan does not include developing Hive support. What if we work together to make it easy to add support for Hive? We can add a simple plugin interface and you can extend it to support Hive.
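A plugin interface along these lines could be as simple as an abstract base class that each offline store backend implements. This is a sketch of the idea only; the class and method names are hypothetical, not Feast's actual plugin API:

```python
from abc import ABC, abstractmethod

class OfflineStore(ABC):
    """Hypothetical plugin interface: each backend implements retrieval."""

    @abstractmethod
    def get_historical_features(self, entity_df, feature_refs):
        """Return a point-in-time-correct training dataset."""

class HiveOfflineStore(OfflineStore):
    """A community-contributed backend would subclass the interface and
    translate retrieval into HiveQL against the warehouse."""

    def get_historical_features(self, entity_df, feature_refs):
        # A real implementation would generate and run a HiveQL
        # point-in-time join. Here we only show the shape of the contract.
        return {"entities": entity_df, "features": feature_refs}

store = HiveOfflineStore()
result = store.get_historical_features([{"driver_id": 1001}], ["driver_stats:rating"])
```

With such an interface, Hive support becomes a self-contained subclass that the community can maintain without touching Feast core.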
@woop
> We can add a simple plugin interface and you can extend it to support Hive.

That sounds good! Will Hive support be added in a similar way to how Dynamo / Redshift support is added, or not?
Looking forward to Redshift support!
Looking forward to Redshift, Dynamo, and feature view support.
Looking forward to ClickHouse support!
Is there a plan to add AWS as a provider?
> Is there a plan to add AWS as a provider?

Yes, development is in progress.
@YikSanChan @woop Any progress re the Hive support? I'd like to talk/contribute to it as well.
> @YikSanChan @woop Any progress re the Hive support? I'd like to talk/contribute to it as well.
FYI I am not working on Hive support
Thanks @baineng
Hi @woop,

I have recently moved from Feast 0.9.3 to Feast 0.11.0 and have a few questions after using FeatureViews. In particular, I see a few gaps in FeatureView:

1. All I see is that FeatureView revolves around the FeatureRepo directory. With the introduction of FeatureView, are we planning to remove the dependency on Feast Core, Serving, Postgres, and Feast Spark?
2. Also, I don't see start_stream_to_online_ingestion in FeatureView, which was available in Client. So should I still depend on FeatureTable, and hence Feast 0.9, for online ingestion?
3. Feast 0.9.3 had a client.ingest() API where data was created with datetime partitions, which helped with faster historical retrieval. But with Feast 0.11, in the case of the local provider, we need to deal with a single large file, which might not scale for larger datasets. This would be an issue particularly if we need an on-prem deployment and do not want to commit to GCP.

I believe Points 2 and 3 should definitely be addressed as part of the roadmap if we are going with FeatureView.

Thanks

Let me know your thoughts @woop
@rakshithvsk, @woop I'm also interested in your Q1 and Q2. I'd like to add some points on Q2:

- If we should depend on FeatureTable, how can we materialize from a stream source to the online store? In the new version, Client doesn't have start_stream_to_online_ingestion() anymore on Feast >= v0.10.
- If I want to use FeatureView, is there no way to materialize from an online source to online storage?

Thanks
> I have recently moved from Feast 0.9.3 to Feast 0.11.0 and have a few questions after using FeatureViews. In particular, I see a few gaps in FeatureView.

> - All I see is that FeatureView revolves around the FeatureRepo directory. With the introduction of FeatureView, are we planning to remove the dependency on Feast Core, Serving, Postgres, and Feast Spark?
Yes, we are removing those dependencies, but we are not precluding the use of Spark or having an API-centric registry. We just think that the base installation of Feast should be lighter weight. We are building extension points for Feast so that teams can plug in their own storage or compute systems.
> - Also, I don't see start_stream_to_online_ingestion in FeatureView, which was available in Client. So should I still depend on FeatureTable, and hence Feast 0.9, for online ingestion?
Feast 0.9 has streaming ingestion. We don't have streaming support in 0.10+ yet, since we've removed the Spark dependency. Streaming jobs will be launched through the apply method.
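The custom-provider route could look roughly like this: a provider subclasses the default behavior and hooks job launching into the apply step. Everything below is illustrative only; the class names, the hook signature, and the view dictionaries are assumptions for the sketch, not a documented Feast extension point:

```python
class Provider:
    """Simplified stand-in for Feast's provider abstraction."""

    def update_infra(self, project, views_to_keep):
        # Default behavior: provision online-store tables, etc.
        pass

class StreamingProvider(Provider):
    """Hypothetical custom provider: when `feast apply` runs, it also
    launches a stream-ingestion job for each view with a stream source."""

    def __init__(self):
        self.launched_jobs = []

    def update_infra(self, project, views_to_keep):
        super().update_infra(project, views_to_keep)
        for view in views_to_keep:
            if view.get("stream_source"):
                # In reality: submit a Spark/Flink job that consumes the
                # stream and writes rows to the online store.
                self.launched_jobs.append(view["name"])

provider = StreamingProvider()
provider.update_infra("driver_project", [
    {"name": "driver_stats", "stream_source": "kafka://driver-events"},
    {"name": "static_stats", "stream_source": None},
])
```

The design point is that apply becomes the single lifecycle hook: declaring infrastructure and launching the jobs that feed it happen in one step.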
> - Feast 0.9.3 had a client.ingest() API where data was created with datetime partitions, which helped with faster historical retrieval. But with Feast 0.11, in the case of the local provider, we need to deal with a single large file, which might not scale for larger datasets. This would be an issue particularly if we need an on-prem deployment and do not want to commit to GCP.
Our File sources are meant for convenience today, and won't scale to production loads. It's just using Pandas under the hood, not Spark. I don't think the key value here was ingest(), but more the compute layer that did the retrieval, right?
One of our design goals is to double down on storage technologies that offload a lot of the complexity of reading, writing, and transforming data. We can't support all technologies. We'd rather support BigQuery, Redshift, and other data warehouses instead of having to rewrite the same queries as ETL pipelines in Spark. In your case it seems like Hive might be a good idea, but we don't support that today. It should be a pretty straightforward addition though.
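In practice this design goal shows up in the repo configuration: swapping the warehouse that does the heavy lifting is a config change rather than a rewritten ETL pipeline. An illustrative feature_store.yaml is shown below; the exact keys vary by Feast version, and the hive type is hypothetical, pending a community plugin:

```yaml
project: driver_features
registry: data/registry.db
provider: local
offline_store:
  type: bigquery   # or: redshift
  # type: hive     # hypothetical, once a Hive offline store plugin exists
```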
> @rakshithvsk, @woop I'm also interested in your Q1 and Q2. I'd like to add some points on Q2:
> - If we should depend on FeatureTable, how can we materialize from a stream source to the online store? In the new version, Client doesn't have start_stream_to_online_ingestion() anymore on Feast >= v0.10.
Streaming jobs can be launched by apply if you use a custom provider. Other than that you will need to wait for us to add streaming support.
> - If I want to use FeatureView, is there no way to materialize from an online source to online storage?

materialize()
@woop
> materialize()

Isn't that for loading data from an 'offline' source to 'online' storage? I was asking about loading from an 'online' (streaming) source to 'online' storage... But according to your first answer, I guess I need to wait for you guys to add streaming support.
> Isn't that for loading data from an 'offline' source to 'online' storage? I was asking about loading from an 'online' (streaming) source to 'online' storage...

Ah, I see. I was thrown off by "online source". Yeah, there is no solution right now.
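For readers following the offline-vs-online distinction above, the direction that materialize() does cover can be sketched in plain Python: take each entity's latest feature row from the offline (historical) data, up to a cutoff date, and write it into an online key-value view. This is a conceptual sketch of the semantics only, not Feast's implementation:

```python
from datetime import datetime

# Offline source: historical rows with event timestamps
# (e.g. a Parquet file or a warehouse table).
offline_rows = [
    {"driver_id": 1001, "rating": 4.5, "ts": datetime(2021, 6, 1)},
    {"driver_id": 1001, "rating": 4.8, "ts": datetime(2021, 6, 2)},
    {"driver_id": 1002, "rating": 3.9, "ts": datetime(2021, 6, 1)},
]

def materialize(rows, end_date):
    """Load the latest feature values per entity (up to end_date) into an
    online key-value view -- the offline -> online direction only."""
    online = {}
    for row in sorted(rows, key=lambda r: r["ts"]):
        if row["ts"] <= end_date:
            online[row["driver_id"]] = {"rating": row["rating"], "ts": row["ts"]}
    return online

online_store = materialize(offline_rows, end_date=datetime(2021, 6, 30))
# driver 1001's newer rating (4.8) overwrites the older 4.5
```

The streaming direction discussed in the thread (online source to online store) is exactly what this batch-style loop does not cover, which is why it needs separate support.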
Our current proposed roadmap for 0.11 and onward is as follows:
Backlog
Scheduled for development (next 3 months)
We're open to feedback. Either new roadmap items or reprioritization!