Closed absingh-coursera closed 1 month ago
@absingh-coursera Hey, I think you might be somewhat confusing the concepts of DataSource and OfflineStore here. Offline stores in feast are engines (not storage) by which feast gets the offline datasets, joins them and produces the training set. s3 alone can't be an offline store implementation, as you can't do data transformations in s3. There are currently 2 ways you can use s3 to store offline features:
- FileSource to point to s3 folders, but the FileSource data source type can currently be queried by the duckdb and dask offline stores only.
- SparkSource, which is a generic data source for SparkOfflineStore. As long as you are able to configure the spark session to access s3 locations, you can accomplish the same thing with it. If you're trying to use feast in databricks, this is probably the best way to go for you.
@tokoko so this would be the entire flow -
I built a feature view with SparkOfflineSource and pointed it towards a folder containing partitioned parquet files.
This seems pretty clear, couple of questions -
I built a feature view with SparkOfflineSource and pointed it towards a folder containing partitioned parquet files.
Yes, today you need to use SparkSource for this. We plan to add FileSource support to the spark offline store as well in the future. It will behave identically, with the only difference being that with FileSource you will no longer be bound to the spark offline store only. You can have some feast processes running with spark on databricks and some other processes elsewhere with duckdb or dask.
So how does feast know new data has been ingested into the offline store? When I call materialize, it goes to the same folder and checks for the latest update, right?
When you use materialize, you are the one who provides the lower and upper bounds of the event_timestamp column used to acquire the dataset from the offline store. If you use incremental materialization, feast stores the last upper bound in the registry and uses it as the lower bound for the next run. (docs)
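To make the bookkeeping concrete, here is a toy model in plain Python of how the window moves between incremental runs. It is not feast's implementation: ToyRegistry, next_window and select_rows are hypothetical names, the fallback of "end minus ttl" for the very first run is an assumption of this sketch, and so is the half-open window. In feast itself the corresponding calls are FeatureStore.materialize(start_date, end_date), where you pass both bounds, and FeatureStore.materialize_incremental(end_date), where only the upper bound is passed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyRegistry:
    """Stand-in for the registry: remembers the last materialized upper bound."""
    last_end: Optional[datetime] = None


def next_window(
    registry: ToyRegistry,
    end: datetime,
    ttl: timedelta = timedelta(days=7),
) -> Tuple[datetime, datetime]:
    """Return (start, end) for the next incremental run and record the new bound."""
    # First run: nothing stored yet, so look back by ttl (assumption of this sketch).
    start = registry.last_end if registry.last_end is not None else end - ttl
    if start >= end:
        raise ValueError("end must be later than the last materialized bound")
    registry.last_end = end
    return start, end


def select_rows(rows: List[Dict], start: datetime, end: datetime) -> List[Dict]:
    """Keep rows whose event_timestamp falls in [start, end)."""
    return [r for r in rows if start <= r["event_timestamp"] < end]


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    rows = [{"id": i, "event_timestamp": t0 + timedelta(hours=12 * i)} for i in range(6)]
    registry = ToyRegistry()
    for day in (1, 2, 3):
        start, end = next_window(registry, t0 + timedelta(days=day), ttl=timedelta(days=1))
        print(start, end, [r["id"] for r in select_rows(rows, start, end)])
```

The point of the model is that nothing "watches" the folder: new files only matter if their event_timestamp values fall inside the window of the next run.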
Since I want to maintain individual folders for individual feature views, will it affect feast feature gathering while building the training set during offline feature retrieval?
Not sure I get the question. Each table behind a feature view needs to be in a separate "folder" in s3, of course.
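For illustration, the one-folder-per-feature-view layout with SparkSource could be sketched as below. The bucket name, prefix and helper names are hypothetical, and the s3a:// scheme is an assumption that depends on how your spark session is configured. The SparkSource import path and keyword arguments (name, path, file_format, timestamp_field) reflect the contrib spark offline store as I understand it, so check them against your feast version.

```python
"""Feature-repo sketch: one s3 folder of partitioned parquet per feature view."""

# Hypothetical bucket and prefix; replace with your own locations.
BUCKET = "my-feature-bucket"
PREFIX = "offline"


def feature_view_path(view_name: str) -> str:
    """Each feature view's table lives in its own folder under the shared prefix."""
    # s3a:// is an assumption; use the scheme your spark session is set up for.
    return f"s3a://{BUCKET}/{PREFIX}/{view_name}/"


def make_spark_source(view_name: str, timestamp_field: str = "event_timestamp"):
    """Build a SparkSource that reads one feature view's folder."""
    # Imported lazily so this module loads even where the spark extras are not installed.
    from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
        SparkSource,
    )

    return SparkSource(
        name=f"{view_name}_source",
        path=feature_view_path(view_name),
        file_format="parquet",
        timestamp_field=timestamp_field,
    )
```

Each feature view would then get its own source, for example make_spark_source("driver_stats") and make_spark_source("customer_profile"), so the tables never share a folder.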
Thanks @tokoko, this makes it much clearer.
you're welcome. I'll go ahead and close this then.
Is your feature request related to a problem? Please describe.
Hi Team, I am trying to use feast as an alternative to sagemaker feature store, but here are some constraints due to which I am raising this issue
Describe the solution you'd like
There are two ideal solutions I would like to see
Describe alternatives you've considered
Additional context
I have covered most of the context above. If there are any questions, I am happy to answer.