feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0

Feature request: Data sources without timestamps? #2257

Open · shaunc opened this issue 2 years ago

shaunc commented 2 years ago

Is your feature request related to a problem? Please describe.

I am researching how to integrate Feast into my model-building process. We typically have some data sources that describe objects with no timestamp; a canonical example is a geographical area. In theory their characteristics change over time, but within the scope of the problems we model they are treated as immutable, and our data contains no timestamp field for them. However, your documentation says Feast uses a time-series data model to represent data.

Describe the solution you'd like

I would like a way to represent data sources which have no timestamp. Alternatively, if this is already possible, I suggest making the documentation clearer on this point.

Describe alternatives you've considered

I could add dummy timestamps to data files.
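For illustration, a minimal sketch of that workaround (file paths and the column name are placeholders):

```python
import pandas as pd

# Hypothetical area data with no natural timestamp column.
areas = pd.read_parquet("data/areas.parquet")

# Add a constant, far-in-the-past timestamp so Feast's point-in-time
# join always treats these rows as valid for any request time.
areas["event_timestamp"] = pd.Timestamp("1970-01-01", tz="UTC")

areas.to_parquet("data/areas_with_ts.parquet")
```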


adchia commented 2 years ago

Thanks for filing an issue!

I believe this isn't supported yet, but we have heard requests for it. It should be relatively easy to support, though, by automatically adding a dummy timestamp on the user's behalf when generating training data.

shaunc commented 2 years ago

Would you accept a PR for this? I haven't checked yet, but it seems plausible that it would be simple for Feast itself to create a fake timestamp. I see event_timestamp_column is typed Optional[str] = ''; I presume the empty string means "sniff the data for a timestamp column." Perhaps if this were explicitly passed as None, you could automatically generate the timestamp as "oldest possible"?
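For concreteness, a hypothetical sketch of how that could look on the source definition side (the None behavior is the proposal here, not something Feast currently supports; the path is a placeholder):

```python
from feast import FileSource

# Today, event_timestamp_column="" (the default) means "infer the timestamp
# column from the data". The proposal: passing None would mark the source as
# timestamp-less, and Feast would inject an oldest-possible dummy timestamp
# on the user's behalf when building training data.
area_source = FileSource(
    path="data/areas.parquet",        # placeholder path
    event_timestamp_column=None,      # proposed semantics, not yet supported
)
```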

adchia commented 2 years ago

Yeah of course. PR here would definitely be encouraged!

Slightly trickier might be changing the point-in-time queries to use the generated dummy timestamp. Let me know if you need help debugging or getting a test environment set up!
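To illustrate the point-in-time concern: if the injected dummy timestamp is the oldest possible value, it sorts before every timestamp in the entity dataframe, so the as-of join should always pick up the timestamp-less rows. A rough sketch of the retrieval side (entity names and feature references are placeholders):

```python
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Entity dataframe with real request timestamps.
entity_df = pd.DataFrame(
    {
        "area_id": [101, 102],
        "event_timestamp": pd.to_datetime(["2022-01-15", "2022-01-16"], utc=True),
    }
)

# Because the dummy source timestamp is older than every request timestamp,
# the as-of join should always return the single "latest" row per area.
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["area_stats:population", "area_stats:land_area"],
).to_df()
```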

shaunc commented 2 years ago

Great! ... Though I'm at least a couple weeks away from actually doing anything -- right now researching and writing an engineering plan. (Know of any integrations with kedro? :))

adchia commented 2 years ago

cc @felixwang9817 and @samuel100 btw who were discussing this too

adchia commented 2 years ago

Don't think anyone's tried to integrate with Kedro either, but would love to see someone give it a stab :)

shaunc commented 2 years ago

[We will be using kedro as "glue": we run workflows in argo-workflow, and kedro can build them for us. Obviously Feast will help us with our feature metadata. A current project has 4000-some-odd features, and some are definitely broken! :) Feast could help us keep track of what they are and where they come from; great-expectations could perhaps tell us, e.g., if they don't have the right invariant properties for our ML, which we would like to get back into the same Postgres database that holds the Feast metadata. The question is: if the ultimate source of authority is Python code, and kedro is gluing things together, we need to figure out how to wrap Feast in a kedro-aware way, or vice versa. Anyway, I should probably start a separate issue for that once I am more opinionated. :)]

woop commented 2 years ago

This is great @shaunc. I'd love to get a bit more detail about your use case. Once we have that, we can spend some time figuring out what an integration would look like. Integrating with upstream data tooling is certainly something we've spoken about a lot before.

shaunc commented 2 years ago

Thanks for the encouragement! ... Give me a few days to think; I'll create a new issue with more details and further thoughts.

adchia commented 2 years ago

Quick ping on this: any further thoughts?

shaunc commented 2 years ago

We still think this is a good idea -- but we decided to focus our efforts on experiment tracking first -- see kedro-dvc. Feature management will still be an issue, so I'm planning on circling back -- probably in the June timeframe. If you want to move forward yourself or if you hear of any other work, I'm all ears, though! :)

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

HaoXuAI commented 1 month ago

Running into the same situation as well.