feast-dev / feast

The Open Source Feature Store for Machine Learning
https://feast.dev
Apache License 2.0

Stream feature view ingestion does not process Kafka messages from the earliest offset. #3683

Open ShenJiahuan opened 1 year ago

ShenJiahuan commented 1 year ago

Expected Behavior

Feast should let Spark Structured Streaming process messages starting from the earliest Kafka offset.

Current Behavior

Feast only lets Spark Structured Streaming process messages that are produced after the ingestion procedure starts.

Steps to reproduce

Start the Kafka producer first, and then invoke SparkKafkaProcessor.ingest_stream_feature_view. The messages produced before ingestion starts will not appear in the output.

Specifications

Possible Solution

Set startingOffsets to earliest.

https://github.com/feast-dev/feast/blob/870762ae9b78d00f4ea144a9ad6174b2b2516176/sdk/python/feast/infra/contrib/spark_kafka_processor.py#L86

https://github.com/feast-dev/feast/blob/870762ae9b78d00f4ea144a9ad6174b2b2516176/sdk/python/feast/infra/contrib/spark_kafka_processor.py#L109
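As a rough illustration of the proposed change, the sketch below builds the options for Spark's Kafka source with startingOffsets set to earliest. The helper names kafka_reader_options and read_kafka_stream are hypothetical and are not part of Feast; only the option keys (kafka.bootstrap.servers, subscribe, startingOffsets) come from Spark's Kafka integration. Spark defaults to latest for streaming queries, which would match the behavior described above.

```python
import json
from typing import Dict


def kafka_reader_options(
    bootstrap_servers: str,
    topic: str,
    starting_offsets: str = "earliest",
) -> Dict[str, str]:
    """Build options for Spark's Kafka source (hypothetical helper, not Feast API).

    starting_offsets may be "earliest", "latest", or a JSON string that maps
    topics to per-partition offsets.
    """
    if starting_offsets not in ("earliest", "latest"):
        try:
            parsed = json.loads(starting_offsets)
        except json.JSONDecodeError as exc:
            raise ValueError(
                "starting_offsets must be 'earliest', 'latest', or a JSON object"
            ) from exc
        if not isinstance(parsed, dict):
            raise ValueError("a JSON starting_offsets value must be an object")
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def read_kafka_stream(spark, options: Dict[str, str]):
    """Open a streaming DataFrame on Kafka.

    `spark` is assumed to be an existing pyspark.sql.SparkSession with the
    Kafka connector package available.
    """
    return spark.readStream.format("kafka").options(**options).load()


# Example values only: the broker address and topic name are placeholders.
options = kafka_reader_options("localhost:9092", "driver_stats")
```

Note that Spark applies startingOffsets only when a query starts for the first time. A query restarted from an existing checkpoint resumes from the offsets recorded there, so testing this change likely requires a fresh checkpoint location.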

shuchu commented 1 year ago

For this one, @felixwang9817 can help :)

stale[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.