Open zwqjoy opened 1 year ago
what is "make_train_data_with_feast.py"? To my knowledge, Feast does not store data itself. It uses third-party storage services as offline and online storage. For your files on hdfs, maybe you can start from here: https://docs.feast.dev/reference/offline-stores/spark
@shuchu I have many features saved in HDFS. If someone wants to merge several of these features (say, features from 2 or 3 paths) to prepare training data, how should they do it? These features are very large.
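Following the suggestion above, one way to express this is to register each HDFS path as a `SparkSource` behind its own `FeatureView`, and then let `get_historical_features` do the point-in-time join. A minimal sketch, assuming the contrib Spark offline store is configured; the entity, paths, and field names are placeholders, and the exact `SparkSource`/`FeatureView` signatures depend on the Feast version, so verify against the docs:

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Float32
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)

# Hypothetical entity; replace with your real join key.
user = Entity(name="user", join_keys=["user_id"])

# Each HDFS path with precomputed features becomes its own source.
clicks_source = SparkSource(
    name="user_click_features",
    path="hdfs:///warehouse/features/user_clicks",  # hypothetical path
    file_format="parquet",
    timestamp_field="event_timestamp",
)

profile_source = SparkSource(
    name="user_profile_features",
    path="hdfs:///warehouse/features/user_profile",  # hypothetical path
    file_format="parquet",
    timestamp_field="event_timestamp",
)

click_fv = FeatureView(
    name="user_clicks",
    entities=[user],
    ttl=timedelta(days=30),
    schema=[Field(name="click_rate_7d", dtype=Float32)],
    source=clicks_source,
)

profile_fv = FeatureView(
    name="user_profile",
    entities=[user],
    ttl=timedelta(days=30),
    schema=[Field(name="avg_order_value", dtype=Float32)],
    source=profile_source,
)
```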
`get_historical_features` does not respect the Hive partitioning and does a full table scan. I saw that the generated query uses the "<" operator instead of BETWEEN, so for a table with many partitions this could be a bottleneck.
Have you checked it? @zwqjoy
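For context, a retrieval call of the kind being discussed looks roughly like the sketch below (the entity dataframe columns and feature references are assumptions, not from this thread); whether the SQL that Feast generates for such a call prunes Hive partitions is exactly the question raised above:

```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # repo containing the Spark offline store config

# Entity dataframe: one row per (entity, event time) for which features are needed.
entity_df = pd.DataFrame(
    {
        "user_id": [1001, 1002, 1003],
        "event_timestamp": [
            datetime(2023, 6, 1),
            datetime(2023, 6, 2),
            datetime(2023, 6, 3),
        ],
    }
)

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_clicks:click_rate_7d",
        "user_profile:avg_order_value",
    ],
).to_df()

print(training_df.head())
```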
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If I have big offline data (on HDFS), how can I prepare training data using Feast?

Can I write a PySpark file and submit a Spark task like below?

```sh
spark-submit \
    --master yarn \
    --queue product \
    --deploy-mode cluster \
    make_train_data_with_feast.py
```
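In principle that should work, as long as the executors/driver can import `feast` and read the feature repo (feature_store.yaml plus registry). A possible shape for `make_train_data_with_feast.py` is sketched below; the feature names and output path are placeholders, and `to_spark_df()` availability depends on the Feast version, so treat this as an assumption rather than a confirmed recipe:

```python
# make_train_data_with_feast.py  (sketch of a driver script for spark-submit)
from datetime import datetime

import pandas as pd
from feast import FeatureStore


def main():
    # The feature repo must be readable from the driver,
    # e.g. shipped alongside the script or placed on a shared path.
    store = FeatureStore(repo_path=".")

    # Entity dataframe; in practice this would be built from your label table.
    entity_df = pd.DataFrame(
        {
            "user_id": [1001, 1002, 1003],
            "event_timestamp": [datetime(2023, 6, 1)] * 3,
        }
    )

    job = store.get_historical_features(
        entity_df=entity_df,
        features=[
            "user_clicks:click_rate_7d",       # hypothetical feature references
            "user_profile:avg_order_value",
        ],
    )

    # With the Spark offline store the retrieval job can usually be materialized
    # as a Spark DataFrame; fall back to to_df() (pandas) for small results
    # if to_spark_df() is not available in your version.
    train_df = job.to_spark_df()
    train_df.write.mode("overwrite").parquet("hdfs:///tmp/train_data")  # hypothetical output path


if __name__ == "__main__":
    main()
```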