airbnb / chronon

Chronon is a data platform for serving AI/ML applications.
Apache License 2.0

Add metadata upload to Mussel by team #751

Closed yuli-han closed 2 months ago

yuli-han commented 2 months ago

Summary

We want to upload a key-value pair to Mussel that maps each team to its list of GroupBy entities, stored under the team key. Example log output:

2024-04-28 23:12:51 INFO  MetadataStore:280 - Putting metadata for
key: group_bys/cs_ds
conf: ["group_bys/cs_ds/host.trip_stage.v1","group_bys/cs_ds/user.message_intent.v1"]

Why / Goal

Test Plan

Reviewers

@nikhilsimha

nikhilsimha commented 2 months ago

Let's generalize this even more.

We eventually want to upload a lot more metadata.

Team/keys -> gbs
Team/keys -> joins
Team/keys -> features
Team -> stagingqueries

Feature -> confs
Table -> confs
Feature -> GroupBy
Feature -> joins
groupBy -> joins

For autocomplete search:
Constant -> list of teams
Constant -> list of keys
Constant -> list of groupBys
Constant -> list of features
Constant -> list of joins

I believe we can write a method that extracts all this metadata in one traversal, without having to parse the files repeatedly.

We need a map of endpoint name -> key-value extractor.
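A minimal sketch of what such a map could look like, assuming a simplified `ConfInfo` stand-in for a parsed conf. `ConfInfo`, `EndpointExtractors`, `forKind`, and the endpoint names are all hypothetical and not part of Chronon's API. Each extractor turns one conf into zero or more key-value pairs, and `extractAll` visits each conf once while running every extractor on it.

```scala
// Hypothetical, simplified view of a parsed conf; not Chronon's actual types.
final case class ConfInfo(name: String, team: String, kind: String, keys: Seq[String])

object EndpointExtractors {
  // An extractor turns one conf into zero or more (key, value) pairs.
  type Extractor = ConfInfo => Seq[(String, String)]

  // Only emit pairs for confs of the given kind ("group_by", "join", ...).
  private def forKind(kind: String)(f: ConfInfo => Seq[(String, String)]): Extractor =
    conf => if (conf.kind == kind) f(conf) else Seq.empty

  // Endpoint names here are illustrative assumptions.
  val extractors: Map[String, Extractor] = Map(
    "team_to_group_bys" -> forKind("group_by")(c => Seq(c.team -> c.name)),
    "team_to_joins" -> forKind("join")(c => Seq(c.team -> c.name)),
    "key_to_group_bys" -> forKind("group_by")(c => c.keys.map(k => k -> c.name))
  )

  // Visit each conf once, run every extractor on it, then group by endpoint and key.
  def extractAll(confs: Seq[ConfInfo]): Map[String, Map[String, Seq[String]]] = {
    val triples = for {
      conf <- confs
      (endpoint, extract) <- extractors.toSeq
      (key, value) <- extract(conf)
    } yield (endpoint, key, value)

    triples.groupBy(_._1).map { case (endpoint, rows) =>
      endpoint -> rows.groupBy(_._2).map { case (key, kvs) => key -> kvs.map(_._3) }
    }
  }
}
```

Adding a new endpoint under this sketch means adding one entry to `extractors`; the traversal itself does not change.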

yuli-han commented 2 months ago

Hi @nikhilsimha, I think we have two types of key-value pairs:

  1. keys -> gbConf and keys -> joinConf are one-to-one mappings, and they already exist. We can easily combine them by providing a parent path that includes both joins and groupBys. The same applies to key -> features, feature -> confs, table -> confs, feature -> GroupBy, and feature -> joins. Each of these k-v pairs can be derived from a single file, so they can all be produced in one traversal.

  2. team -> gbs, team -> joins, and team -> features are one-to-many mappings, so we cannot derive a put request by looking at a single file. We need to traverse all the files to build the sequence first, and then generate the k-v pairs.
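To illustrate why the second kind needs all files first, here is a small sketch that builds the team-level pair from a list of conf names. It assumes conf names follow the `<kind>/<team>/<conf>` shape seen in the log line in the summary; `TeamIndex`, `teamKey`, and `byTeam` are hypothetical names, not functions from this PR.

```scala
object TeamIndex {
  // Derive the team-level key, e.g. "group_bys/cs_ds", from a conf name.
  // Names that do not have exactly three path segments are skipped.
  def teamKey(confName: String): Option[String] =
    confName.split("/") match {
      case Array(kind, team, _) => Some(s"$kind/$team")
      case _                    => None
    }

  // One-to-many: the value for a team is only complete after every conf name has been seen.
  def byTeam(confNames: Seq[String]): Map[String, Seq[String]] =
    confNames
      .flatMap(name => teamKey(name).map(key => key -> name))
      .groupBy(_._1)
      .map { case (key, pairs) => key -> pairs.map(_._2) }
}
```

A one-to-one pair, by contrast, could be emitted as soon as its single file is read, with no grouping step.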

I am now adding two separate functions, putConfByTeam and putConfByName. We can get all the k-v pairs from 1 in putConfByName by adding new put requests, and all the k-v pairs from 2 in putConfByTeam. Do you think we need to merge these two functions? I think it is doable, but I am not sure whether there is a clean way to do it in Scala.
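One possible way to merge the two, sketched under assumptions: treat both cases as "group confs by a key, then build one put request per group". A one-to-one mapping is then the special case where every group has a single element. The `Conf` and `PutRequest` types and the `MergedUpload` object below are hypothetical stand-ins, since the PR's real types and signatures are not shown in this thread.

```scala
// Hypothetical stand-ins for the PR's conf and put-request types.
final case class Conf(name: String, team: String, json: String)
final case class PutRequest(key: String, value: String)

object MergedUpload {
  // Group confs by key, then build one put request per group (sorted by key for stable output).
  def putRequests(confs: Seq[Conf])(keyOf: Conf => String)(valueOf: Seq[Conf] => String): Seq[PutRequest] =
    confs.groupBy(keyOf).toSeq.sortBy(_._1).map { case (key, group) =>
      PutRequest(key, valueOf(group))
    }

  // One-to-one: each name forms its own group, so the value is that conf's JSON.
  // Assumes conf names are unique; duplicates would keep only the first conf's JSON.
  def byName(confs: Seq[Conf]): Seq[PutRequest] =
    putRequests(confs)(_.name)(_.head.json)

  // One-to-many: all confs of a team share a key; the value lists their names as a JSON array.
  def byTeam(confs: Seq[Conf]): Seq[PutRequest] =
    putRequests(confs)(_.team)(_.map(c => "\"" + c.name + "\"").mkString("[", ",", "]"))
}
```

With this shape, the two entry points differ only in the key and value functions they pass, which may be enough to keep them as thin wrappers instead of separate traversals.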