airbnb / chronon

Chronon is a data platform for serving for AI/ML applications.
Apache License 2.0

[Chronon]Add new class metadata end point and metadata dir walker #768

Closed yuli-han closed 4 months ago

yuli-han commented 4 months ago

Summary

Metadata upload to the k-v store currently supports a single kind of key-value pair, key -> conf. We want to add a general metadata endpoint class to support more potential use cases.

This PR adds two general classes, MetadataEndPoint and MetadataDirWalker.

MetadataEndPoint:

case class MetadataEndPoint[Conf <: TBase[_, _]: Manifest: ClassTag](
    extractFn: (String, Conf) => (String, String),
    name: String
)

Defined by an extract function and an endpoint name. The extract function takes the file path (a String) and a Conf (which could be a Join, GroupBy, or StagingQuery) and returns a key-value pair. The name is the dataset name used when the data is sent to the k-v store.
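As a rough illustration of what an extract function might look like, the sketch below replaces the Thrift conf with a hypothetical `SimpleConf` case class and drops the `TBase`, `Manifest`, and `ClassTag` bounds so that it runs without dependencies. The names `EndPointSketch`, `SimpleConf`, and `confByKey` are assumptions made for this example and are not taken from the PR.

```scala
object EndPointSketch {
  // Hypothetical stand-in for a Thrift conf (Join/GroupBy/StagingQuery).
  final case class SimpleConf(name: String, team: String, json: String)

  // Same shape as the case class in this PR, minus the Thrift type bounds.
  final case class MetadataEndPoint[Conf](
      extractFn: (String, Conf) => (String, String),
      name: String
  )

  // Assumed behavior: use the file path as the key and the conf's JSON
  // string as the value (the key -> conf mapping described above).
  val confByKey: MetadataEndPoint[SimpleConf] =
    MetadataEndPoint[SimpleConf](
      (path: String, conf: SimpleConf) => (path, conf.json),
      "CHRONON_METADATA"
    )
}
```

Calling `confByKey.extractFn(path, conf)` returns the pair that would be written to the dataset named by `confByKey.name`.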

MetadataDirWalker:

class MetadataDirWalker(dirPath: String, metadataEndPointNames: List[String])

Walks the directory, iterates over all the config files, and generates k-v pair metadata for each of the metadata endpoints provided.
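The walk could be sketched roughly as follows. This is not the PR's implementation: the real class takes endpoint names as a `List[String]`, while this simplified version takes the extract functions directly, keyed by endpoint name, and a caller-supplied `parse` function in place of Thrift deserialization. `DirWalkerSketch`, `listFiles`, `walk`, and `parse` are hypothetical names.

```scala
import java.io.File

object DirWalkerSketch {
  // Recursively collect every regular file under `dir`.
  // `listFiles()` returns null for non-directories, hence the Option wrapper.
  def listFiles(dir: File): Seq[File] = {
    val children = Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
    children.flatMap { f =>
      if (f.isDirectory) listFiles(f) else Seq(f)
    }
  }

  // Hypothetical walker: `parse` turns a file into a conf (None if the file
  // is not a valid config); each extract function yields one k-v pair per conf.
  def walk[Conf](
      dirPath: String,
      parse: File => Option[Conf],
      endPoints: Map[String, (String, Conf) => (String, String)]
  ): Map[String, Seq[(String, String)]] = {
    val confs: Seq[(String, Conf)] =
      listFiles(new File(dirPath)).flatMap { file =>
        parse(file).map(conf => (file.getPath, conf))
      }
    // One entry per endpoint name, holding all pairs that endpoint produced.
    endPoints.map { case (name, extractFn) =>
      name -> confs.map { case (path, conf) => extractFn(path, conf) }
    }
  }
}
```

Passing `parse` in as a function keeps the sketch independent of any particular config format; files that fail to parse are simply skipped.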

The PR adds two metadata endpoints, ZIPLINE_METADATA and ZIPLINE_METADATA_BY_TEAM:

CHRONON_METADATA: key -> conf JSON in string format, e.g. joins/team/team.example_join.v1 -> {...}

CHRONON_METADATA_BY_TEAM: type/team -> list of keys in string format, e.g. joins/team -> a, b, c
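To make the by-team shape concrete, here is a minimal sketch that groups keys by their type/team prefix. The sample keys other than the one quoted above are invented for illustration, and joining the list with commas is an assumption about the string format, not something this description specifies. `ByTeamSketch`, `teamKey`, and `byTeam` are hypothetical names.

```scala
object ByTeamSketch {
  // Keys in the <type>/<team>/<conf name> layout used in the example above.
  // The second and third entries are made-up sample data.
  val keys: Seq[String] = Seq(
    "joins/team/team.example_join.v1",
    "joins/team/team.example_join.v2",
    "group_bys/team/team.example_group_by.v1"
  )

  // The type/team prefix of a key, e.g. "joins/team".
  def teamKey(key: String): String = key.split("/").take(2).mkString("/")

  // Assumed by-team shape: type/team -> comma-separated list of keys.
  val byTeam: Map[String, String] =
    keys.groupBy(teamKey).map { case (team, ks) => team -> ks.mkString(",") }
}
```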

This PR is drafted from another PR. The metadata uploader job is not changed in this PR, so there will be no real impact on any active jobs.

Why / Goal

Test Plan

Checklist

Reviewers

@nikhilsimha @better365