airbnb / chronon

Chronon is a data platform for serving AI/ML applications.
Apache License 2.0

Validator/Compiler logic refinement #777

Closed yuli-han closed 2 weeks ago

yuli-han commented 3 weeks ago

Summary

The compiler validates the derivation logic defined by users, and also prints the list of feature names when the --feature-display flag is enabled.

The existing derivation validation and feature printout have some bugs. This PR fixes them.

Derivation validation logic:

What can be used as input to derivations (online):
- All join_parts features + all external_part value fields
- ds + ts (special handling)

What can be used as input to derivations (offline):
- All join_parts features + left source keys + all external_part value fields
- ds + ts (special handling)
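As a rough illustration of the two input sets above, here is a minimal Python sketch. The function name `pre_derived_columns` and its parameters are hypothetical and are not taken from the Chronon codebase. The "special handling" of ds and ts is not described in this PR, so the sketch simply assumes both are always available.

```python
from typing import Iterable, Set

# Assumption: ds and ts are always usable as derivation inputs.
# The PR only says they get "special handling" without giving details.
TIME_COLUMNS = frozenset({"ds", "ts"})


def pre_derived_columns(
    join_part_features: Iterable[str],
    external_part_value_fields: Iterable[str],
    left_source_keys: Iterable[str],
    online: bool,
) -> Set[str]:
    """Return the column names a derivation may reference (hypothetical helper)."""
    columns = set(join_part_features) | set(external_part_value_fields) | TIME_COLUMNS
    if not online:
        # Left source keys are only valid inputs in the offline case.
        columns |= set(left_source_keys)
    return columns
```

With this shape, the only difference between the online and offline sets is whether the left source keys are included.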

Derivation validation checks renaming derivations: the original column must exist among the pre-derived columns (keys are excluded in the online case and included in the offline case). It also checks that there are no duplicate column names.
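The two checks described above could be sketched as follows. This is an illustrative sketch, not the PR's implementation: `validate_derivations` is a hypothetical name, derivations are modelled as plain `(name, expression)` pairs, and a "renaming derivation" is assumed to be one whose expression is a bare column identifier. Wildcard derivations are not handled here.

```python
import re
from collections import Counter
from typing import List, Sequence, Set, Tuple

# Assumption: an expression that is a single bare identifier is a rename.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def validate_derivations(
    derivations: Sequence[Tuple[str, str]],
    pre_derived: Set[str],
) -> List[str]:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []

    # Check 1: every rename must point at an existing pre-derived column.
    for name, expression in derivations:
        if _IDENTIFIER.match(expression) and expression not in pre_derived:
            errors.append(
                f"rename '{name}': source column '{expression}' "
                "not found in pre-derived columns"
            )

    # Check 2: no two derivations may produce the same output column name.
    counts = Counter(name for name, _ in derivations)
    for name, count in counts.items():
        if count > 1:
            errors.append(f"duplicate output column '{name}'")

    return errors
```

Passing the online or offline pre-derived column set as `pre_derived` gives the behavior described above: a rename of a left source key fails validation online but passes offline.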

Why / Goal

Test Plan

Tested by running the compiler on a config with a derivation defined on an external column name. Test results for Airbnb users: https://docs.google.com/document/d/12FDawWrC-5QMY70Sx7UguO74QnhZxRZbYJcwlYeiJ2o/edit?usp=sharing

Checklist

Reviewers

@pengyu-hou @hzding621 @donghanz

hzding621 commented 2 weeks ago

@yuli-han Can we also improve the unit tests?