Summary
The compiler validates the derivation logic defined by users and, if the --feature-display flag is enabled, prints out the list of feature names.
The existing derivation validation and feature printing have some bugs. This PR fixes them.
Derivation validation logic:
What can be used as input to derivations (online):
All join_parts features + all external_part value fields
ds + ts (special handling)
What can be used as input to derivations (offline):
All join_parts features + left source keys + all external_part value fields
ds + ts (special handling)
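The input rules above differ only in whether the left source keys are available. Below is a minimal sketch of that difference, assuming each input group is already available as a list of column names. The helper `derivation_input_columns` and the column names in the usage example are hypothetical, and the special handling of ds and ts is not modeled; they are simply added to the allowed set.

```python
from typing import Iterable, Set

# ds and ts are always allowed as inputs; the compiler's special handling
# of these columns is NOT modeled in this sketch.
SPECIAL_COLUMNS = {"ds", "ts"}


def derivation_input_columns(
    join_part_features: Iterable[str],
    left_keys: Iterable[str],
    external_value_fields: Iterable[str],
    online: bool,
) -> Set[str]:
    """Hypothetical helper: the set of columns a derivation may reference."""
    # Shared by both modes: join_parts features, external_part value fields, ds/ts.
    columns = set(join_part_features) | set(external_value_fields) | SPECIAL_COLUMNS
    if not online:
        # Offline only: the left source keys are also valid inputs.
        columns |= set(left_keys)
    return columns


# Usage with hypothetical column names.
features = ["user_txn_amount_sum_7d"]
keys = ["user_id"]
external = ["ext_user_profile_country"]
online_inputs = derivation_input_columns(features, keys, external, online=True)
offline_inputs = derivation_input_columns(features, keys, external, online=False)
```

With these inputs, `user_id` is in `offline_inputs` but not in `online_inputs`, so a derivation that reads a left key is valid only offline.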
For rename derivations, the validation checks that the original column exists among the pre-derived columns (left source keys are excluded in the online case and included in the offline case). It also checks that there are no duplicate column names.
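The two checks described above can be sketched as follows. This is a simplified illustration, not the compiler's code: `Derivation` is a stand-in holding only an output name and an expression, and treating an expression as a rename when it is a bare identifier is an assumption made for this sketch. The caller passes in the pre-derived columns, with left source keys included only for the offline case.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Derivation:
    """Simplified stand-in for a user-defined derivation."""
    name: str        # output column name
    expression: str  # SQL expression computing the column


def is_rename(d: Derivation) -> bool:
    # Assumption: an expression that is a bare column name is a rename.
    return d.expression.isidentifier()


def validate_derivations(
    derivations: List[Derivation], pre_derived_columns: Iterable[str]
) -> List[str]:
    """Return a list of error messages; an empty list means validation passed."""
    available = set(pre_derived_columns)
    errors = []
    # Check 1: a rename must refer to a column that exists before derivation.
    for d in derivations:
        if is_rename(d) and d.expression not in available:
            errors.append(f"derivation '{d.name}' renames unknown column '{d.expression}'")
    # Check 2: output column names must be unique.
    for name, count in Counter(d.name for d in derivations).items():
        if count > 1:
            errors.append(f"duplicate output column '{name}' ({count} times)")
    return errors
```

For example, renaming a left source key such as a hypothetical `user_id` passes when `user_id` is in the offline pre-derived columns and is reported as an error when validating against the online columns, which exclude keys.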
Why / Goal
Test Plan
Tested by running the compiler on a config with a derivation defined on an external column name. Test results (for Airbnb users): https://docs.google.com/document/d/12FDawWrC-5QMY70Sx7UguO74QnhZxRZbYJcwlYeiJ2o/edit?usp=sharing
Checklist
Reviewers
@pengyu-hou @hzding621 @donghanz