airbnb / chronon

Chronon is a data platform for serving for AI/ML applications.
Apache License 2.0
673 stars 36 forks

fix: analyzer handle multiple sources and bootstrap #780

Status: Closed (hzding621 closed 2 weeks ago)

hzding621 commented 2 weeks ago

Summary

Update the analyzer logic to handle GroupBys with multiple sources and joins with bootstrap.

Why / Goal

A GroupBy with multiple sources is supported, and all sources are "unioned" before the group by is processed. So the data availability check in the analyzer only needs to verify that the combined date range of the sources covers the required date range. The current logic requires each source to cover the entire date range on its own, which is wrong.
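
To illustrate the difference between the per-source check and the combined check, here is a minimal Python sketch. It is not the analyzer's actual code: `days_between`, `uncovered_days`, and the example date ranges are hypothetical, and it assumes daily partitions with inclusive start and end dates.

```python
from datetime import date, timedelta


def days_between(start, end):
    """Yield each date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def uncovered_days(required, sources):
    """Return the required days that no source range covers.

    `required` is a (start, end) tuple and `sources` is a list of
    (start, end) tuples; all bounds are inclusive (an assumption).
    """
    start, end = required
    return [
        day
        for day in days_between(start, end)
        if not any(s_start <= day <= s_end for s_start, s_end in sources)
    ]


# Hypothetical example: two sources that each cover half of the range.
required = (date(2024, 1, 1), date(2024, 1, 10))
sources = [
    (date(2024, 1, 1), date(2024, 1, 5)),
    (date(2024, 1, 6), date(2024, 1, 10)),
]

# Old behaviour: every source must cover the whole range on its own.
per_source_ok = all(not uncovered_days(required, [src]) for src in sources)

# New behaviour: the union of all sources must cover the range.
combined_ok = not uncovered_days(required, sources)
```

In this example `per_source_ok` is `False` while `combined_ok` is `True`, which is the case the old check rejected incorrectly.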

For joins with bootstrap, there is no straightforward way to identify the required date range without actually running the bootstrap. So for joins with bootstrap, we simply don't raise data availability "errors"; we only print them.
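
The "print but don't raise" behaviour for bootstrap joins could look roughly like the sketch below. The function name `report_gaps`, its parameters, and the message format are hypothetical and only show the control flow, not the analyzer's real interface.

```python
def report_gaps(gaps, has_bootstrap):
    """Return the gaps that should fail validation.

    For joins with bootstrap the gaps are only printed, because the
    required date range cannot be known without running the bootstrap.
    """
    if has_bootstrap:
        for gap in gaps:
            print(f"[warning] possible data gap (join has bootstrap): {gap}")
        return []  # nothing is treated as a hard error
    return list(gaps)  # without bootstrap, every gap is an error
```

A caller would fail validation only when the returned list is non-empty.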

Test Plan

Ran local JAR on user joins

Checklist

Reviewers

@yuli-han @pengyu-hou @donghanz