Summary
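Update analyzer logic:
- For GroupBys with multiple sources, check data availability against the combined (unioned) date range of all sources instead of requiring each source to cover the full range.
- For joins with bootstrap, print data availability issues instead of raising them as errors.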
Why / Goal
GroupBy with multiple sources is supported, and all sources are unioned before the group-by is processed. So the data availability check in the analyzer only needs to verify that the combined date range of the sources covers the required date range. The current logic requires each individual source to cover the entire date range, which is wrong.
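To illustrate the difference between the two checks, here is a minimal sketch, not the actual analyzer code. It assumes each source's availability can be modeled as a `Set[LocalDate]` of partition dates; `GroupByCoverage`, `missingPerSource`, and `missingInUnion` are hypothetical names.

```scala
import java.time.LocalDate

// Hypothetical model: each GroupBy source is represented by the set of dates it has data for.
object GroupByCoverage {

  // All dates in [start, end], inclusive.
  def dateRange(start: LocalDate, end: LocalDate): Seq[LocalDate] =
    Iterator.iterate(start)(_.plusDays(1)).takeWhile(!_.isAfter(end)).toList

  // Old (incorrect) check: every source must cover the whole required range on its own.
  // Returns source index -> dates that source is missing, only for sources with gaps.
  def missingPerSource(sources: Seq[Set[LocalDate]], required: Seq[LocalDate]): Map[Int, Seq[LocalDate]] =
    sources.zipWithIndex
      .map { case (available, i) => i -> required.filterNot(available.contains) }
      .filter { case (_, missing) => missing.nonEmpty }
      .toMap

  // New check: sources are unioned before the group-by, so only the combined
  // availability has to cover the required range.
  def missingInUnion(sources: Seq[Set[LocalDate]], required: Seq[LocalDate]): Seq[LocalDate] = {
    val combined = sources.foldLeft(Set.empty[LocalDate])(_ union _)
    required.filterNot(combined.contains)
  }
}
```

For example, if one source has data for 2024-01-01 through 2024-01-03 and another for 2024-01-04 through 2024-01-05, the per-source check flags both sources for a required range of 2024-01-01 through 2024-01-05, while the union check reports nothing missing.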
For joins with bootstrap, there is no straightforward way to identify the required date range without actually running the bootstrap. So for joins with bootstrap, we don't raise data availability errors; we only print them.
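The reporting behavior for bootstrap joins can be sketched as below. This is a hypothetical illustration, not the analyzer's real API: `AvailabilityReport`, `report`, and the `hasBootstrap` flag are assumed names, and the choice of `IllegalStateException` is illustrative only.

```scala
// Hypothetical sketch: downgrade data availability errors to printed messages for bootstrap joins.
object AvailabilityReport {

  // Throws if dates are missing and the join has no bootstrap;
  // prints the same message instead when the join has bootstrap.
  def report(joinName: String, missing: Seq[String], hasBootstrap: Boolean): Unit = {
    if (missing.nonEmpty) {
      val msg = s"Join $joinName is missing data for: ${missing.mkString(", ")}"
      if (hasBootstrap) println(s"[WARN] $msg (not raised: join has bootstrap)")
      else throw new IllegalStateException(msg)
    }
  }
}
```

Printing rather than throwing keeps the information visible to the user while avoiding false failures, since the bootstrap may supply the data that the analyzer cannot account for without running it.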
Test Plan
Ran local JAR on user joins
Checklist
Reviewers
@yuli-han @pengyu-hou @donghanz