
datacommonsorg / website

Code for the Data Commons website

https://datacommons.org

Apache License 2.0


Improve multi-var fulfillment with uneven #matches #4287

Closed: pradh closed this 1 month ago

pradh commented 1 month ago

In multi-var fulfillment, when we have pairs of SV lists like ([LHS1], [RHS1, RHS2, RHS3, RHS4]), RHS2-RHS4 previously went unused. For [poverty vs. literacy rate], the topic on the RHS got pushed down by the recent model changes, which triggered this bug.
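A minimal sketch of the old behavior, assuming the two sides were paired by position (the way Python's `zip` pairs lists). `pair_by_position` is a hypothetical name used for illustration, not the actual Data Commons fulfillment code.

```python
from typing import List, Tuple


def pair_by_position(lhs: List[str], rhs: List[str]) -> List[Tuple[str, str]]:
    # Hypothetical model of the pre-PR pairing: zip() stops at the shorter
    # list, so any extra vars on the longer side are silently dropped.
    return list(zip(lhs, rhs))


if __name__ == "__main__":
    # Only ('LHS1', 'RHS1') is produced; RHS2-RHS4 never become candidates.
    print(pair_by_position(["LHS1"], ["RHS1", "RHS2", "RHS3", "RHS4"]))
```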

This PR fills the empty slots on the shorter side with that side's first topic/SV, so that every var on the longer side is considered. This may still not be ideal; the other extreme is a full cross product, but that could produce many more candidates.
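To compare the two strategies, here is a rough sketch under stated assumptions: `pair_with_padding` models "fill the empties with the first topic/SV on the shorter side", and `pair_full_cross` models the full-cross alternative. Both names, and the choice to return no pairs when one side is empty, are hypothetical and not taken from the PR.

```python
from itertools import product
from typing import List, Tuple


def pair_with_padding(lhs: List[str], rhs: List[str]) -> List[Tuple[str, str]]:
    """Pad the shorter side with its own first element so every var on the
    longer side ends up in exactly one pair (max(len) pairs in total)."""
    if not lhs or not rhs:
        # Assumption: nothing to pair if either side is empty.
        return []
    n = max(len(lhs), len(rhs))
    padded_lhs = lhs + [lhs[0]] * (n - len(lhs))
    padded_rhs = rhs + [rhs[0]] * (n - len(rhs))
    return list(zip(padded_lhs, padded_rhs))


def pair_full_cross(lhs: List[str], rhs: List[str]) -> List[Tuple[str, str]]:
    """Every LHS var paired with every RHS var (len(lhs) * len(rhs) pairs)."""
    return list(product(lhs, rhs))


if __name__ == "__main__":
    lhs, rhs = ["L1", "L2"], ["R1", "R2", "R3", "R4"]
    print(pair_with_padding(lhs, rhs))  # 4 pairs; L1 reused for R3 and R4
    print(len(pair_full_cross(lhs, rhs)))  # 8 candidates
```

With a single LHS var, as in the ([LHS1], [RHS1, ..., RHS4]) case, both strategies yield the same four pairs; the candidate count only diverges when both sides have more than one var, growing as the product of the list lengths for the full cross versus the maximum length for padding.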

Testing: notice that this change restores the scatter diff in server/integration_tests/test_data/e2e_india_demo/howdoesliteracyratecomparetopovertyinindia/chart_config.json

pradh commented 1 month ago

> Very nice! And it looks like more charts are showing up in the goldens. Maybe it's time to cut the total number of charts.

Agreed. @chejennifer, is that something you could help with (lower priority than Gemma)? There was a thread on it last week, I think in oncall.

chejennifer commented 1 month ago


Is it the thread about trying to cut the max charts from 15 to 10? I can work on this after helping with the Gemma eval tool.

pradh commented 1 month ago


Yes, that thread. But my recommendation would be to "count charts correctly and add no more than N (20?) charts", because the current limit of 15 is not the number of charts but the number of fulfiller calls that added one or more charts.
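The difference between the two limits can be sketched as follows. This is a hypothetical model, not the website's code: each fulfiller call is represented as a function returning a list of charts, `collect_by_call_count` mimics a cap on calls that added at least one chart, and `collect_by_chart_count` caps the actual number of charts. Truncating the last call's charts to fit the budget is an assumed design choice.

```python
from typing import Callable, List, Sequence

Chart = str  # hypothetical stand-in for a chart config
FulfillerCall = Callable[[], List[Chart]]


def collect_by_call_count(calls: Sequence[FulfillerCall], max_calls: int) -> List[Chart]:
    """Stop after max_calls calls that each added >= 1 chart.
    The total number of charts can be far larger than max_calls."""
    charts: List[Chart] = []
    productive_calls = 0
    for call in calls:
        if productive_calls >= max_calls:
            break
        added = call()
        if added:
            charts.extend(added)
            productive_calls += 1
    return charts


def collect_by_chart_count(calls: Sequence[FulfillerCall], max_charts: int) -> List[Chart]:
    """Stop once max_charts charts have been collected (counts charts, not calls)."""
    charts: List[Chart] = []
    for call in calls:
        remaining = max_charts - len(charts)
        if remaining <= 0:
            break
        # Assumption: keep only as many of this call's charts as still fit.
        charts.extend(call()[:remaining])
    return charts


if __name__ == "__main__":
    # Ten fulfiller calls, each adding three charts.
    calls = [lambda i=i: [f"chart{i}_{j}" for j in range(3)] for i in range(10)]
    print(len(collect_by_call_count(calls, 15)))   # 30 charts despite a "15" cap
    print(len(collect_by_chart_count(calls, 20)))  # 20 charts
```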