In the "basic" block, typically we return multiple charts (timeline+highlight / map / ranking), with the first chart being the most relevant one for the query (ranking if "highest" was asked, etc). Since toolformer mode relies on the top-most chart, I thought the change to return just the specific chart was merely an optimization.
Turns out this choice affects accuracy too. For instance, in [coal powered electricity generation in US states] (screenshot), if the top matching SV does not have state-level data, but the 2nd SV (also a good match) does, perhaps we want to use the 2nd one?
Fundamentally, the question is whether we should prefer a chart for a lower-ranked SV that matches the requested place type more accurately, or a chart for a higher-ranked SV that matches the place less accurately.
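To make the trade-off concrete, here is a minimal sketch of the two policies. It is only an illustration, not the fulfiller's actual logic: pick_sv, its parameters, and the SV names in the usage are hypothetical.

```python
from typing import List, Optional, Set


def pick_sv(ranked_svs: List[str],
            svs_with_child_data: Set[str],
            prefer_place_match: bool) -> Optional[str]:
    """Pick one SV from a relevance-ordered list (best match first).

    Hypothetical helper: `svs_with_child_data` stands for the SVs that
    have data at the requested child place type (e.g. US states).
    """
    if not ranked_svs:
        return None
    if prefer_place_match:
        # Policy A: take the best-ranked SV that has data at the
        # requested place type, even if it is not the top match.
        for sv in ranked_svs:
            if sv in svs_with_child_data:
                return sv
    # Policy B (also the fallback for A): always take the top-ranked SV.
    return ranked_svs[0]
```

With ranked_svs = ["sv_top", "sv_second"] and only "sv_second" having state-level data, policy A returns "sv_second" while policy B returns "sv_top".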
Why the original change failed:
In _populate_specific we were not checking whether the user actually asked for a child place type. For instance, [commute time in california] would implicitly have a "County" sub-type, so the top chart should be for california, but it ended up being a map of counties. This PR fixes that by checking that the place type is not the default one.
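A minimal sketch of that check, under stated assumptions: ChildTypeRequest, its is_default flag, and pick_top_chart are hypothetical names for illustration and are not the actual _populate_specific code. The point is only that an implicit (default) child type should not turn the top chart into a child-place map.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChildTypeRequest:
    """Hypothetical container for the child place type attached to a query."""
    place_type: Optional[str]  # e.g. "County", or None if there is none
    is_default: bool           # True if inferred implicitly, not asked by the user


def user_asked_for_child_type(req: ChildTypeRequest) -> bool:
    """True only when the user explicitly asked for a child place type."""
    return req.place_type is not None and not req.is_default


def pick_top_chart(place: str, req: ChildTypeRequest) -> str:
    """Describe which chart should come first for the query."""
    if user_asked_for_child_type(req):
        # Explicit child type: a chart across the child places goes first.
        return f"map of {req.place_type} in {place}"
    # Implicit/default child type: the chart for the place itself goes first.
    return f"chart for {place}"
```

For [commute time in california], the request would be ChildTypeRequest("County", is_default=True), so the top chart stays on california rather than becoming a map of counties.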
Additionally, the implicit "place-type" assumption was relevant for the demo topic dc/topic/ProjectedClimateExtremes, which was a special case with multiple variables per ranking chart. This PR requires that topic to specify a child-type in the query (and accordingly updates the demo query).
Finally, rename simple fulfiller to place_vars (slightly more meaningful).
This PR retries the change that was undone in https://github.com/datacommonsorg/website/pull/4361.