This download's Hive query took almost 12 hours to complete (183 days of CPU time), because the complex polygon optimization didn't do anything useful.
The optimization assumes (as in the test cases) a multipolygon would be something like an island group, so GTE/LTE predicates added for the bounding box would exclude most data.
This multipolygon covers most of the world, so there was little benefit. A disjunction of bounding box queries from the polygons within the multipolygon would be better, or maybe converting multipolygons in Within queries to a disjunction of polygons prior to the existing optimization.
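To illustrate the first suggestion, here is a minimal Python sketch of building one bounding box predicate per polygon and joining them with OR, instead of using a single box around the whole multipolygon. It is not the existing implementation: the function names, the nested-list geometry representation and the column names `decimallongitude` / `decimallatitude` are all assumptions made for the example, and polygons crossing the antimeridian are not handled.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # (longitude, latitude)
Ring = List[Point]
Polygon = List[Ring]                      # first ring is the exterior ring
MultiPolygon = List[Polygon]
BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def bounding_box(polygon: Polygon) -> BBox:
    """Bounding box of one polygon, computed from its exterior ring only."""
    exterior = polygon[0]
    lons = [lon for lon, _ in exterior]
    lats = [lat for _, lat in exterior]
    return (min(lons), min(lats), max(lons), max(lats))


def bbox_predicate(box: BBox,
                   lon_col: str = "decimallongitude",  # assumed column name
                   lat_col: str = "decimallatitude"    # assumed column name
                   ) -> str:
    """GTE/LTE predicates for a single bounding box, as a SQL fragment."""
    min_lon, min_lat, max_lon, max_lat = box
    return (f"({lon_col} >= {min_lon} AND {lon_col} <= {max_lon} "
            f"AND {lat_col} >= {min_lat} AND {lat_col} <= {max_lat})")


def multipolygon_prefilter(multipolygon: MultiPolygon) -> str:
    """Disjunction of per-polygon bounding box predicates."""
    parts = [bbox_predicate(bounding_box(p)) for p in multipolygon]
    return "(" + " OR ".join(parts) + ")"


def in_any_box(point: Point, boxes: List[BBox]) -> bool:
    """True if the point passes the prefilter, i.e. lies in at least one box."""
    lon, lat = point
    return any(min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
               for min_lon, min_lat, max_lon, max_lat in boxes)
```

For a multipolygon made of small, widely separated polygons, a point far from all of them fails every per-polygon box even though it would fall inside the single box around the whole geometry. The prefilter only narrows the candidate rows, so the exact Within test still has to be applied to whatever passes it.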