GoogleCloudPlatform / bigquery-geo-viz

Visualize Google BigQuery geospatial data using Google Maps Platform APIs
https://bigquerygeoviz.appspot.com/
Apache License 2.0
106 stars 37 forks source link

BigQuery GeoViz query returns fewer results than standard BigQuery query #6

Open davidblumntcgeo opened 5 years ago

davidblumntcgeo commented 5 years ago

This issue may be related to #5

My dataset is an import of the US Census 2018 Coastline shapefile, which has 4258 rows. Each row has a non-null geojson field (all linestrings), from which I created a new geography field using the ST_GEOGFROMGEOJSON() function. My geography field is named g, and all of its values are of type LINESTRING.

In BigQuery GeoViz, when I run the query SELECT g FROM xxxxxxxxx.census_coastline2018;

I get 1496 results.

In standard BigQuery, when I run SELECT COUNT(g) FROM xxxxxxxxx.census_coastline2018;

I get back the 4258 as expected.

Looking at my logs, I see that the actual query that was passed to BigQuery from BigQuery GeoViz is

SELECT ST_AsGeoJson(g) AS g FROM ( SELECT g FROM xxxxxxxxx.census_coastline2018 );

Finally, in standard BigQuery, when I run SELECT COUNT(ST_AsGeoJson(g)) FROM ( SELECT g FROM xxxxxxxxx.census_coastline2018 );

I again get 4258.

In other words, the BigQuery GeoViz is omitting results that standard BigQuery returns.

donmccurdy commented 5 years ago

If you open Chrome's Network Tab before running the query in Geo Viz, could you see how large the response is containing those 1496 results? I.e. how many megabytes?

This viewer has various limitations on how many results it can show at once, and the number on the page is how many are shown (or would be if the map were fully zoomed out), not necessarily how many rows the full result set contains.

At present the limit is determined by how much the BigQuery API is willing to send back in a single page of results. That's something like 10-20 MB I think, which might be 50K-100K points (depending on the number of other columns selected), or a smaller number of lines or polygons depending on their complexity. If you select fewer columns, you may see the number of results in the viewer increase.

Probably two changes are worthwhile here:

davidblumntcgeo commented 5 years ago

@donmccurdy thanks for the reply.

Running the query in Chrome with the Network tab open, I see the following:

'queries?alt=json' transferred 2.5 MB (no other entry is larger than 3 KB)

Along the bottom of the network tab: 2.5 MB transferred 9.8 MB resources

As a point of reference, BQ estimated the total query size to be 35.8 MB

When I ran this test, I opened the network tab just before executing the query (in other words, I'd already loaded the BigQuery Geo Viz page). Please let me know if you'd like me to perform it any differently.

donmccurdy commented 5 years ago

Thanks! If you hover your mouse over that 2.5mb value, does it give separate "transferred over network" vs "resource size" values?

When I ran this test, I opened the network tab just before executing the query (in other words, I'd already loaded the BigQuery Geo Viz page). Please let me know if you'd like me to perform it any differently.

Exactly the way you tested it is perfect 👍


But yeah, I do think this has hit the issue above.

davidblumntcgeo commented 5 years ago

Yes indeed, when I hover over, the 2.5 MB value, I see "2.5 MB transferred over network, resource size: 9.8 MB"

Appreciate the quick diagnosis!