This is a Google App Engine application for streaming JSON into BigQuery, inspired by Luke Cempre's post on AnalyticsPros.
When you request the URL /bq-streamer with the parameter ?bq={'json':'example'}, it will start a new task to put that JSON into a partitioned table in BigQuery.
The task is activated via a POST request to /bq-task with the same JSON that was passed to /bq-streamer.
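As a rough sketch of that flow (not the app's exact code; handler and variable names are illustrative), the streamer endpoint just reads the bq parameter and enqueues it for the /bq-task worker:

import webapp2
from google.appengine.api import taskqueue

class BqStreamer(webapp2.RequestHandler):
    def post(self):
        # The JSON arrives as the 'bq' parameter (query string or form body).
        bq_json = self.request.get('bq')
        # Hand it off to the /bq-task worker via the App Engine task queue.
        taskqueue.add(url='/bq-task', params={'bq': bq_json})
        self.response.write('queued')

    # Allow GET too, so the endpoint can be tested from a browser.
    get = post

app = webapp2.WSGIApplication([('/bq-streamer', BqStreamer)])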
See the Google App Engine Samples for how to deploy.
The app also sends a ts field as a STRING containing a UNIX timestamp, so add that field to your table too. Any unset fields won't be seen by default.
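For illustration, assuming example field names, the row that ends up in BigQuery is just the posted JSON plus the added ts string:

import json
import time

# Hypothetical example row: the posted JSON fields plus the added ts string.
row = json.loads('{"field_name": "field_value"}')
row['ts'] = str(time.time())  # e.g. "1483705837.05"
# Every key here needs a matching column in the BigQuery table schema.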
Edit the app.yaml field env_variables with your BigQuery details and your secret code word. Example:
runtime: python27
api_version: 1
threadsafe: yes
handlers:
- url: .*
script: main.app
#[START env]
env_variables:
DATASET_ID: tests
TABLE_ID: realtime
SECRET_SALT: changethistosomethingunique
#[END env]
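The handlers can read these settings at runtime, since env_variables entries are exposed through os.environ (a minimal sketch):

import os

# Sketch: env_variables from app.yaml are available to the handlers like this.
DATASET_ID = os.environ['DATASET_ID']
TABLE_ID = os.environ['TABLE_ID']
SECRET_SALT = os.environ['SECRET_SALT']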
Call
https://your-app-id.appspot.com/bq-streamer?bq={"field_name":"field_value", "field_name2":"field_value2"}
to add the fields to your BigQuery table. For testing you can call the URL via GET in the browser, but for production call via POST with the JSON in the request body under the bq field.
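A production-style hit from Python might look like this (a sketch; it assumes the requests library is installed on the client):

import json
import requests  # assumption: installed on the client

# Send the JSON in the request body under the 'bq' field.
payload = {"field_name": "field_value", "field_name2": "field_value2"}
resp = requests.post("https://your-app-id.appspot.com/bq-streamer",
                     data={"bq": json.dumps(payload)})
print(resp.status_code)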
Other examples:
https://your-app-id.appspot.com/bq-streamer?bq={'bar':'blah5','foo':'hi'}
The data won't appear in the BQ table preview quickly, but you can query the table via something like SELECT * FROM dataset.tableID to see the realtime hits seconds after the hit is made. Turn off USE CACHED RESULTS. The app also adds a ts field with a UNIX timestamp of when the hit was sent to BigQuery.
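Since ts is a stringified UNIX timestamp, you can turn it back into a datetime like this:

import datetime

# ts is a stringified UNIX timestamp of when the hit was sent.
ts = "1483705837.05"
print(datetime.datetime.utcfromtimestamp(float(ts)))  # 2017-01-06 12:30:37.050000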
View the logs for any errors at https://console.cloud.google.com/logs/viewer
With app.yaml configured, install the dependencies via pip. This installs the libraries to a new folder lib in the app directory; it most likely won't need to add anything.
pip install -t lib -r requirements.txt
Deploy via:
gcloud app deploy --project [YOUR_PROJECT_ID]
Optional flags:
--project to specify an alternate Cloud Platform Console project ID to the one you initialized as the default in the gcloud tool. Example: --project [YOUR_PROJECT_ID]
-v [YOUR_VERSION_ID] to deploy with a specific version ID.
Visit https://your-app-id.appspot.com to view your application. For more information on App Engine and on Python on App Engine, see the Google Cloud documentation.
Included is also a class to query the entire BigQuery table; in production you would want to limit the query to rows where ts is greater than some timestamp to avoid the result being too large.
Visiting http://your-app-id.appspot.com/bq-get will get you the BQ table in JSON format. It has no caching enabled, so it will also return the freshest results.
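One way to keep that result small, as a sketch, is to only ask for recent rows and send that SQL via the q parameter described further below (ts is stored as a STRING, so cast it in Standard SQL):

import time

# Sketch: only fetch rows from the last hour instead of the whole table.
since = time.time() - 3600
q = "SELECT * FROM %s.%s WHERE CAST(ts AS FLOAT64) > " + str(since)
# Pass q (and its hash) to /bq-get as described below.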
To protect privacy, a hash also has to be supplied. It is generated from the secret salt you should change during setup to something unique to you; generate it in Python via:
import hashlib
# your query
q = "SELECT * FROM %s.%s"
# the unique secret word in the app.yaml environment vars
salt = "SECRETWORD"
## use this in the API call for parameter `hash`
hashlib.sha224(q+salt).hexdigest()
Alternatively, view the expected hash in the error logs when you attempt a connection. By default the endpoint returns the most recent record of the table; pass limit=X to get more.
Example:
http://your-app-id.appspot.com/bq-get?limit=1000&hash=63780cbd6c3f6e632b57d9f8f70ea7edcd3c6eb5cbdd1b3183ba28b6
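Putting it together, a client-side sketch in Python 2 (matching the hashing example above; it assumes the requests library and that the server hashes the same query string shown earlier):

import hashlib
import requests  # assumption: installed on the client

q = "SELECT * FROM %s.%s"
salt = "changethistosomethingunique"  # must match SECRET_SALT in app.yaml
params = {"limit": 1000, "hash": hashlib.sha224(q + salt).hexdigest()}
rows = requests.get("http://your-app-id.appspot.com/bq-get", params=params).json()
print(len(rows))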
The output is a list of lists like this:
[
["ts", "1483705837.05", null],
["blah4", "1483710114.48", null],
["1", "4", null],
["1", "4", null],
["this_is_json", "ts", null],
["h2i", "1483714626.43", "blah6"],
["y", "e", null],
["blah3", "1483707843.84", null],
["1", "4", null],
["y", "e", null],
["y", "e", null],
["blah3", "1483709017.45", null],
["y", "e", null],
["h2i", "1483714325.09", "blah6"],
["blah3", "1483710007.28", null],
["1", "4", null],
["h23i", "1483716480.93", "bl8h7"],
["hi", "1483710547.94", "blah5"],
["h2i", "1483716200.83", "blah7"]
]
By default the query is:
query = 'SELECT * FROM %s.%s LIMIT %s' % (datasetId, tableId, limit)
You can use your own query by supplying a q parameter in the URL. It uses Standard SQL, not legacy SQL. Use %s.%s in your query and it will be filled in with the correct dataset and table ID. The limit URL parameter is ignored when supplying your own SQL.
http://your-app-id.appspot.com/bq-get?hash=XXXXX&q=SELECT * FROM %s.%s LIMIT 10
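When you supply your own SQL you also need a matching hash; a Python 2 sketch of building the full URL, assuming the server hashes q + salt exactly as sent:

import hashlib
import urllib

q = "SELECT * FROM %s.%s LIMIT 10"
salt = "changethistosomethingunique"  # your SECRET_SALT from app.yaml
url = ("http://your-app-id.appspot.com/bq-get"
       "?hash=" + hashlib.sha224(q + salt).hexdigest() +
       "&q=" + urllib.quote(q))
print(url)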
Applications can then use this data for display. To stream data in from a web page, for example via a Google Tag Manager custom HTML tag using {{dataLayer}} variables, POST the fields to /bq-streamer as below:
<script>
var bqArray = {};
bqArray["fieldname"] = "{{dataLayer}}";
bqArray["fieldname2"] = "{{dataLayer2}}";
jQuery.post("https://YOUR-PROJECT-ID.appspot.com/bq-streamer", {"bq":JSON.stringify(bqArray)});
</script>