adam-singer / dart-carte-du-jour

Pub documentation generation system
www.dartdocs.org

Load BigQuery with package_build_info.json data #14

Closed adam-singer closed 10 years ago

adam-singer commented 10 years ago

https://developers.google.com/bigquery/loading-data-into-bigquery

adam-singer commented 10 years ago

Useful for finding all failed packages

adam-singer commented 10 years ago

A way to list all package build info files:

gsutil ls gs://www.dartdocs.org/**/package_build_info.json

adam-singer commented 10 years ago

After ingesting all the package build files, a simple query can be run with BigQuery:

SELECT name, version, isBuilt FROM [test_dummy_data_set.my_table] 
WHERE isBuilt = false LIMIT 1000

adam-singer commented 10 years ago

Simple idea for a script that collects all the JSON files into a single blob for BigQuery. It would have been nice if this could be scripted directly into BigQuery.

# List every build info file, then append each one to all.json as a single line.
f=$(gsutil ls gs://www.dartdocs.org/**/package_build_info.json)
for e in $f; do echo $(gsutil cat "$e") >> /tmp/all.json; done

Then import all.json directly into BigQuery. The BigQuery schema would be name:STRING,version:STRING,isBuilt:BOOLEAN,datetime:TIMESTAMP
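
The loop above relies on unquoted command substitution to flatten each file onto one line, which is what NEWLINE_DELIMITED_JSON expects. A minimal local sketch of that behaviour, assuming the build info files are pretty-printed across several lines; the sample record, its values, and the temporary directory are hypothetical and only mirror the field names in the schema:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample record; the real files live in the bucket.
workdir=$(mktemp -d)
cat > "$workdir/package_build_info.json" <<'EOF'
{
  "name": "example_pkg",
  "version": "0.1.0",
  "isBuilt": false,
  "datetime": "2014-01-01 00:00:00"
}
EOF

# The unquoted $(...) is split on whitespace and echo rejoins the words
# with single spaces, so each multi-line file becomes exactly one line.
echo $(cat "$workdir/package_build_info.json") >> "$workdir/all.json"
echo $(cat "$workdir/package_build_info.json") >> "$workdir/all.json"

# Two appended files give two records, one per line.
line_count=$(wc -l < "$workdir/all.json" | tr -d ' ')
first_line=$(head -n 1 "$workdir/all.json")
echo "$line_count"
```

One caveat of this trick: runs of whitespace inside JSON string values are collapsed too, which is harmless for these fields but would alter free-text values.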

Command-line example of loading data into BigQuery:

bq load --source_format=NEWLINE_DELIMITED_JSON  test_dummy_data_set.my_table /tmp/all.json name:STRING,version:STRING,isBuilt:BOOLEAN,datetime:TIMESTAMP

We can also use BigQuery to count the failed builds:

bq query "SELECT COUNT(isBuilt) AS failedCount FROM [test_dummy_data_set.my_table] WHERE isBuilt = false"
bq rm -f dart-carte-du-jour:test_dummy_data_set.my_table
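
As a rough local cross-check of the count query above, the failed records can also be counted straight from the newline-delimited file before it is loaded. This is only a sketch: `count_failed` is a hypothetical helper, and it assumes one record per line with the flag written as `"isBuilt": false` (whitespace around the colon is optional).

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_failed FILE: print the number of lines whose isBuilt flag is false.
# Hypothetical helper; it pattern-matches text rather than parsing JSON.
count_failed() {
  # grep -c exits non-zero when nothing matches, so keep the "0" it printed
  # and do not let set -e abort the script.
  grep -Ec '"isBuilt"[[:space:]]*:[[:space:]]*false' "$1" || true
}
```

Calling `count_failed /tmp/all.json` should agree with failedCount from the query if the load kept every record.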

Still using this issue as a scratch pad for notes. Ignore these comments.

cat fetchData.sh 
rm -rf /tmp/all.json
bq rm -f dart-carte-du-jour:test_dummy_data_set.my_table
F=$(gsutil ls gs://www.dartdocs.org/**/package_build_info.json)
for e in $F; do echo $(gsutil cat $e)>> /tmp/all.json; done
bq mk dart-carte-du-jour:test_dummy_data_set.my_table
bq load --source_format=NEWLINE_DELIMITED_JSON  dart-carte-du-jour:test_dummy_data_set.my_table /tmp/all.json name:STRING,version:STRING,isBuilt:BOOLEAN,datetime:TIMESTAMP
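
The collection step in the script could be made a little more defensive. The sketch below is one possible variation, not the script actually used: `collapse_records` and the `FETCH` override are hypothetical names, and it assumes each object holds a single JSON record. It stops on the first failed download instead of silently writing an empty line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# collapse_records: read object URLs on stdin (one per line) and write each
# object to stdout as a single line, giving newline-delimited JSON.
# FETCH is a hypothetical override for the download command; it defaults
# to `gsutil cat` and can be set to `cat` to try the function on local files.
collapse_records() {
  local url
  while IFS= read -r url; do
    [ -n "$url" ] || continue
    # Strip the newlines inside the record, then terminate it with one.
    # stdin is redirected so the fetch command cannot swallow the URL list.
    ${FETCH:-gsutil cat} "$url" < /dev/null | tr -d '\n'
    printf '\n'
  done
}
```

With this in place, the loop in fetchData.sh could be written as `gsutil ls 'gs://www.dartdocs.org/**/package_build_info.json' | collapse_records > /tmp/all.json`. Unlike `echo $(...)`, `tr -d '\n'` leaves whitespace inside string values untouched.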

adam-singer commented 10 years ago

Superseded by Datastore.