BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run analytics over vast amounts of data in near real time. With BigQuery, there's no infrastructure to set up or manage, letting you focus on finding meaningful insights using GoogleSQL and taking advantage of flexible pricing models across on-demand and flat-rate options.
There are several ways to ingest data into BigQuery:
Batch load a set of data records.
Stream individual records or batches of records.
Use queries to generate new data and append or overwrite the results to a table.
Use a third-party application or service.
With batch loading, you load the source data into a BigQuery table in a single batch operation. For example, the data source could be a CSV file, an external database, or a set of log files. Traditional extract, transform, and load (ETL) jobs fall into this category.
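For example, a batch load of a CSV file from Cloud Storage can be run with the google-cloud-bigquery Python client library (installation is covered below). This is a minimal sketch; the project, bucket, and table names are placeholders:

```python
# Minimal sketch of a batch load job; project, bucket, and table IDs are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the CSV header row
    autodetect=True,      # let BigQuery infer the schema
)

# Load a file that already sits in Cloud Storage into a table in one batch operation.
load_job = client.load_table_from_uri(
    "gs://my-bucket/data/sales.csv",   # placeholder source file
    "my-project.my_dataset.sales",     # placeholder destination table
    job_config=job_config,
)
load_job.result()  # wait for the batch job to finish

print(f"Loaded {client.get_table('my-project.my_dataset.sales').num_rows} rows.")
```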
Get started: batch loading with the Google Cloud console
Open the BigQuery page in the Google Cloud console.
Once BigQuery is open, in the Explorer pane, select your project ID and then choose Create dataset.
On the right, a pane opens with fields to fill in, as follows:
For Dataset ID, enter a name for your dataset.
For Location type, choose Region or Multi-region, then select the location where you want to store your dataset.
Optionally, enable table expiration and set the number of days after which tables in the dataset expire.
Next, click Create dataset.
Once the dataset is created, find it in the Explorer pane and click Create table, then follow these steps:
For Source, choose Google Cloud Storage from the available options (Empty table, Upload, Google Cloud Storage, Drive, Google Bigtable, Amazon S3, Azure Blob Storage).
Browse to select the source file and choose the file format.
In the Destination section, select the project ID, the dataset that contains the table, and set the table name.
In the Schema section, select "Automatically detect."
Leave the other options at their defaults and click Create table.
Once the table is created, click on it to open it, run a query, or create a notebook.
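If you prefer to script these console steps, a rough equivalent with the Python client library is sketched below; the project, dataset, table name, location, and expiration values are placeholders, not values from this tutorial:

```python
# Sketch of the dataset-creation step and a follow-up query with the Python client.
# All names and settings below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Equivalent of "Create dataset" in the console.
dataset = bigquery.Dataset("my-project.my_dataset")
dataset.location = "EU"  # the region or multi-region you chose
dataset.default_table_expiration_ms = 30 * 24 * 60 * 60 * 1000  # optional table expiration
client.create_dataset(dataset, exists_ok=True)

# Equivalent of opening the new table and running a query.
query = "SELECT COUNT(*) AS row_count FROM `my-project.my_dataset.my_table`"
for row in client.query(query).result():
    print(f"Row count: {row.row_count}")
```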
Get started: loading data with the Python client library
Open your IDE (VS Code is used in this tutorial) and create a workspace folder.
Check whether virtualenv is installed; if it is not, install it with pip install virtualenv.
Create a virtual environment with virtualenv venv and then activate it.
Install the client library with pip install google-cloud-bigquery.
Create a Python script to ingest data into BigQuery.
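A minimal sketch of such a script is shown below. It loads a local CSV file; the project, dataset, table, and file names are placeholders:

```python
# ingest_to_bigquery.py
# Sketch: load a local CSV file into a BigQuery table.
# Project, dataset, table, and file names are placeholders.
from google.cloud import bigquery

PROJECT_ID = "my-project"
TABLE_ID = f"{PROJECT_ID}.my_dataset.my_table"
SOURCE_FILE = "data/sample.csv"


def load_csv(table_id: str, source_file: str) -> None:
    """Load a local CSV file into the given BigQuery table."""
    client = bigquery.Client(project=PROJECT_ID)

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # the file has a header row
        autodetect=True,      # infer the schema from the data
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    with open(source_file, "rb") as f:
        job = client.load_table_from_file(f, table_id, job_config=job_config)

    job.result()  # wait for the load job to complete
    table = client.get_table(table_id)
    print(f"Loaded {table.num_rows} rows into {table_id}.")


if __name__ == "__main__":
    load_csv(TABLE_ID, SOURCE_FILE)
```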
Get started: using the BigQuery Data Transfer Service
In the left-hand navigation, click Data transfers, then select Create a transfer and enable the BigQuery Data Transfer API if prompted.
For Source type, choose the desired source. In this tutorial, choose Google Cloud Storage.
For Transfer config name, enter a display name for the transfer.
For the schedule frequency, choose On demand.
In the Destination settings, choose a destination dataset.
In Data source details, follow these steps:
Set the destination table name.
For Cloud Storage URI, browse to your bucket and set the path that contains the files to transfer.
For Write preference, choose one of the options (APPEND or MIRROR).
For File format, select the appropriate file format.
Set the field delimiter.
Set the number of header rows to skip.
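The same transfer can also be created programmatically with the BigQuery Data Transfer API. The sketch below assumes the google-cloud-bigquery-datatransfer package is installed; the project, dataset, bucket path, and parameter values are placeholders, and the exact parameter keys for a Cloud Storage transfer should be checked against the Data Transfer Service documentation:

```python
# Sketch: create a Cloud Storage transfer config with the Data Transfer API.
# All names, paths, and parameter values below are placeholders.
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()
parent = client.common_project_path("my-project")

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="my_dataset",
    display_name="gcs_to_bigquery_demo",
    data_source_id="google_cloud_storage",
    params={
        "destination_table_name_template": "my_table",
        "data_path_template": "gs://my-bucket/exports/*.csv",
        "file_format": "CSV",
        "write_disposition": "APPEND",  # or "MIRROR"
        "field_delimiter": ",",
        "skip_leading_rows": "1",
    },
    schedule_options=bigquery_datatransfer.ScheduleOptions(
        disable_auto_scheduling=True  # run the transfer on demand only
    ),
)

created = client.create_transfer_config(parent=parent, transfer_config=transfer_config)
print(f"Created transfer config: {created.name}")
```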
In BigQuery, you can also create a scheduled query and a repository.
This tutorial describes the different ways you can transform data in your BigQuery tables.
You can transform data in BigQuery in the following ways:
Use data manipulation language (DML) to transform data in your BigQuery tables.
Use Dataform to develop, test, version-control, and schedule SQL workflows in BigQuery.
You can run multiple DML statements concurrently, where BigQuery queues several DML statements that transform your data one after the other. BigQuery manages how concurrent DML statements are run, based upon the transformation type.
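As a hedged example, a DML statement can be submitted from the Python client like any other query; the table and column names below are placeholders:

```python
# Sketch: run a DML UPDATE statement from the Python client.
# The table and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

dml = """
UPDATE `my-project.my_dataset.my_table`
SET status = 'processed'
WHERE processed_at IS NULL
"""

job = client.query(dml)  # DML runs as a regular query job
job.result()             # wait for the statement to finish
print(f"Modified {job.num_dml_affected_rows} rows.")
```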
For practice with this DML, you can use the scripts in the BigQuery folder of this repository. However, to use them as I do, you'll need to generate a credentials file (a service account key) with permissions to edit, update, create, and delete in BigQuery. You can do this as follows:
Open IAM & Admin and select Service accounts. Click Create service account.
In the Service account details window:
Set a name for the service account and, optionally, provide a description, then click Create and continue.
In the Grant this service account access to project section, add a relevant role (for example, BigQuery Data Editor) and click Continue.
Leave the remaining optional settings as they are and click Done.
In the next window, on the right, open the options to create a key.
Choose Add key, then Create new key, and click Create to generate the key (the JSON format is the one used as a credentials file).
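Once the key file is downloaded, the scripts can authenticate with it. A minimal sketch, where the key file path is a placeholder:

```python
# Sketch: authenticate the BigQuery client with the downloaded service account key.
# The key file path is a placeholder.
from google.cloud import bigquery

# Option 1: point the client directly at the JSON key file.
client = bigquery.Client.from_service_account_json("keys/my-service-account.json")

# Option 2: set GOOGLE_APPLICATION_CREDENTIALS to the key file path in your shell
# and create the client with bigquery.Client().

print([d.dataset_id for d in client.list_datasets()])  # quick credentials check
```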