I would guess the data has to be in this format to get it into BigQuery, but I can't confirm. There is a way to get more usable output, though: using the JSON functions in BigQuery, you can extract the data into a more digestible format for your BI tool:
e.g. SELECT JSON_EXTRACT_SCALAR(data, '$.yourfirebasedocumentfield') yourfirebasedocumentfield FROM `yourfirebaseproject.datasetId.yourfirebasecollection_raw_changelog`;
https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions
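For instance, a minimal sketch assuming hypothetical document fields name and age (JSON_EXTRACT_SCALAR always returns a STRING, so numeric fields need a cast; the table name is a placeholder):
SELECT
  -- "name" and "age" are hypothetical fields used for illustration only.
  JSON_EXTRACT_SCALAR(data, '$.name') AS name,
  CAST(JSON_EXTRACT_SCALAR(data, '$.age') AS INT64) AS age
FROM
  `yourfirebaseproject.datasetId.yourfirebasecollection_raw_changelog`;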
We are working on documenting a solution that transforms Firestore schemas into SQL queries like the ones posted by @JonesM87. These queries will leverage BigQuery's JSON parsing functions to generate richly-typed views of your raw collection data in BigQuery.
The source of this script lives here, but you can also install it with npm (or npx): https://www.npmjs.com/package/@firebaseextensions/fs-bq-schema-views.
As I mentioned earlier, we're still working on documentation for this, but you can try it out with something like:
npx @firebaseextensions/fs-bq-schema-views \
--non-interactive \
--project <your-gcp-project> \
--dataset <bigquery-dataset-with-raw-changelog> \
--table-name-prefix <view-prefix> \
--schema-files <path-to-firestore-schema>
Here, <path-to-firestore-schema> would be a path to a file on your local filesystem containing a Firestore schema with the fields you'd like to be present in the resulting view. A contrived example of such a schema is:
{
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "favorite_numbers",
      "type": "array"
    },
    {
      "name": "last_login",
      "type": "timestamp"
    },
    {
      "name": "last_location",
      "type": "geopoint"
    },
    {
      "fields": [
        {
          "name": "name",
          "type": "string"
        }
      ],
      "name": "friends",
      "type": "map"
    }
  ]
}
Each name should correspond to some data that appears in your Firestore document, and each type is a valid Firestore type. After running the schema-views script, you'll see a view in the same dataset called <table-name-prefix>_<schema-file-name>. This view has a column for each named field you specified in your schema, and the contents of each column are the typed values for those fields as they appear in your mirrored Firestore documents. One important caveat: if you specify a field that doesn't appear in a document, the resulting cell in the view will be NULL.
In any case, once you have this view, you can query it with SQL. We are still working on documenting this tool comprehensively, but feel free to try it out!
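For example, a minimal sketch of a query against such a view, assuming the contrived schema above was saved as users.json and the script was run with --table-name-prefix users (project and dataset names are placeholders):
SELECT
  name,
  last_login
FROM
  -- Placeholder for <table-name-prefix>_<schema-file-name>: prefix "users", schema file "users.json".
  `your-gcp-project.your_dataset.users_users`
WHERE
  last_login > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY);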
Thanks @JonesM87, and thanks @IanWyszynski for the very detailed response. I will give it a try with the suggested methods.
@IanWyszynski I am having the following error when trying to run the script: Access Denied: Project myprojectname: User does not have bigquery.jobs.create permission in project myprojectname
I am running the script as follows: npx @firebaseextensions/fs-bq-schema-views --non-interactive --project "myprojectname" --dataset "datasetname" --table-name-prefix "members" --schema-files "bigquery-schema-members.json"
I have tried adding the BigQuery Admin and BigQuery Job User roles to the following accounts: owner account, BigQuery Connection Service Agent, App Engine default service account.
Apologies if it is obvious as I am relatively new to this.
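For reference: the permission named in that error, bigquery.jobs.create, is included in the roles/bigquery.jobUser role. A hedged sketch of granting it, with placeholder project and email values:
gcloud projects add-iam-policy-binding myprojectname \
  --member="user:your-admin@email.com" \
  --role="roles/bigquery.jobUser"
Note that the role must be granted to the principal the script actually runs as, which may be a service account if GOOGLE_APPLICATION_CREDENTIALS is set (see below).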
@IanWyszynski Thank you for this script !
It works like a charm :)
@madmacc To deploy it, I installed the gcloud SDK on my Mac and used my admin account.
$ gcloud config set account my@email.com
$ gcloud auth application-default login
@rphlmr Thanks for that. I tried:
$ gcloud config set account my@email.com
$ gcloud auth application-default login
Warning here: The environment variable [GOOGLE_APPLICATION_CREDENTIALS] is set to: [R:\myproject\service-account.json]. Credentials will still be generated to the default location: [C:\Users\username\AppData\Roaming\gcloud\application_default_credentials.json]. To use these credentials, unset this environment variable before running your application.
$ set GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\AppData\Roaming\gcloud\application_default_credentials.json"
$ gcloud auth list
Here it returns my admin/owner account email.
I still get the same issue. I am pretty sure I was using my admin/owner account previously just the credentials json file was in a different location.
@madmacc Hmm, you are on Windows, so maybe there is different behaviour. When you set a new env variable, you have to quit your console and open a new one (to reload the env).
In this case the warning said to unset the env var if you want to use your account.
I would do:
set GOOGLE_APPLICATION_CREDENTIALS=
close my terminal and reopen.
verify that the env var is clear with
echo %GOOGLE_APPLICATION_CREDENTIALS%
(it should print %GOOGLE_APPLICATION_CREDENTIALS% literally if the variable is unset)
and retry
$ gcloud config set account my@email.com
$ gcloud auth application-default login
When you do that, you won't be using the service account JSON. A browser should open and ask for Google sign-in.
I'm not 100% sure of that :s
Appreciate your help @rphlmr. I have it working now.
I used the below to reset it and then I ran it interactively this time.
export GOOGLE_APPLICATION_CREDENTIALS=
The only thing is that the string fields have quotes in them; otherwise it's all looking good.
@madmacc Yes, the quotes are the expected behaviour (I think). It's due to the SQL query that builds the view.
I changed the query on my project: select your view's Details tab ==> edit the query.
Replace "JSON_EXTRACT" with "JSON_EXTRACT_SCALAR".
NB: I did this replacement only on columns that are not typed by BigQuery.
Example:
SELECT
....
`my_project.all_transactions_export.firestoreTimestamp`(JSON_EXTRACT(data, '$._createdAt')) AS _createdAt,
JSON_EXTRACT_SCALAR(data, '$.transactionId') AS transactionId
....
Then, save your view (you can override the existing view).
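The difference is easy to see with a self-contained query: JSON_EXTRACT returns the JSON representation of the value (quotes included), while JSON_EXTRACT_SCALAR returns the bare scalar:
SELECT
  JSON_EXTRACT('{"id": "abc"}', '$.id') AS with_quotes,       -- returns "abc"
  JSON_EXTRACT_SCALAR('{"id": "abc"}', '$.id') AS no_quotes;  -- returns abc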
@rphlmr Can I ask if you have managed to run the script to import historical data successfully?
npx @firebaseextensions/fs-bq-import-collection
I am having issues with that: https://github.com/firebase/extensions/issues/80
@madmacc
I just ran npx @firebaseextensions/fs-bq-import-collection and answered the questions it asked. (I have to say it took me three tries to figure out what to answer ^^.)
After that, if your BigQuery export cloud function is already in production, you'll have to deal with duplicated data ;)
If anyone is interested: to drop duplicated data, I created another view:
SELECT
  _createdAt AS createdAt,
  transactionId,
  sellerName,
  ... etc ...
FROM (
  SELECT
    *,
    -- The ORDER BY makes row_number = 1 deterministically pick the latest change per
    -- document (this assumes the view exposes the changelog's timestamp column).
    ROW_NUMBER() OVER (PARTITION BY document_name ORDER BY timestamp DESC) row_number
  FROM
    `my_project.data_set.my_collection_exported_schema_changelog`)
WHERE
  row_number = 1
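(Note: this is essentially the same latest-change-per-document dedupe that the extension's generated _raw_latest view performs on the raw changelog table.)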
@rphlmr OK, good to hear it works. Maybe it is just an issue with my environment, or a Windows issue. I will set up an old Mac I have and give it a try on that.
To import, this is what I answered:
➜ npx @firebaseextensions/fs-bq-import-collection
? What is your Firebase project ID? my_project_id
? What is the path of the the Cloud Firestore Collection you would like to import from? (This may, or may not, be the same Collection for which you plan to mirror changes.) my_collection_name
? What is the ID of the BigQuery dataset that you would like to use? (A dataset will be created if it doesn't already exist) firestore_export
? What is the identifying prefix of the BigQuery table that you would like to import to? (A table will be created if one doesn't already exist) my_collection_name
? How many documents should the import stream into BigQuery at once? 300
Importing data from Cloud Firestore Collection: my_collection_name, to BigQuery Dataset: firestore_export, Table: my_collection_name_raw_changelog
BigQuery dataset already exists: firestore_export
BigQuery table with name my_collection_name_raw_changelog already exists in dataset firestore_export!
View with id my_collection_name_raw_latest already exists in dataset firestore_export.
Inserting 300 row(s) of data into BigQuery
Inserted 300 row(s) of data into BigQuery
Inserting 300 row(s) of data into BigQuery
Inserted 300 row(s) of data into BigQuery
Inserting 300 row(s) of data into BigQuery
Inserted 300 row(s) of data into BigQuery
Inserting 210 row(s) of data into BigQuery
Inserted 210 row(s) of data into BigQuery
---------------------------------------------------------
Finished importing 1110 Firestore rows to BigQuery
---------------------------------------------------------
Is it possible to import from multiple Collections at once? Or do I have to run this for all my Collections? @rphlmr
@colinrw I don't know, but what we do is: 1 collection = 1 BigQuery extension. We find it clearer that way :)
@rphlmr But what if I need to get data from multiple collections? Do I have to create a separate extension for each collection?
@colinrw Yes; when you configure a BigQuery extension, it only allows one collection at a time. Under the hood, a BigQuery extension is just a Cloud Function that responds to the onWrite trigger on your collection.
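To illustrate the mechanism, here is a rough sketch (for illustration only, not the extension's actual source; collection, dataset, and table names are placeholders, and the row shape only approximates the extension's changelog schema):
import * as functions from "firebase-functions";
import { BigQuery } from "@google-cloud/bigquery";

const bigquery = new BigQuery();

// Sketch: one onWrite trigger per collection, mirroring each change into a changelog table.
export const exportMyCollection = functions.firestore
  .document("my_collection/{docId}")
  .onWrite(async (change, context) => {
    const operation = !change.before.exists ? "CREATE"
      : !change.after.exists ? "DELETE"
      : "UPDATE";
    await bigquery
      .dataset("firestore_export")
      .table("my_collection_raw_changelog")
      .insert([{
        timestamp: context.timestamp,
        event_id: context.eventId,
        document_name: change.after.ref.path, // simplified; a placeholder for the full resource name
        operation,
        data: change.after.exists ? JSON.stringify(change.after.data()) : null,
      }]);
  });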
@fredzqm and @IanWyszynski can you comment on the feasibility of this?
Is it possible to import from multiple Collections at once? Or do I have to run this for all my Collections?
Yes, just create two extensions writing to the same DATASET_ID and TABLE_ID.
It will just work. (That's the nice part about the schema-less export format.)
Thanks for confirming! I'm leaving this issue open to track documenting the script @IanWyszynski outlined in https://github.com/firebase/extensions/issues/113#issuecomment-555679921. When you get a chance, it would be great if you could make a PR to add it.
Cool script, works like a charm (I'm brand new at this).
I have a pretty complex object, and was wondering if the array type supports objects (records)?
Here's the gist of it:
export class Order {
/// The order items. Also called `order lines` in some systems.
orderItems: OrderItem[];
/// A list of payments since multiple payments can be made per order.
payments: Payment[];
}
export class OrderItem {
/// The name of the product.
name: string;
/// The quantity of the product
quantity: number;
}
Is it possible to query a single OrderItem on a given Order, by name, for example?
Hi @larssn. In your case I would probably have separate collections for Orders, OrderItems, and Payments. In Orders I would store orderItem IDs and payment IDs (data normalization). It's easier to make queries, manage state in the front end, etc.
With your example, in BigQuery you will have stringified JSON in the 'orderItems' and 'payments' fields. BigQuery is just normal SQL, which only knows primitive types.
@rphlmr I figured out a bit of it. I modified the array function to be the following:
CREATE OR REPLACE FUNCTION `staging_database.firestore_export.firestoreArray`(json STRING) RETURNS ARRAY<STRUCT<name STRING, quantity NUMERIC>> LANGUAGE js AS """
return json ? JSON.parse(json).map(x => ({name: x.name, quantity: x.quantity})) : [];
""";
That seems to allow me to query order items by name or quantity.
The query looks fairly standard:
SELECT
*
FROM
(
SELECT
document_name,
operation,
`staging_database.firestore_export.firestoreTimestamp`(JSON_EXTRACT_SCALAR(data, '$.created')) AS created,
`staging_database.firestore_export.firestoreTimestamp`(JSON_EXTRACT_SCALAR(data, '$.modified')) AS modified,
`staging_database.firestore_export.firestoreArray`(JSON_EXTRACT(data, '$.orderItems')) AS orderItems
FROM
`staging_database.firestore_export.orders_raw_latest`
) orders_raw_latest
CROSS JOIN UNNEST(orders_raw_latest.orderItems)
WHERE name = 'Test product 2'
AND quantity = 2
Do you know how all the JSON_EXTRACT operations perform?
EDIT: This answer seems to indicate that it isn't a problem: https://stackoverflow.com/a/56450196/536485
Nice :p
I don't know about the performance of JSON_EXTRACT, but that's what the script does to produce its view. So it seems fine to me.
The detailed instructions for the script have now been merged in #161. I'm closing this now. Please reopen if there are any other issues!
[REQUIRED] Step 2: Extension name
firestore-bigquery-export
What feature would you like to see?
Currently all document properties are exported to a single string field/column called "data" as a JSON string in the members_raw_changelog and members_raw_latest tables. I am assuming this is by design and not just a bug in my case. This data is not particularly usable; I would need to schedule a job to extract and manipulate it into another table.
If I "Transfer" data from a firestore export/backup file it is imported with each document property into its own field/column and is usable right away. This is what I was expecting with the extension.
So my request is for each imported document's properties to land in their own fields/columns so they can be queried with SQL.
How would you use it?
Well, at the moment the extension's data is unusable. Say I want to use some data in a data visualization tool: this is not possible without a lot of manipulation. With the requested change, this would work out of the box.
I have attached an image with an example of the data and another image with the way it should look. The 2nd image is from data transferred from a command line export.
I am actually hoping this is a bug and it will be fixed shortly!