Open JoshAubrey opened 1 year ago
@theshanbhag Can you please take a look / triage this?
Here's an example of what I have been doing and how I worked around it for the moment if this helps:
# rebuild MongoIds from their component pieces
CREATE TEMP FUNCTION
convertId(input JSON)
RETURNS STRING
LANGUAGE js AS r"""
function getId(mongoId) {
if (mongoId === null) {return null}
const result =
DecimalHexTwosComplement(mongoId?.timestamp) +
DecimalHexTwosComplement(mongoId?.randomValue1) +
DecimalHexTwosComplement(mongoId?.randomValue2) +
DecimalHexTwosComplement(mongoId?.counter);
return result;
}
function DecimalHexTwosComplement(decimal) {
if (decimal >= 0) {
const hexadecimal = decimal.toString(16);
return hexadecimal;
} else {
const hexadecimal = Math.abs(decimal).toString(16);
let output = '';
for (i = 0; i < hexadecimal.length; i++) {
output += (0x0F - parseInt(hexadecimal[i], 16)).toString(16);
}
output = (0x01 + parseInt(output, 16)).toString(16);
return output;
}
}
return getId(input)
""";
# Transactions de-deuplicate data
CREATE OR REPLACE TABLE
`example-data.mongodb.transactions` AS (
SELECT
* EXCEPT(row_num)
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp DESC ) row_num
FROM
`example-data.mongodb.transactions`) t
WHERE
row_num=1 );
# Transactions Parse source_data and create transformed for use in data studio
CREATE OR REPLACE TABLE
`example-data.mongodb.transactions-transformed` AS(
SELECT
timestamp,
id AS id,
SAFE_CAST(JSON_EXTRACT(source_data, "$.amount") AS DECIMAL) AS amount,
SAFE_CAST(JSON_EXTRACT(source_data, "$.fee") AS DECIMAL) AS fee,
SAFE_CAST(JSON_EXTRACT(source_data, "$.status") AS INT) AS status,
SAFE_CAST(JSON_EXTRACT(source_data, "$.type") AS INT) AS type,
SAFE_CAST(JSON_EXTRACT(source_data, "$.data.paymentMethod.type") AS INT) AS paymentType,
convertId(PARSE_JSON(JSON_QUERY(source_data, "$.fid"))) AS accountId,
FROM
`example-data.mongodb.transactions`);
This issue has been marked as stale due to 180 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the issue at any time. Thank you for your contributions.
Bump
Although it is not a definitive solution, I've created a UDF that recursively transforms mongo objectid into strings. Here the code for my UDF. Hope it helps...
function parseOid(obj) {
if (obj instanceof Array) {
for(var i = 0; i < obj.length; i++) {
obj[i] = parseOid(obj[i]);
}
} else {
for(var prop in obj) {
if (prop == "$oid") {
obj = obj.$oid
return obj
}
if (obj[prop] instanceof Object || obj[prop] instanceof Array) {
obj[prop] = parseOid(obj[prop])
}
}
}
return obj
}
function process(inJson) {
var obj = JSON.parse(inJson)
obj = parseOid(obj)
return JSON.stringify(obj)
}
Related Template(s)
MongoDB to BigQuery
What happened?
The Reference Ids (aka Foreign Keys), which are the same format as the primary MongoId / ObjectId / _id, are getting broken down into their individual component pieces.
Input
accountId: 63e131fb63ccf7e561f96920
Output to source_data in BigQuery:
"accountId":{"timestamp":1675702779,"counter":16345376,"randomValue1":6540535,"randomValue2":-6815},"
I figured out a temporary workaround with a GoogleSQL javascript UDF, but handling this in the template library will be much better. Thank you!
Beam Version
Newer than 2.46.0
Relevant log output
No response