Open G2H opened 1 year ago
Needs a one pager design doc
Here is an alternative solution until dataform has added support for primary and foreign keys.
post_operations { ALTER TABLE ${self()} ADD PRIMARY KEY (field) NOT ENFORCED ; }
I attempt to handle this in a much more complex (but in some respects more reliable) way using these JS functions. This is not a great approach in many respects however so would definitely benefit from support in Dataform itself:
/* Wait for a specified interval in the context of GoogleSQL procedural language (i.e. a SQL script, not an individual query) */
function wait(intervalString) {
return `
BEGIN
DECLARE WAIT STRING;
DECLARE DELAY_TIME DATETIME;
SET WAIT = 'TRUE';
SET DELAY_TIME = DATETIME_ADD(CURRENT_DATETIME, INTERVAL ${intervalString});
WHILE WAIT = 'TRUE' DO
IF (DELAY_TIME < CURRENT_DATETIME) THEN
SET WAIT = 'FALSE';
END IF;
END WHILE;
END;
`;
}
/* Drops all current constraints on a table and sets the constraints specified in constraints. For use in a Dataform pre operation.
keyInForeignTable defaults to "id" if not specified. Composite keys may be specified for keyInThisTable and keyInForeignTable as a comma separated list.
e.g.
post_operations {
${data_functions.setKeyConstraints(ctx, dataform, {
primaryKey: "myidname",
foreignKeys: [
{keyInThisTable: "myotherkeyname", foreignTable: "myothertable", keyInForeignTable: "myotheridname"},
{keyInThisTable: "anotherkeyname", foreignTable: "myotherothertable"},
{keyInThisTable: "firstpartofcompositekey, secondpartofcompositekey", foreignTable: "myothertable", keyInForeignTable: "firstpartofcompositekeyinothertable, secondpartofcompositekeyinothertable"}
]
})}
}
Using this function enables project level table definitions to avoid having to handle their own workarounds for the following issues:
1. Without putting the ALTER TABLE statements in a conditional IF TRUE THEN... END IF; block, although the script would execute without error, BigQuery query compilation unhelpfully returns an error on the line that attempts to add a primary key if a primary key already exists, ignoring the fact that a previous step in the script removes the primary key.
2. BigQuery has a 5 table updates per 10s limit which may be breached if too many constraint removal operations happen too quickly. The function builds in waits to avoid this.
*/
function setKeyConstraints(ctx, dataform, constraints) {
/* Detect incorrectly formatted constraints objects */
Object.keys(constraints).forEach(key => {
if (!['primaryKey', 'foreignKeys'].includes(key)) {
throw new Error(`Invalid parameter passed to setKeyConstraints() inside constraints object: ${key}. Valid top level parameters are primaryKey and foreignKeys.`);
}
});
if (constraints.foreignKeys && constraints.foreignKeys.length > 0) {
constraints.foreignKeys.forEach(foreignKey => Object.keys(foreignKey).forEach(foreignKeyParameter => {
if (!['keyInThisTable', 'foreignTable', 'keyInForeignTable'].includes(foreignKeyParameter)) {
throw new Error(`Invalid foreign key parameter passed to setKeyConstraints() inside constraints object: ${foreignKeyParameter}. Valid key level parameters are keyInThisTable, foreignTable and (optionally) keyInForeignTable.`);
}
}));
}
return `
IF TRUE THEN
/* Delete all key constraints on the table - even ones that are no longer included in dfeAnalyticsDataform() configuration */
ALTER TABLE ${ctx.self()} DROP PRIMARY KEY IF EXISTS;
FOR constraint_to_drop IN (
SELECT
SPLIT(constraint_name, ".")[1]
FROM
${"`" + dataform.projectConfig.defaultDatabase + "." + dataform.projectConfig.defaultSchema + (dataform.projectConfig.schemaSuffix ? "_" + dataform.projectConfig.schemaSuffix : "" ) + ".INFORMATION_SCHEMA.TABLE_CONSTRAINTS`"}
WHERE
constraint_type = "FOREIGN KEY"
AND table_name = "${ctx.name()}"
)
DO
ALTER TABLE ${ctx.self()} DROP CONSTRAINT IF EXISTS constraint_to_drop;
${wait("2 SECOND")}
END FOR;
/* Set primary key */
${constraints.primaryKey ?
`ALTER TABLE ${ctx.self()} ADD PRIMARY KEY(${constraints.primaryKey}) NOT ENFORCED;
${wait("2 SECOND")}`
: ``}
/* Set foreign key constraints */
${constraints.foreignKeys && constraints.foreignKeys.length > 0 ?
`ALTER TABLE ${ctx.self()}
${constraints.foreignKeys.map(foreignKey =>
`ADD CONSTRAINT ${foreignKey.keyInThisTable.replaceAll(",", "_").replaceAll(" ", "")}_relationship FOREIGN KEY(${foreignKey.keyInThisTable}) REFERENCES ${ctx.ref(foreignKey.foreignTable)}(${foreignKey.keyInForeignTable || "id"}) NOT ENFORCED`).join(',\n')
};`
: ``}
END IF;
`;
}
module.exports = {
wait,
setKeyConstraints
};
See https://cloud.google.com/blog/products/data-analytics/join-optimizations-with-bigquery-primary-and-foreign-keys