Closed Rizxcviii closed 7 months ago
For anyone else waiting for this to be added, here is an implementation using the CDK in TypeScript. It uses a custom resource for the time being.
import * as cdk from "aws-cdk-lib";
import * as glueCfn from "aws-cdk-lib/aws-glue";
import * as iam from "aws-cdk-lib/aws-iam";
import * as cr from "aws-cdk-lib/custom-resources";
import { Construct } from "constructs";

export interface LakeformationConfigurationProps {
  /**
   * The AWS account ID that owns the resources.
   *
   * @default - the current account
   */
  readonly accountId?: string;
  /**
   * The Glue crawler to configure.
   */
  readonly crawler: glueCfn.CfnCrawler;
  /**
   * Whether to use Lake Formation for permissions.
   *
   * @default - AWS will use Lake Formation for permissions
   */
  readonly useLakeformationForPermissions?: boolean;
  /**
   * The IAM role that will be used to configure the Lake Formation permissions.
   * The custom resource uses this role directly, so it must be assumable by
   * `lambda.amazonaws.com`.
   *
   * @default - a new role will be created
   */
  readonly role?: iam.IRole;
  /**
   * The removal policy to apply to the custom resource.
   *
   * @default - `RemovalPolicy.DESTROY`
   */
  readonly removalPolicy?: cdk.RemovalPolicy;
}

/**
 * A set of configurations to enable Lake Formation for a Glue crawler.
 * This makes use of a custom resource to configure the crawler.
 */
export class LakeformationConfiguration extends Construct {
  constructor(
    scope: Construct,
    id: string,
    props: LakeformationConfigurationProps
  ) {
    super(scope, id);

    new cr.AwsCustomResource(this, "Resource", {
      resourceType: "Custom::LakeformationConfiguration",
      // Runs on create as well, because onCreate defaults to onUpdate.
      onUpdate: {
        service: "Glue",
        action: "updateCrawler",
        parameters: {
          Name: props.crawler.name,
          LakeFormationConfiguration: {
            AccountId: props.accountId,
            UseLakeFormationCredentials: props.useLakeformationForPermissions,
          },
        },
        physicalResourceId: cr.PhysicalResourceId.of(
          `${props.crawler.name}-lakeformation-configuration`
        ),
      },
      // Switch the crawler back to plain IAM credentials on removal.
      onDelete: {
        service: "Glue",
        action: "updateCrawler",
        parameters: {
          Name: props.crawler.name,
          LakeFormationConfiguration: {
            AccountId: props.accountId,
            UseLakeFormationCredentials: false,
          },
        },
      },
      role: props.role,
      // AwsCustomResource requires a role or a policy. When no role is given,
      // derive the policy (glue:UpdateCrawler) from the SDK calls above.
      policy: props.role
        ? undefined
        : cr.AwsCustomResourcePolicy.fromSdkCalls({
            resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
          }),
      removalPolicy: props.removalPolicy,
    });
  }
}
Hi @Rizxcviii,
Thanks for bringing this issue up. This property has now been implemented, and you should be able to set the LakeFormationConfiguration property when creating a crawler in CloudFormation.
You can set it like any other crawler property:
MyCrawler:
  Type: AWS::Glue::Crawler
  Properties:
    Role: # crawler role
    Name: # crawler name
    DatabaseName: # database name
    Targets:
      S3Targets:
        - Path: # s3 bucket path
    LakeFormationConfiguration:
      UseLakeFormationCredentials: True # or False
      AccountId: # set account ID if cross-account
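For anyone who generates templates from code, here is a minimal, dependency-free TypeScript sketch of the same resource as a plain object. The `CrawlerInput` type, the `buildCrawlerResource` helper, and every value in the usage example are hypothetical placeholders made up for illustration; only the CloudFormation property names are taken from the YAML above. It adds `AccountId` only when a cross-account ID is supplied.

```typescript
// Shape of the LakeFormationConfiguration block shown in the YAML above.
interface LakeFormationConfiguration {
  UseLakeFormationCredentials: boolean;
  AccountId?: string;
}

// Hypothetical input type for this sketch; not part of any AWS library.
interface CrawlerInput {
  name: string;
  roleArn: string;
  databaseName: string;
  s3Path: string;
  useLakeFormationCredentials: boolean;
  crossAccountId?: string; // set only for cross-account crawling
}

interface CrawlerResource {
  Type: "AWS::Glue::Crawler";
  Properties: {
    Name: string;
    Role: string;
    DatabaseName: string;
    Targets: { S3Targets: { Path: string }[] };
    LakeFormationConfiguration: LakeFormationConfiguration;
  };
}

// Builds the AWS::Glue::Crawler resource as a plain object.
function buildCrawlerResource(input: CrawlerInput): CrawlerResource {
  const lakeFormation: LakeFormationConfiguration = {
    UseLakeFormationCredentials: input.useLakeFormationCredentials,
  };
  // Leave AccountId out entirely unless a cross-account ID was given.
  if (input.crossAccountId !== undefined) {
    lakeFormation.AccountId = input.crossAccountId;
  }
  return {
    Type: "AWS::Glue::Crawler",
    Properties: {
      Name: input.name,
      Role: input.roleArn,
      DatabaseName: input.databaseName,
      Targets: { S3Targets: [{ Path: input.s3Path }] },
      LakeFormationConfiguration: lakeFormation,
    },
  };
}

// Usage with placeholder values.
const template = {
  Resources: {
    MyCrawler: buildCrawlerResource({
      name: "my-crawler",
      roleArn: "arn:aws:iam::111122223333:role/my-crawler-role",
      databaseName: "my_database",
      s3Path: "s3://my-bucket/path/to/data",
      useLakeFormationCredentials: true,
    }),
  },
};
console.log(JSON.stringify(template, null, 2));
```

Serialising `template` to JSON should give a fragment equivalent to the YAML above, with the placeholders swapped for your own values.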
Thank you for your patience.
Name of the resource
AWS::Glue::Crawler
Resource name
No response
Description
AWS Glue crawlers require that "Lake Formation configuration" be enabled when the data catalog/data lake bucket is registered in Lake Formation. When the configuration is not enabled, crawling fails with the error
User does not have access to target s3://path/to/data
This is a massive hindrance, as we have to set this manually in the UI, even though the Crawler SDK API already supports a separate LakeFormationConfiguration. My specific use case is the CDK, but given that we are targeting the L1 construct (this CloudFormation API), it makes sense to open the request here rather than in the CDK. The request therefore is to add the LakeFormationConfiguration config to CloudFormation.
Other Details
No response