aws-cloudformation / cloudformation-coverage-roadmap

The AWS CloudFormation Public Coverage Roadmap
Creative Commons Attribution Share Alike 4.0 International

AWS::Glue::Crawler Lakeformation configuration #1921

Closed Rizxcviii closed 7 months ago

Rizxcviii commented 10 months ago

Name of the resource


Resource name

No response


AWS Glue crawlers require the "Lake Formation configuration" option to be enabled when the data catalog or data lake bucket is registered in Lake Formation. When the option is not enabled, crawling fails with the error "User does not have access to target s3://path/to/data". This is a major hindrance: we have to set the option manually in the UI, even though the crawler SDK API already accepts a separate LakeFormationConfiguration. My specific use case is the CDK, but since we are targeting the L1 construct (that is, this CloudFormation API), it makes sense to open the request here rather than against the CDK.

The request therefore is to add the LakeFormationConfiguration config into CloudFormation.
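
For reference, the SDK-level setting mentioned above amounts to passing a LakeFormationConfiguration object to Glue's UpdateCrawler call. Here is a minimal TypeScript sketch of that request shape. The helper name `buildUpdateCrawlerInput` and the crawler name "my-crawler" are hypothetical, and the commented-out send step assumes the AWS SDK for JavaScript v3 (`@aws-sdk/client-glue`) is installed.

```typescript
// Lake Formation block accepted by Glue's UpdateCrawler API.
interface LakeFormationConfiguration {
  UseLakeFormationCredentials: boolean;
  AccountId?: string; // only needed for cross-account access
}

interface UpdateCrawlerInput {
  Name: string;
  LakeFormationConfiguration: LakeFormationConfiguration;
}

// Hypothetical helper: builds the UpdateCrawler request for one crawler.
function buildUpdateCrawlerInput(
  crawlerName: string,
  useLakeFormationCredentials: boolean,
  accountId?: string
): UpdateCrawlerInput {
  const lakeFormation: LakeFormationConfiguration = {
    UseLakeFormationCredentials: useLakeFormationCredentials,
  };
  // Leave AccountId out entirely when the crawler stays in the current account.
  if (accountId !== undefined) {
    lakeFormation.AccountId = accountId;
  }
  return { Name: crawlerName, LakeFormationConfiguration: lakeFormation };
}

// With the SDK installed, the input could be sent roughly like this:
//   await new GlueClient({}).send(new UpdateCrawlerCommand(updateInput));
const updateInput = buildUpdateCrawlerInput("my-crawler", true);
console.log(JSON.stringify(updateInput));
```

This is the same call the console makes on our behalf today, which is why exposing it in CloudFormation would remove the manual step.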

Other Details

No response

Rizxcviii commented 7 months ago

For anyone else waiting for this to be added, here is an implementation using the CDK in TypeScript. It currently uses a custom resource.

import * as cdk from "aws-cdk-lib";
import * as glueCfn from "aws-cdk-lib/aws-glue";
import * as iam from "aws-cdk-lib/aws-iam";
import * as cr from "aws-cdk-lib/custom-resources";
import { Construct } from "constructs";

export interface LakeformationConfigurationProps {
  /**
   * The AWS account ID that owns the resources.
   * @default - the current account
   */
  readonly accountId?: string;

  /**
   * The Glue crawler to configure.
   */
  readonly crawler: glueCfn.CfnCrawler;

  /**
   * Whether to use Lake Formation for permissions.
   * @default - AWS will use Lake Formation for permissions
   */
  readonly useLakeformationForPermissions?: boolean;

  /**
   * The IAM role that will be used to configure the Lake Formation permissions.
   * It is used directly by the custom resource, so it must be assumable by the
   * Lambda service that runs the custom resource.
   * @default - a new role will be created
   */
  readonly role?: iam.IRole;

  /**
   * The removal policy to apply to the custom resource.
   * @default - `RemovalPolicy.DESTROY`
   */
  readonly removalPolicy?: cdk.RemovalPolicy;
}

/**
 * A set of configurations to enable Lake Formation for a Glue crawler.
 * This makes use of a custom resource to configure the crawler.
 */
export class LakeformationConfiguration extends Construct {
  constructor(
    scope: Construct,
    id: string,
    props: LakeformationConfigurationProps
  ) {
    super(scope, id);

    new cr.AwsCustomResource(this, "Resource", {
      resourceType: "Custom::LakeformationConfiguration",
      onUpdate: {
        service: "Glue",
        action: "updateCrawler",
        parameters: {
          // Assumption: the crawler is identified by its Ref, which resolves to its name.
          Name: props.crawler.ref,
          LakeFormationConfiguration: {
            AccountId: props.accountId,
            UseLakeFormationCredentials: props.useLakeformationForPermissions,
          },
        },
        // Assumption: any stable identifier works here; the construct address is used.
        physicalResourceId: cr.PhysicalResourceId.of(this.node.addr),
      },
      onDelete: {
        service: "Glue",
        action: "updateCrawler",
        parameters: {
          Name: props.crawler.ref,
          LakeFormationConfiguration: {
            AccountId: props.accountId,
            UseLakeFormationCredentials: false,
          },
        },
      },
      // Assumption: let CDK derive the glue:UpdateCrawler permission from the calls above.
      policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
        resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
      }),
      role: props.role,
      removalPolicy: props.removalPolicy,
    });
  }
}

joonqkr-amazon commented 7 months ago

Hi @Rizxcviii,

Thanks for bringing this issue up. This property has now been implemented, and you should be able to set the LakeFormationConfiguration property when creating a crawler in CloudFormation.

You can set it in the same way as any other crawler property:

    Type: AWS::Glue::Crawler
    Properties:
      Role: # crawler role
      Name: # crawler name
      DatabaseName: # database name
      Targets:
        S3Targets:
          - Path: # s3 bucket path
      LakeFormationConfiguration:
        UseLakeFormationCredentials: True # or False
        AccountId: # set account ID if cross-account
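
If you generate templates programmatically, the same resource can be expressed as a JSON fragment. Below is a rough TypeScript sketch that builds one as a plain object, using the property names of the AWS::Glue::Crawler resource. The helper `crawlerResource`, its option names, the logical ID `Crawler`, and all sample values are hypothetical.

```typescript
interface LakeFormationSettings {
  UseLakeFormationCredentials: boolean;
  AccountId?: string; // only set for cross-account crawling
}

interface CrawlerOptions {
  role: string;
  name: string;
  databaseName: string;
  s3Path: string;
  useLakeFormationCredentials: boolean;
  accountId?: string;
}

// Hypothetical helper: returns one AWS::Glue::Crawler resource definition.
function crawlerResource(opts: CrawlerOptions) {
  const lakeFormation: LakeFormationSettings = {
    UseLakeFormationCredentials: opts.useLakeFormationCredentials,
  };
  if (opts.accountId !== undefined) {
    lakeFormation.AccountId = opts.accountId;
  }
  return {
    Type: "AWS::Glue::Crawler",
    Properties: {
      Role: opts.role,
      Name: opts.name,
      DatabaseName: opts.databaseName,
      Targets: { S3Targets: [{ Path: opts.s3Path }] },
      LakeFormationConfiguration: lakeFormation,
    },
  };
}

// Sample values only; replace them with your own role, names, and path.
const sampleResource = crawlerResource({
  role: "my-crawler-role",
  name: "my-crawler",
  databaseName: "my-database",
  s3Path: "s3://my-bucket/data",
  useLakeFormationCredentials: true,
});
console.log(JSON.stringify({ Resources: { Crawler: sampleResource } }, null, 2));
```

Omitting AccountId in the same-account case keeps the generated template identical to the YAML form with that line left out.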

Thank you for your patience.