aws-cloudformation / cloudformation-coverage-roadmap

The AWS CloudFormation Public Coverage Roadmap
https://aws.amazon.com/cloudformation/
Creative Commons Attribution Share Alike 4.0 International

AWS::Glue::Crawler Lakeformation configuration #1921

Closed Rizxcviii closed 2 months ago

Rizxcviii commented 5 months ago

Name of the resource

AWS::Glue::Crawler

Resource name

No response

Description

AWS Glue crawlers require the "Lake Formation configuration" option to be enabled when the data catalog or data lake bucket is registered in Lake Formation. When the option is not enabled, crawling fails with the error "User does not have access to target s3://path/to/data". This is a major hindrance: we have to set the option manually in the UI, even though the Crawler SDK API already supports a separate LakeFormationConfiguration. My specific use case is the CDK, but since we are targeting the L1 construct (that is, this CloudFormation resource), it makes sense to open the request here rather than on the CDK.

The request is therefore to add the LakeFormationConfiguration property to AWS::Glue::Crawler in CloudFormation.
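
For context, the SDK-side shape referred to above can be sketched as a plain TypeScript object. The field names Name, LakeFormationConfiguration, UseLakeFormationCredentials and AccountId follow the Glue UpdateCrawler API; the helper buildUpdateCrawlerParams and the crawler name "my-crawler" are hypothetical, and the sketch only builds the request body rather than calling AWS.

```typescript
// Lake Formation settings accepted by Glue's UpdateCrawler call.
interface LakeFormationConfiguration {
  UseLakeFormationCredentials: boolean;
  AccountId?: string; // only needed for cross-account crawling
}

interface UpdateCrawlerParams {
  Name: string;
  LakeFormationConfiguration: LakeFormationConfiguration;
}

// Hypothetical helper: builds the request body, does not call AWS.
function buildUpdateCrawlerParams(
  crawlerName: string,
  useLakeFormation: boolean,
  accountId?: string
): UpdateCrawlerParams {
  const config: LakeFormationConfiguration = {
    UseLakeFormationCredentials: useLakeFormation,
  };
  // Omit AccountId entirely for same-account crawlers.
  if (accountId !== undefined) {
    config.AccountId = accountId;
  }
  return { Name: crawlerName, LakeFormationConfiguration: config };
}

// Example: enable Lake Formation credentials for a same-account crawler.
const params = buildUpdateCrawlerParams("my-crawler", true);
console.log(JSON.stringify(params));
```

An object of this shape is what would be passed to the SDK's UpdateCrawler operation; CloudFormation currently has no equivalent property to carry it.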

Other Details

No response

Rizxcviii commented 3 months ago

For anyone else waiting for this to be added, here is an implementation using the CDK in TypeScript. It uses a custom resource for now.

import * as cdk from "aws-cdk-lib";
import * as glueCfn from "aws-cdk-lib/aws-glue";
import * as iam from "aws-cdk-lib/aws-iam";
import * as cr from "aws-cdk-lib/custom-resources";
import { Construct } from "constructs";

export interface LakeformationConfigurationProps {
  /**
   * The AWS account ID that owns the resources.
   *
   * @default - the current account
   */
  readonly accountId?: string;

  /**
   * The Glue crawler to configure.
   */
  readonly crawler: glueCfn.CfnCrawler;

  /**
   * Whether to use Lakeformation for permissions.
   *
   * @default - AWS will use Lakeformation for permissions
   */
  readonly useLakeformationForPermissions?: boolean;

  /**
   * The IAM role that will be used to configure the Lakeformation permissions. The custom resource uses it directly, so it must be assumable by `lambda.amazonaws.com`.
   *
   * @default - a new role will be created
   */
  readonly role?: iam.IRole;

  /**
   * The removal policy to apply to the custom resource.
   *
   * @default - `RemovalPolicy.DESTROY`
   */
  readonly removalPolicy?: cdk.RemovalPolicy;
}

/**
 * A set of configurations to enable Lakeformation for a Glue Crawler. This will make use of a custom resource to configure the crawler.
 */
export class LakeformationConfiguration extends Construct {
  constructor(
    scope: Construct,
    id: string,
    props: LakeformationConfigurationProps
  ) {
    super(scope, id);

    new cr.AwsCustomResource(this, "Resource", {
      resourceType: "Custom::LakeformationConfiguration",
      onUpdate: {
        service: "Glue",
        action: "updateCrawler",
        parameters: {
          Name: props.crawler.name,
          LakeFormationConfiguration: {
            AccountId: props.accountId,
            UseLakeFormationCredentials: props.useLakeformationForPermissions,
          },
        },
        physicalResourceId: cr.PhysicalResourceId.of(
          `${props.crawler.name}-lakeformation-configuration`
        ),
      },
      onDelete: {
        service: "Glue",
        action: "updateCrawler",
        parameters: {
          Name: props.crawler.name,
          LakeFormationConfiguration: {
            AccountId: props.accountId,
            UseLakeFormationCredentials: false,
          },
        },
      },
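      role: props.role,
      // AwsCustomResource needs `policy` or `role`; deriving the policy from the
      // SDK calls above lets the construct work when no role is passed.
      policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
        resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
      }),
      removalPolicy: props.removalPolicy,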
    });
  }
}

joonqkr-amazon commented 2 months ago

Hi @Rizxcviii,

Thanks for bringing this issue up. This property has now been implemented, and you should be able to set the LakeFormationConfiguration property when creating a crawler in CloudFormation.

You can set it like any other crawler property:

MyCrawler:
  Type: AWS::Glue::Crawler
  Properties:
    Role: # crawler role
    Name: # crawler name
    DatabaseName: # database name
    Targets:
      S3Targets:
        - Path: # s3 bucket path
    LakeFormationConfiguration:
      UseLakeFormationCredentials: true # or false
      AccountId: # set account ID if cross-account
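
For those generating templates from TypeScript, the same resource can be sketched as a typed object and serialized to a JSON template. The property names mirror the YAML above; the helper glueCrawlerResource and the role ARN, crawler name, database name and bucket path are placeholder assumptions, not values from this issue.

```typescript
// Subset of AWS::Glue::Crawler properties used in the YAML example.
interface CrawlerProperties {
  Role: string;
  Name: string;
  DatabaseName: string;
  Targets: { S3Targets: { Path: string }[] };
  LakeFormationConfiguration?: {
    UseLakeFormationCredentials: boolean;
    AccountId?: string; // set only for cross-account crawls
  };
}

interface CrawlerResource {
  Type: "AWS::Glue::Crawler";
  Properties: CrawlerProperties;
}

// Hypothetical helper: wraps the properties in a CloudFormation resource entry.
function glueCrawlerResource(props: CrawlerProperties): CrawlerResource {
  return { Type: "AWS::Glue::Crawler", Properties: props };
}

// Placeholder values; replace with a real role, database and bucket path.
const template = {
  Resources: {
    MyCrawler: glueCrawlerResource({
      Role: "arn:aws:iam::111122223333:role/example-crawler-role",
      Name: "example-crawler",
      DatabaseName: "example_db",
      Targets: { S3Targets: [{ Path: "s3://example-bucket/data/" }] },
      LakeFormationConfiguration: { UseLakeFormationCredentials: true },
    }),
  },
};

// Emit the template as JSON, which CloudFormation accepts alongside YAML.
console.log(JSON.stringify(template, null, 2));
```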

Thank you for your patience.