Closed: ebarault closed this issue 2 years ago.
hi @nfx, did you have time to check this? How should I do these queries in Databricks with Terraform then?
grant usage on catalog hive_metastore to hive_metastore_users;
grant create on catalog hive_metastore to hive_metastore_users;
grant modify on catalog hive_metastore to hive_metastore_users;
grant select on catalog hive_metastore to hive_metastore_users;
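For reference, what I'm trying maps to a databricks_sql_permissions resource roughly like this (a sketch, not my exact config):

resource "databricks_sql_permissions" "hive_metastore_users" {
  catalog = true

  privilege_assignments {
    principal  = "hive_metastore_users"
    privileges = ["USAGE", "CREATE", "SELECT", "MODIFY"]
  }
}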
@ebarault I'm on vacation till April 12 ;) no release until that week. Only merges of PRs.
And is this error happening on the "default technical cluster" created by Terraform (i.e. no cluster id specified in the resource) or on a Unity Catalog-enabled cluster?
I ask because Table ACL syntax doesn't expect a catalog name in SQL statements. Do some debug logging together with your solutions architect and file a support ticket.
Have a nice vacation @nfx. This error happens on both cluster types: I first let the resource run its own default cluster, and then tried with different sets of clusters (runtimes, instance profiles, etc.).
I filed a support ticket and referenced this GitHub issue in it.
@ebarault Thank you.
Theoretically, what might work is creating a cluster with a pre-UC runtime, like 9.1 LTS, and specifying it as the cluster_id attribute on the resource, so that you know for sure it runs on a TACL high-concurrency cluster. You should also be able to get the SQL command sequence from the debug logs when collaborating with support.
Keep in mind that TACL and UC security models are different and you may need to use different grants.
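For example, on the UC side equivalent catalog grants would go through databricks_grants rather than databricks_sql_permissions - a rough sketch, assuming a UC catalog named main and a group named data_eng (UC privilege names have changed across releases, so check the docs for your provider version):

resource "databricks_grants" "catalog_grants" {
  catalog = "main"

  grant {
    principal  = "data_eng"
    privileges = ["USAGE", "SELECT"]
  }
}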
Just tested with a 9.1 LTS runtime and got the same error. Here are the cluster logs:
22/03/31 17:16:58 ERROR DatabricksS3LoggingUtils$:V3: S3 request failed with com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request; request: HEAD https://***ROOT_DBFS_BUCKET***.s3.amazonaws.com {} Hadoop 2.7.4, aws-sdk-java/1.11.655 Linux/5.4.0-1063-aws OpenJDK_64-Bit_Server_VM/25.302-b08 java/1.8.0_302 scala/2.12.10 vendor/Azul_Systems,_Inc. com.amazonaws.services.s3.model.HeadBucketRequest; Request ID: X9EQS0NM9AB6DXT8, Extended Request ID: RFBXf6HbswRAISm48hlRd2npYdkF6c6Tw2jmGO/7brQqkcc1GX5+aPUDPIL4D0foy5NCQc4nhpA=, Cloud Provider: AWS, Instance ID: i-09a015c66eb6e7c69 (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: X9EQS0NM9AB6DXT8; S3 Extended Request ID: RFBXf6HbswRAISm48hlRd2npYdkF6c6Tw2jmGO/7brQqkcc1GX5+aPUDPIL4D0foy5NCQc4nhpA=), S3 Extended Request ID: RFBXf6HbswRAISm48hlRd2npYdkF6c6Tw2jmGO/7brQqkcc1GX5+aPUDPIL4D0foy5NCQc4nhpA=; Request ID: null, Extended Request ID: null, Cloud Provider: AWS, Instance ID: i-09a015c66eb6e7c69
com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request; request: HEAD https://***ROOT_DBFS_BUCKET***.s3.amazonaws.com {} Hadoop 2.7.4, aws-sdk-java/1.11.655 Linux/5.4.0-1063-aws OpenJDK_64-Bit_Server_VM/25.302-b08 java/1.8.0_302 scala/2.12.10 vendor/Azul_Systems,_Inc. com.amazonaws.services.s3.model.HeadBucketRequest; Request ID: X9EQS0NM9AB6DXT8, Extended Request ID: RFBXf6HbswRAISm48hlRd2npYdkF6c6Tw2jmGO/7brQqkcc1GX5+aPUDPIL4D0foy5NCQc4nhpA=, Cloud Provider: AWS, Instance ID: i-09a015c66eb6e7c69 (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: X9EQS0NM9AB6DXT8; S3 Extended Request ID: RFBXf6HbswRAISm48hlRd2npYdkF6c6Tw2jmGO/7brQqkcc1GX5+aPUDPIL4D0foy5NCQc4nhpA=), S3 Extended Request ID: RFBXf6HbswRAISm48hlRd2npYdkF6c6Tw2jmGO/7brQqkcc1GX5+aPUDPIL4D0foy5NCQc4nhpA=
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4926)
at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:5706)
at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:5679)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4910)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4872)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4866)
at com.amazonaws.services.s3.AmazonS3Client.getBucketLocation(AmazonS3Client.java:1000)
at shaded.databricks.org.apache.hadoop.fs.s3a.EnforcingDatabricksS3Client.getBucketLocation(EnforcingDatabricksS3Client.scala:194)
at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$1(S3AFileSystem.java:675)
at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:116)
at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:272)
at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:331)
at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:268)
at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:243)
at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:672)
at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:513)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2Factory.createFileSystem(DatabricksFileSystemV2Factory.scala:55)
at com.databricks.backend.daemon.data.filesystem.MountEntryResolver.$anonfun$resolve$1(MountEntryResolver.scala:67)
at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:395)
at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:484)
at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:504)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:266)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:261)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:258)
at com.databricks.common.util.locks.LoggedLock$.withAttributionContext(LoggedLock.scala:73)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:305)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:297)
at com.databricks.common.util.locks.LoggedLock$.withAttributionTags(LoggedLock.scala:73)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:479)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:404)
at com.databricks.common.util.locks.LoggedLock$.recordOperationWithResultTags(LoggedLock.scala:73)
at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:395)
at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:367)
at com.databricks.common.util.locks.LoggedLock$.recordOperation(LoggedLock.scala:73)
at com.databricks.common.util.locks.LoggedLock$.withLock(LoggedLock.scala:120)
at com.databricks.common.util.locks.PerKeyLock.withLock(PerKeyLock.scala:36)
at com.databricks.backend.daemon.data.filesystem.MountEntryResolver.resolve(MountEntryResolver.scala:64)
at com.databricks.backend.daemon.data.client.DBFSV2.$anonfun$initialize$1(DatabricksFileSystemV2.scala:75)
at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:395)
at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:484)
at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:504)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:266)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:261)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:258)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionContext(DatabricksFileSystemV2.scala:510)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:305)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:297)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionTags(DatabricksFileSystemV2.scala:510)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:479)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:404)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.recordOperationWithResultTags(DatabricksFileSystemV2.scala:510)
at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:395)
at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:367)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.recordOperation(DatabricksFileSystemV2.scala:510)
at com.databricks.backend.daemon.data.client.DBFSV2.initialize(DatabricksFileSystemV2.scala:63)
at com.databricks.backend.daemon.data.client.DatabricksFileSystem.initialize(DatabricksFileSystem.scala:230)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:357)
at com.databricks.backend.daemon.driver.DatabricksILoop$.initializeSharedDriverContext(DatabricksILoop.scala:386)
at com.databricks.backend.daemon.driver.DatabricksILoop$.getOrCreateSharedDriverContext(DatabricksILoop.scala:277)
at com.databricks.backend.daemon.driver.DriverCorral.driverContext(DriverCorral.scala:229)
at com.databricks.backend.daemon.driver.DriverCorral.<init>(DriverCorral.scala:102)
at com.databricks.backend.daemon.driver.DriverDaemon.<init>(DriverDaemon.scala:50)
at com.databricks.backend.daemon.driver.DriverDaemon$.create(DriverDaemon.scala:287)
at com.databricks.backend.daemon.driver.DriverDaemon$.wrappedMain(DriverDaemon.scala:362)
at com.databricks.DatabricksMain.$anonfun$main$1(DatabricksMain.scala:117)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.DatabricksMain.$anonfun$withStartupProfilingData$1(DatabricksMain.scala:425)
at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:395)
at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:484)
at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:504)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:266)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:261)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:258)
at com.databricks.DatabricksMain.withAttributionContext(DatabricksMain.scala:85)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:305)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:297)
at com.databricks.DatabricksMain.withAttributionTags(DatabricksMain.scala:85)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:479)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:404)
at com.databricks.DatabricksMain.recordOperationWithResultTags(DatabricksMain.scala:85)
at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:395)
at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:367)
at com.databricks.DatabricksMain.recordOperation(DatabricksMain.scala:85)
at com.databricks.DatabricksMain.withStartupProfilingData(DatabricksMain.scala:425)
at com.databricks.DatabricksMain.main(DatabricksMain.scala:116)
at com.databricks.backend.daemon.driver.DriverDaemon.main(DriverDaemon.scala)
22/03/31 17:16:58 INFO deprecation: fs.s3a.server-side-encryption-key is deprecated. Instead, use fs.s3a.server-side-encryption.key
22/03/31 17:16:58 INFO DBFS: Initialized DBFS with DBFSV2 as the delegate.
22/03/31 17:16:58 INFO HiveConf: Found configuration file file:/databricks/hive/conf/hive-site.xml
22/03/31 17:16:59 INFO SessionManager: HiveServer2: Background operation thread pool size: 100
22/03/31 17:16:59 INFO SessionManager: HiveServer2: Background operation thread wait queue size: 100
22/03/31 17:16:59 INFO SessionManager: HiveServer2: Background operation thread keepalive time: 10 seconds
22/03/31 17:16:59 INFO AbstractService: Service:OperationManager is inited.
22/03/31 17:16:59 INFO AbstractService: Service:SessionManager is inited.
22/03/31 17:16:59 INFO AbstractService: Service: CLIService is inited.
22/03/31 17:16:59 INFO AbstractService: Service:ThriftHttpCLIService is inited.
22/03/31 17:16:59 INFO AbstractService: Service: HiveServer2 is inited.
22/03/31 17:16:59 INFO AbstractService: Service:OperationManager is started.
22/03/31 17:16:59 INFO AbstractService: Service:SessionManager is started.
22/03/31 17:16:59 INFO AbstractService: Service: CLIService is started.
22/03/31 17:16:59 INFO AbstractService: Service:ThriftHttpCLIService is started.
22/03/31 17:16:59 INFO ThriftCLIService: HTTP Server SSL: adding excluded protocols: [SSLv2, SSLv3]
22/03/31 17:16:59 INFO ThriftCLIService: HTTP Server SSL: SslContextFactory.getExcludeProtocols = [SSL, SSLv2, SSLv2Hello, SSLv3]
22/03/31 17:16:59 INFO Server: jetty-9.4.42.v20210604; built: 2021-06-04T17:33:38.939Z; git: 5cd5e6d2375eeab146813b0de9f19eda6ab6e6cb; jvm 1.8.0_302-b08
22/03/31 17:16:59 INFO session: DefaultSessionIdManager workerName=node0
22/03/31 17:16:59 INFO session: No SessionScavenger set, using defaults
22/03/31 17:16:59 INFO session: node0 Scavenging every 600000ms
22/03/31 17:16:59 WARN SecurityHandler: ServletContext@o.e.j.s.ServletContextHandler@65fe1f47{/,null,STARTING} has uncovered http methods for path: /*
22/03/31 17:16:59 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@65fe1f47{/,null,AVAILABLE}
22/03/31 17:16:59 INFO SslContextFactory: x509=X509@7a1878d(1,h=[ireland-prod.workers.prod.ns.databricks.com],a=[],w=[]) for Server@58b30e3e[provider=null,keyStore=file:///databricks/keys/jetty-ssl-driver-keystore.jks,trustStore=null]
22/03/31 17:16:59 INFO AbstractConnector: Started ServerConnector@5e8a678a{SSL, (ssl, http/1.1)}{0.0.0.0:10000}
22/03/31 17:16:59 INFO Server: Started @27368ms
22/03/31 17:16:59 INFO ThriftCLIService: Started ThriftHttpCLIService in https mode on port 10000 path=/cliservice/* with 5...500 worker threads
22/03/31 17:16:59 INFO AbstractService: Service:HiveServer2 is started.
22/03/31 17:16:59 INFO HiveThriftServer2: HiveThriftServer2 started
22/03/31 17:16:59 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@71dfca65{/sqlserver,null,AVAILABLE,@Spark}
22/03/31 17:16:59 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@1e9f6095{/sqlserver/json,null,AVAILABLE,@Spark}
22/03/31 17:16:59 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@750adad8{/sqlserver/session,null,AVAILABLE,@Spark}
22/03/31 17:16:59 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@1b08d26f{/sqlserver/session/json,null,AVAILABLE,@Spark}
22/03/31 17:16:59 INFO Utils: resolved command to be run: WrappedArray(getconf, PAGESIZE)
22/03/31 17:16:59 INFO DriverCorral: Creating the driver context
22/03/31 17:16:59 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
22/03/31 17:16:59 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@5c8e7687{/StreamingQuery,null,AVAILABLE,@Spark}
22/03/31 17:16:59 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@6c056020{/StreamingQuery/json,null,AVAILABLE,@Spark}
22/03/31 17:16:59 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@4e93d23e{/StreamingQuery/statistics,null,AVAILABLE,@Spark}
22/03/31 17:16:59 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@5eadc347{/StreamingQuery/statistics/json,null,AVAILABLE,@Spark}
22/03/31 17:16:59 INFO ContextHandler: Started o.e.j.s.ServletContextHandler@26caf4b6{/static/sql,null,AVAILABLE,@Spark}
22/03/31 17:16:59 INFO DriverDaemon: Starting driver daemon...
22/03/31 17:16:59 INFO SparkConfUtils$: Customize spark config according to file /tmp/custom-spark.conf
22/03/31 17:16:59 WARN SparkConf: The configuration key 'spark.akka.frameSize' has been deprecated as of Spark 1.6 and may be removed in the future. Please use the new key 'spark.rpc.message.maxSize' instead.
22/03/31 17:16:59 INFO DriverDaemon$: Attempting to run: 'enable iptables restrictions for Python'
22/03/31 17:16:59 INFO DriverDaemon$: Attempting to run: 'set up filesystem permissions for Python'
22/03/31 17:17:06 INFO DriverDaemon$: Not configuring RStudio Daemon due to process isolation being enabled
22/03/31 17:17:06 INFO Server: jetty-9.4.42.v20210604; built: 2021-06-04T17:33:38.939Z; git: 5cd5e6d2375eeab146813b0de9f19eda6ab6e6cb; jvm 1.8.0_302-b08
22/03/31 17:17:06 INFO DriverDaemon$$anon$1: Message out thread ready
22/03/31 17:17:06 INFO AbstractConnector: Started ServerConnector@4197e859{HTTP/1.1, (http/1.1)}{0.0.0.0:6061}
22/03/31 17:17:06 INFO Server: Started @34362ms
22/03/31 17:17:06 INFO DriverDaemon: Driver daemon started.
22/03/31 17:17:07 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead.
22/03/31 17:17:07 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead.
22/03/31 17:17:08 INFO DriverCorral: Loading the root classloader
22/03/31 17:17:08 INFO DriverCorral: Starting sql repl ReplId-7c20a-d5c43-45e2f-e
22/03/31 17:17:08 INFO DriverCorral: Starting sql repl ReplId-48b79-58f73-070eb-d
22/03/31 17:17:08 INFO DriverCorral: Starting sql repl ReplId-2efff-ec031-3676f-a
22/03/31 17:17:08 INFO DriverCorral: Starting sql repl ReplId-1db39-19ccc-c4527
22/03/31 17:17:08 INFO DriverCorral: Starting sql repl ReplId-42cff-3c8df-beb50-0
22/03/31 17:17:08 INFO SQLDriverWrapper: setupRepl:ReplId-2efff-ec031-3676f-a: finished to load
22/03/31 17:17:08 INFO SQLDriverWrapper: setupRepl:ReplId-7c20a-d5c43-45e2f-e: finished to load
22/03/31 17:17:08 INFO SQLDriverWrapper: setupRepl:ReplId-1db39-19ccc-c4527: finished to load
22/03/31 17:17:08 INFO SQLDriverWrapper: setupRepl:ReplId-48b79-58f73-070eb-d: finished to load
22/03/31 17:17:08 INFO SQLDriverWrapper: setupRepl:ReplId-42cff-3c8df-beb50-0: finished to load
22/03/31 17:17:14 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead.
22/03/31 17:17:14 WARN SQLConf: The SQL config 'spark.sql.hive.convertCTAS' has been deprecated in Spark v3.1 and may be removed in the future. Set 'spark.sql.legacy.createHiveTableByDefault' to false instead.
22/03/31 17:17:15 INFO DriverCorral: Starting sql repl ReplId-3a98a-c43a2-88144-c
22/03/31 17:17:15 INFO SQLDriverWrapper: setupRepl:ReplId-3a98a-c43a2-88144-c: finished to load
22/03/31 17:17:17 INFO ProgressReporter$: Added result fetcher for 4222314057149453388_7236108815156252456_d68cc04c-a5ad-46dd-9a3f-f345b141a557
22/03/31 17:17:18 ERROR SQLDriverLocal: Error in SQL query: SHOW GRANT ON CATALOG
org.apache.spark.sql.AnalysisException: For unity catalog, please specify the catalog name explicitly. E.g. SHOW GRANT `your.address@email.com` ON CATALOG main
at com.databricks.sql.managedcatalog.ManagedCatalogErrors$.shouldSpecifyTheCatalogName(ManagedCatalogErrors.scala:68)
at com.databricks.sql.acl.CatalogSecurableIdentifier.toV1orV2Securable(statements.scala:55)
at com.databricks.sql.acl.ResolvePermissionManagement.apply(ResolvePermissionManagement.scala:53)
at com.databricks.sql.acl.ResolvePermissionManagement.apply(ResolvePermissionManagement.scala:34)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$3(RuleExecutor.scala:221)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:221)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:89)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:218)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:210)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:210)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:271)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:264)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:191)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:188)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:109)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:188)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:246)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:347)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:245)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:96)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:134)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:180)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:180)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:97)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:94)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:86)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:103)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:101)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:689)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:684)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
at com.databricks.backend.daemon.driver.SQLDriverLocal.$anonfun$executeSql$1(SQLDriverLocal.scala:91)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.immutable.List.map(List.scala:298)
at com.databricks.backend.daemon.driver.SQLDriverLocal.executeSql(SQLDriverLocal.scala:37)
at com.databricks.backend.daemon.driver.SQLDriverLocal.repl(SQLDriverLocal.scala:145)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$11(DriverLocal.scala:526)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:266)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:261)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:258)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:50)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:305)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:297)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:50)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:503)
at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:611)
at scala.util.Try$.apply(Try.scala:213)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:603)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:522)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:557)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:427)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:370)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:221)
at java.lang.Thread.run(Thread.java:748)
@ebarault Can you post debug logs from the tf provider, not the Spark driver?
@ebarault the issue here is that the config for a TACL cluster is different when Unity Catalog is enabled.
For workspaces without Unity Catalog, it is "spark.databricks.acl.dfAclsEnabled": "true".
For workspaces with Unity Catalog, it is "data_security_mode": "LEGACY_TABLE_ACL" or "data_security_mode": "USER_ISOLATION".
edit: I just tested the below code on a UC workspace and it applied successfully.
resource "databricks_sql_permissions" "permissions" {
catalog = true
cluster_id = databricks_cluster.tacl.cluster_id
privilege_assignments {
principal = "EMEA"
privileges = ["USAGE", "CREATE", "SELECT", "MODIFY"]
}
}
data "databricks_node_type" "smallest" {
local_disk = true
}
data "databricks_spark_version" "latest_lts" {
long_term_support = true
}
resource "databricks_cluster" "tacl" {
cluster_name = "tacl"
spark_version = data.databricks_spark_version.latest_lts.id
node_type_id = data.databricks_node_type.smallest.id
autotermination_minutes = 20
autoscale {
min_workers = 1
max_workers = 2
}
spark_conf = {
"spark.databricks.acl.dfAclsEnabled" : "true",
"spark.databricks.repl.allowedLanguages" : "python,sql",
}
}
@nkvuong can you add documentation on the sql_permissions resource for this edge case of UC-enabled customers? :)
hi @nkvuong, I just tested this and I get exactly the same error:
Error: cannot create sql permissions: cannot read current grants: For unity catalog, please specify the catalog name explicitly. E.g. SHOW GRANT `your.address@email.com` ON CATALOG main
@ebarault interesting - could you check a few things for me, with a notebook attached to the tacl cluster just created:
%sql SHOW GRANT ON CATALOG
- I assume this would throw the UC error
%python spark.conf.get("spark.databricks.sql.initial.catalog.name")
- what is the output of this command?
I'm doing this right now. In the meantime, there's a gap between what you said earlier:
For workspaces with Unity Catalog, it is "data_security_mode": "LEGACY_TABLE_ACL" or "data_security_mode": "USER_ISOLATION"
and the example you provided:
spark_conf = {
  "spark.databricks.acl.dfAclsEnabled" : "true",
}
So I also tested with "data_security_mode": "LEGACY_TABLE_ACL" and "data_security_mode": "USER_ISOLATION", but then I get the following error:
cannot create sql permissions: cluster_id: not a High-Concurrency cluster
@nkvuong rd is the name of our default UC catalog; we set it with a global init script, and it is different for each workspace we use.
I suspect this config in the databricks_sql_permissions resource:
catalog = true
is not enough.
@ebarault so I was wrong about the cluster config - both types of config are still valid, i.e. they both create TACL clusters.
The issue here is the default catalog name. TACL commands such as SHOW GRANT ON CATALOG only assume a two-level namespace, because there is a single catalog, hive_metastore. When UC is enabled on a workspace, TACL commands then use the default catalog name, which is normally set to hive_metastore. Any other UC catalog set as the default will break these TACL commands, and thus break databricks_sql_permissions resources.
Fixing this directly in the provider will be complex, as we need different paths for UC & non-UC workspaces.
As a workaround, I would suggest the following:
reassign the metastore with hive_metastore as the default catalog:
databricks unity-catalog assign-metastore --workspace-id WORKSPACE_ID --metastore-id METASTORE_ID --default-catalog-name DEFAULT_CATALOG_NAME
and/or pin the default catalog on the TACL cluster itself:
resource "databricks_cluster" "tacl" {
cluster_name = "tacl"
spark_version = data.databricks_spark_version.latest_lts.id
node_type_id = data.databricks_node_type.smallest.id
autotermination_minutes = 20
autoscale {
min_workers = 1
max_workers = 2
}
spark_conf = {
"spark.databricks.acl.dfAclsEnabled" : "true",
"spark.databricks.repl.allowedLanguages" : "python,sql",
"spark.databricks.sql.initial.catalog.name": "hive_metastore"
}
}
@nkvuong, ok, just setting "spark.databricks.sql.initial.catalog.name": "hive_metastore" in the tacl cluster is enough; I don't need to change my config (global init script vs. default catalog at metastore assignment), but I take note of the suggestion though.
Thank you 👍
@nkvuong @nfx I reviewed the changes to the doc. I think it really misses the bits about:
resource "databricks_cluster" "tacl" {
cluster_name = "tacl"
...
spark_conf = {
"spark.databricks.acl.dfAclsEnabled" : "true",
"spark.databricks.repl.allowedLanguages" : "python,sql",
"spark.databricks.sql.initial.catalog.name": "hive_metastore"
}
}
@ebarault the official guideline from the product team is to set default_catalog_name to hive_metastore to prevent incompatibility issues - hence keeping the workaround with spark_conf in the cluster in this issue only.
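If you manage the metastore assignment with Terraform, the same default can be expressed roughly like this (a sketch; it assumes a provider version that includes databricks_metastore_assignment and uses hypothetical variable names):

resource "databricks_metastore_assignment" "this" {
  workspace_id         = var.workspace_id
  metastore_id         = var.metastore_id
  default_catalog_name = "hive_metastore"
}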
ok @nkvuong
for the record, assigning the default catalog for the workspace as you mentioned does not work:
uc assign-metastore --workspace-id xxxxxx --metastore-id yyyyy --default-catalog-name zzzz
(uc is an alias for databricks --profile <PROFILE_NAME> unity-catalog, as stated in the doc)
when checking the default catalog with
%python spark.conf.get("spark.databricks.sql.initial.catalog.name")
it gives hive_metastore
Configuration
Expected Behavior
Grant "USAGE", "CREATE", "SELECT", "MODIFY" privileges to principal "hive_metastore_users" on workspace's local hive_metastore
Actual Behavior
Steps to Reproduce
terraform apply
Terraform and provider versions
databricks provider 0.5.4
Notes
Unity Catalog is activated on this workspace (although what I'm dealing with here is indeed the workspace's own hive metastore).
Cluster logs