lordk911 closed this issue 2 years ago
LGTM
I followed the given steps but ended up with the error below (please advise if any solution is available):
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Exception when registering SparkListener
Caused by: org.apache.atlas.AtlasException: Failed to load application properties
at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:155)
Caused by: org.apache.commons.configuration.ConversionException: 'atlas.graph.index.search.map-name' doesn't map to a List object: false, a java.lang.Boolean
You don't need all the config fields in atlas-application.properties for Spark; the following is enough:
atlas.authentication.method.kerberos=false
atlas.client.checkModelInStart=false
atlas.cluster.name=hadoop
atlas.kafka.bootstrap.servers=workercxx
atlas.rest.address=http://master-10-0-xxx
atlas.spark.enabled=true
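Note that Atlas's ApplicationProperties loads atlas-application.properties from the classpath, so the file has to be visible to the driver; assuming the stock Atlas client behaviour, you can also point at its directory explicitly via the atlas.conf system property in spark-defaults.conf:

# optional: point the Atlas client at the directory holding atlas-application.properties
spark.driver.extraJavaOptions -Datlas.conf=/{your dir prefix}/sac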
I tried removing all the other fields, but then I get the exception below:
Caused by: java.util.NoSuchElementException: 'atlas.graph.index.search.solr.wait-searcher' doesn't map to an existing object
at org.apache.commons.configuration.AbstractConfiguration.getBoolean(AbstractConfiguration.java:644)
at org.apache.atlas.ApplicationProperties.setDefaults(ApplicationProperties.java:374)
at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:146)
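From that stack trace, ApplicationProperties.setDefaults() calls getBoolean() on that key and throws when it is missing, so a possible workaround (my assumption, not confirmed in this thread) is to keep the key in the trimmed file with an explicit value:

# assumed workaround: satisfy the getBoolean() lookup shown failing above
atlas.graph.index.search.solr.wait-searcher=true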
Hi @lordk911, does this support atlas.client.type=kafka?
I am getting the error below:
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.AbstractMethodError: Receiver class com.hortonworks.spark.atlas.KafkaAtlasClient does not define or inherit an implementation of the resolved method 'abstract java.lang.String getMessageSource()' of abstract class org.apache.atlas.hook.AtlasHook.
at org.apache.atlas.hook.AtlasHook.
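That AbstractMethodError is a binary mismatch: the atlas-* 2.1.0 jars declare the abstract getMessageSource() on org.apache.atlas.hook.AtlasHook (as the message itself says), while this KafkaAtlasClient build was compiled against an older Atlas that lacked it. The clean fix is to use a SAC build compiled against Atlas 2.1, as in the steps further down; purely as a sketch, a rebuilt client would need an override along these lines (class and constructor parameter names are assumptions based on the SAC sources, not verified here):

// sketch only, assuming SAC's KafkaAtlasClient takes an AtlasClientConf
import com.hortonworks.spark.atlas.{AtlasClientConf, KafkaAtlasClient}

class PatchedKafkaAtlasClient(conf: AtlasClientConf) extends KafkaAtlasClient(conf) {
  // Atlas >= 2.1 declares this abstract on AtlasHook (see the stack trace above);
  // builds compiled against older Atlas never implement it
  override def getMessageSource(): String = "spark_atlas_connector"
}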
@sbbagal13 Were you able to resolve the issue py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext?
I'm using Spark 3.1, and I want to integrate it with Apache Atlas and Ranger to do data governance.
I know there is the project https://github.com/hortonworks-spark/spark-atlas-connector, but it does not support Spark 3.x.
Finally I made it work; what I did is shown below:
1. First you need spark-atlas-connector_2.12-XXX.jar; this can be downloaded from Maven.
2. mkdir a dir named sac on the Spark client server.
3. In the sac dir made in step 2, put these jars and this config file:
   atlas-application.properties
   atlas-common-2.1.0.jar
   atlas-intg-2.1.0.jar
   atlas-notification-2.1.0.jar
   commons-configuration-1.10.jar
   kafka-clients-2.0.0.3.1.4.0-315.jar
   spark-atlas-connector_2.12-3.1.1.3.1.7270.0-253.jar
4. Configure spark-defaults.conf, adding the configuration items below (a quick smoke test for this wiring is sketched after the list):
   spark.driver.extraClassPath /{your dir prefix}/sac/*
   spark.extraListeners com.hortonworks.spark.atlas.SparkAtlasEventTracker
   spark.sql.queryExecutionListeners com.hortonworks.spark.atlas.SparkAtlasEventTracker
5. Use Atlas 2.1.0. That's all.
6. If your Atlas version is prior to 2.1.0, you need to copy spark_model.json from Atlas 2.1.0 and put it into /models/1000-Hadoop.
7. Also, an Atlas version prior to 2.1.0 may not display Spark information in the web UI; replace the /server/webapp/atlas/WEB-INF/lib directory with Atlas 2.1.0's lib directory.
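To verify the wiring, here is a minimal smoke test I would run in spark-shell (my own sketch; the table name is arbitrary):

// spark-defaults.conf from step 4 must already be in effect;
// `spark` is the session predefined by spark-shell
spark.sql("CREATE TABLE IF NOT EXISTS sac_smoke_test (id INT) USING parquet")
spark.sql("INSERT INTO sac_smoke_test VALUES (1)")
// if the listener is registered, matching Spark entities should show up
// in the Atlas UI shortly afterwards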
About data security: I found the Apache project Kyuubi, which has a spark-security module; the doc is here: https://submarine.apache.org/docs/userDocs/submarine-security/spark-security/README/ Just follow it. For now it does not support Spark 3.2.
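Following that doc, enabling the plugin mostly comes down to registering its SQL extension in spark-defaults.conf (class name quoted from memory of the linked README, so double-check it there):

# assumed from the linked spark-security doc; verify the exact class name there
spark.sql.extensions org.apache.submarine.spark.security.api.RangerSparkSQLExtension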