Open wajda opened 1 year ago
@harishyadavdevops this problem is most likely caused by the fact that you are reading your Excel file through the Pandas API, which is not directly supported by Spline. If you click on the icon to open the detailed execution plan, you should see a 4th terminal node there that represents your Excel data, but it isn't recognised as a read command (that's why you don't see it on the high-level lineage overview).
Try reading the Excel file using the Spark Excel connector instead of Pandas:
from pyspark.sql.functions import col

df_cmo_master = spark.read.format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load(f"{input_filepath}/CMO_ERICA_AIM_SAP_Mapping_Master_Latest.xlsx") \
    .select(
        col("IDERICA"),
        col("TargetDays").alias("target_days"),
        col("PrimaryPlatformPlan").alias("plan_platfrom"),
        col("sitename").alias("cmo_site"),
        col("primaryplatform").alias("pes_platform"),
    ) \
    .distinct()
@wajda I have used the above code for reading the xlsx file, and the code ran perfectly. But when I use this, I face the issue that no lineage shows up in the Spline UI. An empty screen is populated in the UI.
By default the Spline agent only reacts on writing data to a persistent storage, i.e. df.write(), never on df.read(), df.show() etc.
You can enable capturing memory-only actions if you want; it could be useful for debugging purposes:
spline.plugins.za.co.absa.spline.harvester.plugin.embedded.NonPersistentActionsCapturePlugin.enabled=true
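When passing agent settings through the Spark configuration, the property above would typically carry a spark. prefix (an assumption based on how the other Spline agent properties in this thread are passed), e.g. in spark-defaults.conf or via --conf:

```properties
# spark-defaults.conf — a sketch; the spark. prefix is an assumption
spark.spline.plugins.za.co.absa.spline.harvester.plugin.embedded.NonPersistentActionsCapturePlugin.enabled  true
```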
Hi Alex, greetings of the day!
I need to set up Spline with secure HTTPS. I have followed these steps https://absaoss.github.io/spline/0.4.html but it didn't work.
I request you to send me some document or links to set up the Spline server with HTTPS on Ubuntu OS.
I need this very badly.
Spline is a web application. HTTPS is managed by the web server, not the application itself.
For example, if you use Tomcat to run Spline, you have to set up Tomcat to support HTTPS: https://tomcat.apache.org/tomcat-9.0-doc/ssl-howto.html
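For illustration, a minimal HTTPS connector in Tomcat's conf/server.xml could look like the sketch below. The keystore path and password are placeholders; see the SSL how-to linked above for the full set of options.

```xml
<!-- conf/server.xml — hypothetical keystore path and password -->
<Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
           keystoreFile="/path/to/keystore.jks" keystorePass="changeit"
           clientAuth="false" sslProtocol="TLS"/>
```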
I have set up HTTPS through an AWS load balancer; here are the URLs:
https://xxxxxxxx.xxxxxxxx.com:9443/producer
https://xxxxxxxx.xxxxxxxx.com:9443/consumer
I have passed the below values in the Databricks cluster, but the lineage is not showing up in the Spline UI. Can you guide me on this?
spark.spline.lineageDispatcher https
spark.spline.lineageDispatcher.https.producer.url https://xxxxxxxx.xxxxxxxx.com:9443/producer
spark.spline.mode ENABLED
spark.databricks.delta.preview.enabled true
My code in the Databricks notebook:
sc._jvm.za.co.absa.spline.harvester.SparkLineageInitializer.enableLineageTracking(spark._jsparkSession)

database = "Superstore"
table = "Superstore.dbo.SalesTransaction"
user = "hsbfhs"
password = "hshhfgsh"

jdbcDF = spark.read.format("jdbc") \
    .option("url", f"jdbc:sqlserver://xxxxxxxxx.com:1433;databaseName={database};") \
    .option("dbtable", table) \
    .option("user", user) \
    .option("password", password) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .load()
jdbcDF.createOrReplaceTempView("jdbcDF")

sqlserver_ouput = spark.sql("""
    select jdbcDF.discount, jdbcDF.profit, jdbcDF.sales, jdbcDF.Quantity,
           jdbcDF_PM.pid, jdbcDF_PM.subid, jdbcDF_PM.catid
    from jdbcDF
    inner join jdbcDF_PM on (jdbcDF.productname == jdbcDF_PM.name)
""")

table = "siftdd"
user = @.*"
password = "****"

sqlserver_ouput.write.mode("append").saveAsTable(table)
We try to help when possible, but we cannot spend time in meetings doing tech support.
I cannot tell what is wrong from the code you provided, but I put together a troubleshooting guide. You can try to go through it and find the issue yourself. I hope it will help: https://github.com/AbsaOSS/spline/discussions/1225
Another thing: all messages you send to this ticket are public on GitHub, so be sure not to share any sensitive data here.
I have a job in AWS Glue. When I ran it, it ran successfully in AWS Glue, but the lineage is not populated in Spline. Is GlueContext not supported by Spline? If so, why? Can someone explain?
It should be supported. But I think the discussion has already deviated far from the original topic.
Please look through this - https://github.com/search?q=repo%3AAbsaOSS%2Fspline-spark-agent+glue&type=issues If it doesn't help, create a separate issue or a discussion. Help us to keep things organised. Thank you.
Hi Alex Vayda,
I stopped using Databricks for a while and will start using it again in Feb 2024.
So I have a question on Spline again; can I please get a clarification?
After building the image and deploying it, I could see that the UI is directly accessible. So my question is: does Spline support a user authentication mechanism?
If Spline supports a user-based authentication mechanism, can you please send me an article on how to enable it?
Thank you in advance, looking forward to your reply.
The short answer is: no, neither the UI nor the REST API has any auth mechanism built in. Likewise, there is no notion of a "user" in the system - no on-boarding is required to start using it.
The longer answer is the following. The intention for Spline was to create a simple core system that focuses on one thing only - lineage tracking. An authentication layer can be added on top of it, for example by putting a simple proxy in front of it that would intercept any HTTP calls and perform authentication. This would basically allow you to implement an all-or-nothing access control style. If you need more granular access control, then things start to get more complex and involved. Some simpler authorization use-cases could still be implemented at the proxy level by intercepting not only requests but also responses, and filtering the content being returned to the user. But more complex and sophisticated use-cases definitely have to be implemented in the Spline core. It all depends on what exactly your requirements are.
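As an illustration of the all-or-nothing proxy approach, a minimal Nginx reverse proxy with HTTP Basic auth in front of the Spline UI could look like this sketch. The server name, certificate paths, htpasswd file, and upstream address are all hypothetical placeholders, not part of any Spline documentation:

```nginx
# /etc/nginx/conf.d/spline.conf — hypothetical paths and addresses
server {
    listen 443 ssl;
    server_name spline.example.com;

    ssl_certificate     /etc/nginx/tls/spline.crt;   # placeholder certificate
    ssl_certificate_key /etc/nginx/tls/spline.key;   # placeholder key

    location / {
        auth_basic           "Spline";
        auth_basic_user_file /etc/nginx/.htpasswd;   # created with htpasswd
        proxy_pass           http://localhost:9090;  # assumed Spline UI address
        proxy_set_header     Host $host;
    }
}
```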
Hey, I want to build the Spline image with my own Dockerfile, run.sh and other dependency files.
I don't want to have to pull your Docker image every time I want to install it onto new VMs.
Hence I request you to suggest a way to do this; it would be better if the files were shared.
Hi, I am trying to execute the program from AWS Glue. The program got executed, but the lineage is not visible in Spline.
Version of the Spline agent: s3://glue-479930578883-eu-west-2/lib/spark-3.3-spline-agent-bundle_2.12-2.0.0.jar
I am passing this parameter in Glue:
Key = --conf, value = spark.spline.producer.url=http://18.117.242.93:8080/producer --conf spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener
Can you please let me know if you support the lineage capturing of Snowflake?
On Tue, Jun 11, 2024, Alex Vayda wrote:
https://github.com/AbsaOSS/spline-getting-started/blob/main/building-docker.md
Hi AbsaOSS/spline-spark-agent, Good Day !!
Below is the error I am getting:
ERROR Inbox: Ignoring error java.io.NotSerializableException: org.apache.spark.storage.StorageStatus Serialization stack:
@harishyadavdevops, first of all, regarding your questions:
NotSerializableException - why do you think it is related to Spline? It's not obvious from the stack trace you reported.
Now, I have to close comments on this issue as it contains too many off-topic discussions. Please remember that GitHub issues is not a support chat. Try to keep your posts organised and informative.
Originally posted by @harishyadavdevops in https://github.com/AbsaOSS/spline-spark-agent/issues/262#issuecomment-1519417615