dwarry closed this issue 2 years ago.
Hello @dwarry, this might be a silly question, but did you try expanding the SAC-Test node? If I remember correctly, the option is available from the right-click menu.
As I understand it, of the two writes, only the JSON one is captured.
We'll take a look at this. Thanks.
@wajda Yep, precisely that. Thanks.
My testing has been on an out-of-the-box install of HDP 3.1.4 (kerberized), i.e. Spark 2.3.2 and Hive 3.1.0.
Please let me know if there's anything else you need.
OK, so it all boils down to supporting Data Source V2. We'll do it eventually, but it's not a priority for us at the moment.
todo: test it when AbsaOSS/spline#600 is implemented
Update: it doesn't seem possible to add support for DataSourceV2 in general, so HWC capturing will have to be solved specifically for this type of usage.
I have yet to find a regular (non-fat) Maven artifact that provides com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceWriter and could be included in the integration-tests module.
So far I've only found it in hive-warehouse-connector.jar, which is a fat jar and isn't suitable as a dependency.
@dwarry, sorry for the long delay. Is this issue still valid?
The Hortonworks Data Platform is already EOL software: https://endoflife.software/applications/big-data/hortonworks-data-platform-hdp
No new versions are coming either, since the company was acquired by Cloudera.
OK, so I'm closing the issue as "won't fix". If you find any other or similar issue related to up-to-date software, don't hesitate to let us know.
Background
The old HiveContext etc. classes have been deprecated since Spark 2.0, and on HDP 3 the Hive Warehouse Connector (HWC) is the way to work with Hive tables from Spark; it also seems to be necessary to interact with LLAP.
Question
Should Spline capture the lineage of operations performed through the HWC? At the moment it doesn't seem to.
I put together a minimal example: a PySpark job that reads a CSV file into a DataFrame and then saves it both as a JSON file and into a Hive table. The Spline lineage only shows the data being written to the JSON file.
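A job of this shape might look roughly like the hypothetical sketch below. It assumes HDP 3.x with the Hive Warehouse Connector jar and its pyspark_llap Python package made available to the job (e.g. through spark-submit --jars and --py-files); the paths, table name and app name are placeholders, not the ones from the original test. It puts the two writes side by side: the JSON write goes through Spark's built-in file source, while the Hive write goes through HWC's DataSource V2 writer, which is the one Spline does not pick up.

```python
"""Hypothetical minimal job: read a CSV, write it as JSON and to Hive via HWC.

Assumes HDP 3.x with the Hive Warehouse Connector jar and pyspark_llap
available to the job. Paths, table and app names are placeholders.
"""

CSV_PATH = "/tmp/hwc-lineage-test/input.csv"     # placeholder input path
JSON_PATH = "/tmp/hwc-lineage-test/output-json"  # placeholder output path
HIVE_TABLE = "default.hwc_lineage_test"          # placeholder Hive table


def run_job():
    # Imported inside the function so the module loads without Spark/HWC.
    from pyspark.sql import SparkSession
    from pyspark_llap import HiveWarehouseSession  # ships with HWC on HDP 3.x

    spark = SparkSession.builder.appName("hwc-lineage-test").getOrCreate()

    # Read the CSV, treating the first line as a header.
    df = spark.read.option("header", "true").csv(CSV_PATH)

    # Write 1: Spark's built-in JSON file source; this one shows up in Spline.
    df.write.mode("overwrite").json(JSON_PATH)

    # Write 2: Hive table through the HWC data source (a DataSource V2
    # writer); this is the write that is missing from the captured lineage.
    (df.write
       .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
       .mode("append")
       .option("table", HIVE_TABLE)
       .save())

    spark.stop()
```

On a cluster, a small driver script would call run_job() and be launched with spark-submit together with the Spline agent configuration.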
which is launched by
The lineage it captures is