AbsaOSS / spline-spark-agent

Spline agent for Apache Spark
https://absaoss.github.io/spline/
Apache License 2.0
175 stars, 90 forks

Can't track org.apache.hadoop.fs.rename #760

Open vinhnemo opened 8 months ago

vinhnemo commented 8 months ago

Hi Folks,

Has anyone run into trouble when a Spark job includes many write and file rename operations (org.apache.hadoop.fs.rename)? In this situation the lineage comes out incorrect, because the renames are not tracked. Please help me if you have faced this.

Context:

My case:

write('hdfs://abc/tmp/123');
write('hdfs://xyz/tmp/123');
write('hdfs://asd/tmp/123');
rename('hdfs://abc/tmp/123','hdfs://abc/123');
rename('hdfs://xyz/tmp/123','hdfs://xyz/123');
rename('hdfs://asd/tmp/123','hdfs://asd/123');

My current approach is to implement a mapping job that uses the Hadoop audit logs (which contain `org.apache.hadoop.fs.rename`) to correct Spline's `write/read` operators.
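
To make the mapping idea concrete, here is a minimal sketch in Python. It is not part of Spline and makes several assumptions: the audit lines use the usual `key=value` layout with `cmd=rename`, `src=` and `dst=` fields; paths contain no whitespace; each audit log belongs to a single namenode, so its authority (for example `hdfs://abc`) is passed in by the caller; and `rename_map` and `resolve` are hypothetical helper names.

```python
import re

# Assumed audit-log layout: whitespace-separated key=value fields, where a
# rename entry carries cmd=rename followed by src=<path> and dst=<path>.
_RENAME = re.compile(r"cmd=rename\s+src=(?P<src>\S+)\s+dst=(?P<dst>\S+)")


def rename_map(audit_lines, authority):
    """Collect {source URI: destination URI} for every allowed rename.

    `authority` (e.g. "hdfs://abc") is prepended to both paths, since the
    audit entries are assumed to hold bare paths only.
    """
    mapping = {}
    for line in audit_lines:
        if "allowed=true" not in line:
            continue  # skip denied operations
        match = _RENAME.search(line)
        if match:
            mapping[authority + match.group("src")] = authority + match.group("dst")
    return mapping


def resolve(uri, mapping):
    """Follow rename chains so a temporary write URI maps to its final URI."""
    seen = set()
    while uri in mapping and uri not in seen:
        seen.add(uri)  # guard against rename cycles
        uri = mapping[uri]
    return uri
```

With the case above, an audit entry renaming `/tmp/123` to `/123` on the `abc` namenode would turn the write URI `hdfs://abc/tmp/123` recorded by Spline into `hdfs://abc/123`, while URIs on other namenodes stay untouched until their own audit logs are processed. The sketch only matches exact URIs and ignores the order of events in time; a real job would also need to handle renamed parent directories and paths that are reused.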