FINRAOS / MegaSparkDiff

A Spark-based tool for comparing data at scale. It lets software development engineers compare many pair combinations of possible data sources. Multiple execution modes in multiple environments enable the user to generate a diff report as a Java/Scala-friendly DataFrame or as a file for future use. Comes with SparkFactory and SparkCompare tools out of the box.
https://finraos.github.io/MegaSparkDiff/
Apache License 2.0
49 stars 26 forks

Fix BlackDuck security issues #49

Open mmlinford opened 5 years ago

mmlinford commented 5 years ago

Right now there's a scary "1/10 (high risk)" reported by BlackDuck for our project. We should really see what we can do to remedy this. It might not be possible for all dependencies, but in those cases we can at least document why we can't resolve it.

aosama commented 5 years ago

I agree. These high-risk issues come mostly from Spark dependencies, so the question is how we decide which dependencies to exclude.

mmlinford commented 5 years ago

Yeah, the dependency tree is pretty big. The Maven dependency plugin has an analyze goal that we can investigate, and I know that the shaded JAR plugin has an option to remove anything it thinks we don't need. I'm not saying we should release ours only as a shaded JAR, but that at least implies there's something smart enough out there that we can start with.
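
As a rough sketch of what that could look like (not something we have today): `mvn dependency:analyze` reports unused declared and used undeclared dependencies, `mvn dependency:tree` shows where a transitive dependency comes from, and the Maven Shade Plugin's `minimizeJar` setting drops classes it can't find a static reference to. The plugin version and the `package` phase binding below are assumptions for illustration, not taken from our POM.

```xml
<!-- Sketch only: would go under <build><plugins> in pom.xml -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <!-- Assumed version; use whatever the project pins -->
  <version>3.2.4</version>
  <executions>
    <execution>
      <!-- Build the shaded JAR during the package phase -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- Strip dependency classes that are not statically referenced -->
        <minimizeJar>true</minimizeJar>
      </configuration>
    </execution>
  </executions>
</plugin>
```

One caveat: `minimizeJar` works from static references, so anything loaded only by reflection can be removed even though it is needed at runtime. Its output is a starting list to verify, not a final answer.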

I think the main difficulty will be testing. Not only do we need very good code coverage in our tests once the dependencies are removed, we'd also have to brainstorm ways MSD could be called that aren't exercised by simply hitting all the lines / branches / whatever.