Closed ruebot closed 4 years ago
Merging #432 into master will increase coverage by 0.28%. The diff coverage is 96.29%.
@@            Coverage Diff             @@
##           master     #432      +/-   ##
==========================================
+ Coverage   77.70%   77.99%   +0.28%
==========================================
  Files          41       43       +2
  Lines        1534     1554      +20
  Branches      282      286       +4
==========================================
+ Hits         1192     1212      +20
  Misses        217      217
  Partials      125      125
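The percentages in the report can be cross-checked from the Hits and Lines rows. The sketch below is only an illustration: the helper names `pct` and `delta` are hypothetical, and truncating (rather than rounding) to two decimals is an assumption that happens to reproduce the figures shown, not a statement about how the coverage bot computes them.

```shell
#!/usr/bin/env bash
# Coverage as hits/lines, truncated (not rounded) to two decimals.
# Truncation is an assumption chosen because it matches the report above.
pct() {
  local hits=$1 lines=$2
  local v=$(( hits * 10000 / lines ))   # integer division truncates
  printf '%d.%02d\n' $(( v / 100 )) $(( v % 100 ))
}

# Change between two coverage ratios, computed at higher precision first
# and truncated only at the end. Assumes coverage did not decrease.
delta() {
  local before=$(( $1 * 1000000 / $2 ))
  local after=$(( $3 * 1000000 / $4 ))
  local d=$(( (after - before) / 100 ))
  printf '%d.%02d\n' $(( d / 100 )) $(( d % 100 ))
}

pct 1192 1534               # master: 77.70
pct 1212 1554               # #432:   77.99
delta 1192 1534 1212 1554   # change: 0.28
```

Subtracting the two truncated percentages directly would give 0.29, which is why the change is computed from the unrounded ratios here.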
log4j configuration issue).
GitHub issue(s): #431
What does this Pull Request do?
Add imagegraph, and webgraph to command line app.

How should this be tested?
ImageGraphExtractor example:
bin/spark-submit --master local[8] --files /home/nruest/Projects/au/sample-data/log4j.properties --conf spark.driver.extraJavaOptions='-Dlog4j.configuration=file:/home/nruest/Projects/au/sample-data/log4j.properties' --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.50.1-SNAPSHOT-fatjar.jar --extractor ImageGraphExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/app-output/ImageGraphExtractor --df

WebPagesExtractor example:
bin/spark-submit --master local[8] --files /home/nruest/Projects/au/sample-data/log4j.properties --conf spark.driver.extraJavaOptions='-Dlog4j.configuration=file:/home/nruest/Projects/au/sample-data/log4j.properties' --class io.archivesunleashed.app.CommandLineAppRunner /home/nruest/Projects/au/aut/target/aut-0.50.1-SNAPSHOT-fatjar.jar --extractor WebPagesExtractor --input /home/nruest/Projects/au/sample-data/geocities/* --output /home/nruest/Projects/au/sample-data/app-output/WebPagesExtractor --df
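
The two test commands differ only in the extractor name, so a reviewer may find it handy to generate them from one template. This is a rough sketch, not part of the PR: `AUT_JAR`, `DATA_DIR`, and `build_cmd` are hypothetical names, the `/path/to/...` defaults are placeholders, and the function only prints each command for inspection instead of running it. Every flag is copied from the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical locations; point these at your own checkout and sample data.
AUT_JAR="${AUT_JAR:-/path/to/aut/target/aut-0.50.1-SNAPSHOT-fatjar.jar}"
DATA_DIR="${DATA_DIR:-/path/to/sample-data}"

# Print (do not execute) the spark-submit invocation for one extractor.
build_cmd() {
  local extractor=$1
  # printf reuses the '%s ' format for every argument, joining them with spaces.
  printf '%s ' \
    bin/spark-submit --master 'local[8]' \
    --files "$DATA_DIR/log4j.properties" \
    --conf "spark.driver.extraJavaOptions='-Dlog4j.configuration=file:$DATA_DIR/log4j.properties'" \
    --class io.archivesunleashed.app.CommandLineAppRunner "$AUT_JAR" \
    --extractor "$extractor" \
    --input "$DATA_DIR/geocities/*" \
    --output "$DATA_DIR/app-output/$extractor" \
    --df
  printf '\n'
}

# The two extractors exercised in this PR's test instructions.
for extractor in ImageGraphExtractor WebPagesExtractor; do
  build_cmd "$extractor"
done
```

Run it from the Spark installation directory and paste the printed lines into a shell once the paths look right; each extractor writes to its own directory under `app-output`.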
Additional Notes:
log4j config file required. I'll create a ticket for that, and work on a solution separately.
DomainGraphExtractor writes as GEXF. Anybody remember why this is using GEXF? Shouldn't it be GraphML? https://github.com/archivesunleashed/aut/blob/master/src/main/scala/io/archivesunleashed/app/CommandLineApp.scala#L115-L123
webgraph and domains because there is a bit of duplication with DomainFrequencyExtractor and DomainGraphExtractor, and there could also be some confusion here about how the extractors are labeled.