ruebot closed 4 years ago
I'll get an associated documentation PR opened up later today.
Merging #451 into master will increase coverage by 2.17%. The diff coverage is 98.58%.
@@            Coverage Diff             @@
##           master     #451      +/-   ##
==========================================
+ Coverage   74.55%   76.72%   +2.17%
==========================================
  Files          40       49       +9
  Lines        1285     1422     +137
  Branches      246      264      +18
==========================================
+ Hits          958     1091     +133
- Misses        211      215       +4
  Partials      116      116
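As a quick sanity check on the report above, the percentages can be reproduced from the hit, miss, and partial counts. This is a minimal sketch that assumes coverage is computed as hits divided by the sum of hits, misses, and partials; that assumption matches the figures shown, but it is not taken from the coverage tool's documentation. The `coverage` helper is a hypothetical name used only for illustration.

```python
# Counts copied from the coverage report above.
master = {"hits": 958, "misses": 211, "partials": 116}
pr_451 = {"hits": 1091, "misses": 215, "partials": 116}


def coverage(counts):
    """Return (total lines, coverage percentage rounded to two decimals).

    Assumption: total lines = hits + misses + partials, and
    coverage = hits / total lines.
    """
    lines = counts["hits"] + counts["misses"] + counts["partials"]
    return lines, round(100 * counts["hits"] / lines, 2)


master_lines, master_cov = coverage(master)
pr_lines, pr_cov = coverage(pr_451)
delta = round(pr_cov - master_cov, 2)

print(master_lines, master_cov)  # totals and percentage for master
print(pr_lines, pr_cov)          # totals and percentage for #451
print(delta)                     # change in coverage
```

Under that assumption the totals come out to the 1285 and 1422 lines listed in the report, and the percentages agree with the 74.55% and 76.72% shown.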
Documentation PR: https://github.com/archivesunleashed/aut-docs/pull/57
Oh, sorry. That was copypasta on my part.
Heh no worries @ruebot - it was actually good to see robust error messages.
20/04/21 16:28:11 ERROR CommandLineApp: WebGraphInformationExtractor not supported. The following extractors are supported:
20/04/21 16:28:11 ERROR CommandLineApp: PDFInformationExtractor
20/04/21 16:28:11 ERROR CommandLineApp: TextFilesInformationExtractor
20/04/21 16:28:11 ERROR CommandLineApp: ImageGraphExtractor
20/04/21 16:28:11 ERROR CommandLineApp: WebPagesExtractor
20/04/21 16:28:11 ERROR CommandLineApp: ImageInformationExtractor
20/04/21 16:28:11 ERROR CommandLineApp: WordProcessorInformationExtractor
20/04/21 16:28:11 ERROR CommandLineApp: SpreadsheetInformationExtractor
20/04/21 16:28:11 ERROR CommandLineApp: VideoInformationExtractor
20/04/21 16:28:11 ERROR CommandLineApp: WebGraphExtractor
20/04/21 16:28:11 ERROR CommandLineApp: AudioInformationExtractor
20/04/21 16:28:11 ERROR CommandLineApp: PresentationProgramInformationExtractor
20/04/21 16:28:11 ERROR CommandLineApp: DomainGraphExtractor
20/04/21 16:28:11 ERROR CommandLineApp: DomainFrequencyExtractor
20/04/21 16:28:11 ERROR CommandLineApp: PlainTextExtractor
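The behaviour in the log above (reject an unknown extractor name, then list the supported ones) can be sketched roughly as follows. This is an illustration in Python, not the toolkit's actual CommandLineApp code; the function name `unsupported_extractor_messages` is hypothetical, and the list of names is simply copied from the log output.

```python
# Extractor names copied from the log output above, in the order logged.
SUPPORTED_EXTRACTORS = [
    "PDFInformationExtractor",
    "TextFilesInformationExtractor",
    "ImageGraphExtractor",
    "WebPagesExtractor",
    "ImageInformationExtractor",
    "WordProcessorInformationExtractor",
    "SpreadsheetInformationExtractor",
    "VideoInformationExtractor",
    "WebGraphExtractor",
    "AudioInformationExtractor",
    "PresentationProgramInformationExtractor",
    "DomainGraphExtractor",
    "DomainFrequencyExtractor",
    "PlainTextExtractor",
]


def unsupported_extractor_messages(name, supported=SUPPORTED_EXTRACTORS):
    """Return the error lines to log for `name`.

    Hypothetical helper: returns an empty list when `name` is supported,
    otherwise a header line followed by one line per supported extractor.
    """
    if name in supported:
        return []
    header = f"{name} not supported. The following extractors are supported:"
    return [header, *supported]


# The misspelled name from the thread produces a header plus the full list.
for line in unsupported_extractor_messages("WebGraphInformationExtractor"):
    print("ERROR CommandLineApp:", line)
```

Listing every valid name alongside the rejection is what makes a copy-paste mistake like this one easy to spot and correct.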
GitHub issue(s): #447
What does this Pull Request do?
Add a number of additional app extractors.
How should this be tested?
Additional Notes:
WebGraphExtractor as an additional option, since it is slightly different from the csv output of DomainGraphExtractor.
WebPagesExtractor to produce similar, and more enhanced, output than PlainTextExtractor. We might want to consider removing PlainTextExtractor in the future.
csv output.