archivesunleashed/aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
https://aut.docs.archivesunleashed.org/
Apache License 2.0
137 stars · 33 forks
issues
#456 DomainFrequencyExtractor should remove WWW prefix · ruebot · closed 4 years ago · 0 comments
#455 Updating Java install instructions for MacOS, resolves #445 · ianmilligan1 · closed 4 years ago · 1 comment
#454 Add option to save to Parquet for app. · ruebot · closed 4 years ago · 3 comments
#453 Update PlainTextExtractor to output a single column; text. · ruebot · closed 4 years ago · 6 comments
#452 Update PlainTextExtractor to just extract text · ruebot · closed 4 years ago · 1 comment
#451 Add a number of additional app extractors. · ruebot · closed 4 years ago · 5 comments
#450 Remove RDD option in app; DataFrame only now. · ruebot · closed 4 years ago · 2 comments
#449 Remove RDD options from app · ruebot · closed 4 years ago · 2 comments
#448 Add parquet as an app format option · ruebot · closed 4 years ago · 0 comments
#447 Add datathon derivatives to app (binary info, web pages, web graph · ruebot · closed 4 years ago · 0 comments
#446 [skip-travis] Add spark-submit option to README; resolves #444. · ruebot · closed 4 years ago · 2 comments
#445 Update Java 8 instructions for MacOS · ianmilligan1 · closed 4 years ago · 7 comments
#444 Add spark-submit to README · ruebot · closed 4 years ago · 0 comments
#443 Remove GraphX support; resolves #442. · ruebot · closed 4 years ago · 1 comment
#442 Remove GraphXML and ExtractGraphX · ruebot · closed 4 years ago · 0 comments
#441 Remove WriteGraph; resolves #439. · ruebot · closed 4 years ago · 4 comments
#440 Use Monochromatic Ids instead of hash to produce network identifiers. · greebie · closed 4 years ago · 5 comments
#439 CommandLineApp DomainGraphExtractor Uses Different Node IDs than WriteGraph · ianmilligan1 · closed 4 years ago · 2 comments
#438 Add graphml output to CommandLineApp and DomainGraphExtractor. · ruebot · closed 4 years ago · 2 comments
#437 Align RDD and DF output for DomainGraphExtractor. · ruebot · closed 4 years ago · 3 comments
#436 DomainGraphExtractor produces different output in RDD vs DF · ruebot · closed 4 years ago · 0 comments
#435 Add graphml output to DomainGraphExtractor · ruebot · closed 4 years ago · 0 comments
#434 Update log4j configuration to resolve #433. · ruebot · closed 4 years ago · 1 comment
#433 Command line app fails because of missing log4j configuration · ruebot · closed 4 years ago · 1 comment
#432 Add imagegraph, and webgraph to command line app. · ruebot · closed 4 years ago · 2 comments
#431 Add webgraph, imagegraph, webpages, etc. to command line app · ruebot · closed 4 years ago · 2 comments
#430 Tweak hasDate to handle Seq. · ruebot · closed 4 years ago · 2 comments
#429 Restyle keep/discard filter UDFs in the context of DataFrames · ruebot · closed 4 years ago · 2 comments
#428 Encoding management · alxdrdelaporte · closed 4 years ago · 11 comments
#427 Restyle UDFs in the context of DataFrames · SinghGursimran · closed 4 years ago · 3 comments
#426 Update Spark and Hadoop versions. · ruebot · closed 4 years ago · 1 comment
#425 Discussion: Restyle UDFs in the context of DataFrames · lintool · closed 4 years ago · 10 comments
#424 update for 'src' column · SinghGursimran · closed 4 years ago · 3 comments
#423 [skip travis] Add pre-print link to README. · ruebot · closed 4 years ago · 0 comments
#422 Add img alt text to imagegraph(); resolves #420. · ruebot · closed 4 years ago · 2 comments
#421 Rename imageLinks to imageGraph; resolves #419 · ruebot · closed 4 years ago · 3 comments
#420 Add alt text column to imageGraph (imageLinks) · ruebot · closed 4 years ago · 2 comments
#419 Rename imageLinks to imageGraph · ruebot · closed 4 years ago · 0 comments
#418 UDFs that filter on url should also filter on src · ruebot · closed 4 years ago · 5 comments
#417 Need --repositories flag with --packages. · ruebot · closed 4 years ago · 1 comment
#416 Clean up test descriptions, addresses #372. · ruebot · closed 4 years ago · 2 comments
#415 Remaining Matchbox implementations for Scala · SinghGursimran · closed 4 years ago · 1 comment
#414 Add crawl_date to binary DataFrames and imageLinks. · ruebot · closed 4 years ago · 2 comments
#413 Add crawl_date to binary DataFrames and imageLinks · ruebot · closed 4 years ago · 0 comments
#412 Add some PySpark udfs · ruebot · closed 4 years ago · 5 comments
#411 Research, test, and benchmark jwarc integration · ruebot · closed 2 years ago · 1 comment
#410 Implement Python versions of Serializable APIs · ruebot · closed 4 years ago · 4 comments
#409 Implement Python versions of App utilities · ruebot · closed 4 years ago · 0 comments
#408 Implement Python versions of Matchbox utilities · ruebot · closed 4 years ago · 0 comments
#407 0.18.0 with --packages is broken · ruebot · closed 4 years ago · 1 comment
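Issue #456 proposes that DomainFrequencyExtractor drop the `www` prefix. The sketch below illustrates why that normalization matters for a frequency count: without it, `www.example.com` and `example.com` are tallied as two separate domains. It is plain Python using only the standard library, not the toolkit's Scala or PySpark API; `domain_of` and `domain_frequency` are hypothetical helpers, and stripping exactly one leading `www.` label is an assumption about the intended behavior.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the URL's host (lowercased by urlparse) minus a leading 'www.'.

    Hypothetical helper: only a single leading 'www.' label is removed;
    other subdomains such as 'blog.' are kept as-is.
    """
    host = urlparse(url).hostname or ""
    return host[len("www."):] if host.startswith("www.") else host


def domain_frequency(urls):
    """Count URLs per normalized domain, merging www and non-www hosts."""
    return Counter(domain_of(u) for u in urls)


if __name__ == "__main__":
    urls = [
        "http://www.example.com/a",
        "https://example.com/b",
        "http://blog.example.com/c",
    ]
    # Prints each domain with its count, most frequent first.
    for domain, count in domain_frequency(urls).most_common():
        print(domain, count)
```

With this normalization, the two `example.com` URLs merge into one entry with a count of 2, while `blog.example.com` stays separate.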