archivesunleashed/aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
https://aut.docs.archivesunleashed.org/
Apache License 2.0
137 stars
33 forks
issues
`s3a` URLs don't work as in documentation
#556
acruise
opened
6 months ago
1
Update Apache Commons Compress dependency.
#555
ruebot
closed
6 months ago
1
Bump org.xerial.snappy:snappy-java from 1.1.10.1 to 1.1.10.4
#554
dependabot[bot]
closed
6 months ago
1
Bump org.apache.tika:tika-core from 1.23 to 1.28.3
#553
dependabot[bot]
closed
6 months ago
1
Bump org.apache.spark:spark-core_2.12 from 3.0.1 to 3.3.3
#552
dependabot[bot]
closed
6 months ago
1
Bump snappy-java from 1.1.7.3 to 1.1.10.1
#551
dependabot[bot]
closed
1 year ago
1
Bump guava from 29.0-jre to 32.0.0-jre
#550
dependabot[bot]
closed
1 year ago
1
Bump spark-core_2.12 from 3.0.1 to 3.4.0
#549
dependabot[bot]
closed
1 year ago
1
Add scalafix and remove unused imports.
#548
ruebot
closed
1 year ago
1
Last modified headers
#547
ruebot
closed
1 year ago
5
Include last modified date for a resource
#546
ruebot
closed
1 year ago
2
Use YYYYMMDD for crawl_date for DomainGraphExtractor.
#545
ruebot
closed
1 year ago
1
DomainGraph should use YYYYMMDD not YYYYMMDDHHMMSS
#544
ruebot
closed
1 year ago
0
Bump jsoup from 1.14.2 to 1.15.3
#543
dependabot[bot]
closed
1 year ago
1
org.apache.tika.mime.MimeTypeException: Invalid media type name: application/rss+xml lang=utf-8
#542
ruebot
closed
2 years ago
1
Add ARCH text files derivatives.
#541
ruebot
closed
2 years ago
1
Add ARCH text files derivatives
#540
ruebot
closed
2 years ago
0
Make webpages() consistent across aut and ARCH.
#539
ruebot
closed
2 years ago
4
Remove http headers, and html on webpages()
#538
ruebot
closed
2 years ago
1
Update README
#537
ruebot
closed
2 years ago
1
Fix codecov GitHub action.
#536
ruebot
closed
2 years ago
1
Bump commons-compress from 1.14 to 1.21
#535
dependabot[bot]
closed
2 years ago
1
Add domain column to webpages()
#534
ruebot
closed
2 years ago
0
Remove Java w/arc processing, and replace it with Sparkling.
#533
ruebot
closed
2 years ago
8
Discard date RDD filter only takes a single string, not a list of strings.
#532
ruebot
closed
2 years ago
0
Bump jackson-databind from 2.10.0 to 2.12.6.1
#531
dependabot[bot]
closed
2 years ago
2
Bump hadoop-common from 2.7.4 to 3.2.3
#530
dependabot[bot]
closed
2 years ago
2
java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.Set$Set1 Set(liberal.ca)
#529
JakeBickUKGWA
closed
2 years ago
1
Bump hadoop-common from 2.7.4 to 2.10.1
#528
dependabot[bot]
closed
2 years ago
1
Bump xercesImpl from 2.12.0 to 2.12.2
#527
dependabot[bot]
closed
2 years ago
1
Change crawl_date format to YYYYMMDDHHMMSS, update hasDate filter.
#526
ruebot
closed
2 years ago
2
Include timestamp in crawl date
#525
ruebot
closed
2 years ago
0
Replace scala-uri library from ExtractDomain.
#524
ruebot
closed
2 years ago
0
Issue 522
#523
ruebot
closed
2 years ago
0
Scaladocs haven't been created since 0.90.0 release
#522
ruebot
closed
2 years ago
0
Replace scala-uri library from ExtractDomain and just parse public_suffix_list.dat
#521
ruebot
closed
2 years ago
1
Update ExtractDomain to extract apex domains.
#520
ruebot
closed
2 years ago
8
ExtractDomains returns non-Apex Domains
#519
ruebot
closed
2 years ago
2
Bump jsoup from 1.13.1 to 1.14.2
#518
dependabot[bot]
closed
3 years ago
1
Filter or filedesc and dns records from arcs.
#517
ruebot
closed
3 years ago
1
ARC file name appearing in `url` list
#516
ianmilligan1
closed
3 years ago
0
Handle wget WARC-Target-URI formatting.
#515
ruebot
closed
3 years ago
2
WARC-Target-URI in Wget warc files is not parsed properly
#514
javieraespinosa
closed
3 years ago
1
Add missing crawl_date column to binary information jobs.
#513
ruebot
closed
3 years ago
1
crawl_date is not included on binary information jobs when documentation says it is
#512
ruebot
closed
3 years ago
0
Update jsoup to 1.13.1
#511
ruebot
closed
3 years ago
1
ars-cloud compatibility with aut and Java 11
#510
ruebot
closed
3 years ago
1
Update required Scala version to 2.12
#509
ruebot
closed
3 years ago
1
Update to Spark 3.0.1
#508
ruebot
closed
3 years ago
1
Replace TravisCI with GitHub Actions.
#507
ruebot
closed
3 years ago
1