archivesunleashed/aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
https://aut.docs.archivesunleashed.org/
Apache License 2.0
137 stars, 33 forks
issues
#506 Migrate CI infrastructure from TravisCI to GitHub Action (ruebot; closed 3 years ago; 0 comments)
#505 Bump junit from 4.12 to 4.13.1 (dependabot[bot]; closed 3 years ago; 1 comment)
#504 Fix relative links extraction (yxzhu16; closed 3 years ago; 1 comment)
#503 Remove .keepValidPages() on .all() Python implementation (ruebot; closed 3 years ago; 0 comments)
#502 Python implementation of .all() has .keepValidPages() incorrectly applied to it (ruebot; closed 3 years ago; 0 comments)
#501 Extract hyperlinks from wayback machine (yxzhu16; closed 3 years ago; 3 comments)
#500 Updates read.me to include citation section (SamFritz; closed 3 years ago; 3 comments)
#499 Remove tf project; resolves #498 (ruebot; closed 3 years ago; 1 comment)
#498 Split tf into its own repo (ruebot; closed 3 years ago; 1 comment)
#497 Update Read.me w/ citation information (SamFritz; closed 3 years ago; 4 comments)
#496 Set the upper limit of WARC content length to half of Integer.MAX_VALUE (adamyy; closed 4 years ago; 1 comment)
#495 Release 0.80.0 JAR produces error; built 0.80.1 fatjar built on repo works (ianmilligan1; closed 4 years ago; 2 comments)
#494 Replace Java ARC/WARC record processing library (ruebot; closed 2 years ago; 1 comment)
#493 Extract gzip data from transfer-encoded WARC (ianmilligan1; closed 2 years ago; 1 comment)
#492 ARC reader string vs int error on record length (ruebot; closed 2 years ago; 2 comments)
#491 Hadoop 3.2 support (ruebot; closed 2 years ago; 2 comments)
#490 Change master branch to main branch (ruebot; closed 4 years ago; 4 comments)
#489 Add Python formatter GitHub Action (ruebot; closed 4 years ago; 1 comment)
#488 GitHub action - Run isort and black on Python code (ruebot; closed 4 years ago; 0 comments)
#487 Add scalafmt GitHub action and apply it to scala code (ruebot; closed 4 years ago; 1 comment)
#486 Add scalafmt GitHub action (ruebot; closed 4 years ago; 0 comments)
#485 Add Google Java Formatter as an action, and apply it (ruebot; closed 4 years ago; 1 comment)
#484 Add Google Java Formatter as a GitHub action (ruebot; closed 4 years ago; 0 comments)
#483 Packages build is often broken - should we support it? (ruebot; closed 4 years ago; 5 comments)
#482 Add Python implementation of SaveBytes (ruebot; closed 4 years ago; 2 comments)
#481 Bump xercesImpl from 2.11.0 to 2.12.0 (dependabot[bot]; closed 4 years ago; 1 comment)
#480 [Skip Travis] Trim README down given aut.docs.archivesunleashed.org (ruebot; closed 4 years ago; 1 comment)
#479 Remove RDD suffixes on file, class, and object names (ruebot; closed 4 years ago; 2 comments)
#478 Implement SaveToDisk in Python (ruebot; closed 4 years ago; 1 comment)
#477 PEP8 Python app method names (ruebot; closed 4 years ago; 1 comment)
#476 Broken link in documentation (sepastian; closed 4 years ago; 6 comments)
#475 Move Python UDF methods out of their own class (ruebot; closed 4 years ago; 1 comment)
#474 Add DataFrame udf tests (ruebot; closed 4 years ago; 3 comments)
#473 Improve udfs/package.scala test coverage (ruebot; closed 4 years ago; 0 comments)
#472 Remove tabDelimit (ruebot; closed 4 years ago; 1 comment)
#471 Remove tabDelimit (ianmilligan1; closed 4 years ago; 2 comments)
#470 Remove NER functionality (ruebot; closed 4 years ago; 2 comments)
#469 Remove Extract Entities (ruebot; closed 4 years ago; 0 comments)
#468 PEP8 Naming - UDFs, App method names, DataFrame names, and filters (ruebot; closed 4 years ago; 10 comments)
#467 Python UDFs - class or not? (ruebot; closed 4 years ago; 5 comments)
#466 Add ExtractPopularImages, WriteGEXF, and WriteGraphML to Python (ruebot; closed 4 years ago; 4 comments)
#465 Remove ExtractImageDetailsDF; resolves #464 (ruebot; closed 4 years ago; 2 comments)
#464 Remove ExtractImageDetailsDF.scala (ruebot; closed 4 years ago; 1 comment)
#463 Implement Scala Matchbox UDFs in Python (ruebot; closed 4 years ago; 2 comments)
#462 Import clean-up for df package (ruebot; closed 4 years ago; 1 comment)
#461 github-stite-deploy uses password-based authentication, which is being deprecated by GitHub (ruebot; closed 4 years ago; 1 comment)
#460 [skip travis] README updates (ruebot; closed 4 years ago; 0 comments)
#459 Set spark-submit app name to be "aut - extractorName" (ruebot; closed 4 years ago; 1 comment)
#458 For extractor (spark-submit) job, set Spark app name to be the extractor job name (ruebot; closed 4 years ago; 0 comments)
#457 Add RemovePrefixWWWDF to DomainFrequencyExtractor (ruebot; closed 4 years ago; 1 comment)