chatnoir-eu
/
chatnoir-resiliparse
A robust web archive analytics toolkit
https://resiliparse.chatnoir.eu
Apache License 2.0
54
stars
9
forks
issues
#42
timmytwoteeth
closed
1 month ago
3
DOM Tree Manipulation and DOMNode
#41
rosinality
closed
2 months ago
6
steady memory growth while working on web pages
#40
prnake
closed
2 weeks ago
5
svg caused lexbor to crash
#39
prnake
closed
3 months ago
2
Move to gcc-toolset-12-libasan-devel in Dockerfile
#38
jonded94
closed
3 months ago
0
Implement docker arm build (necessary for Pull Request #34)
#37
jonded94
closed
3 months ago
0
setuptools.config.pyprojecttoml has no attribute _BetaConfiguration
#36
geoHeil
closed
3 months ago
3
Can this package be used with Python 3.7?
#35
klaus2015
closed
3 months ago
1
Enable build of linux-aarch64 wheels
#34
jonded94
closed
3 months ago
9
Build wheels with py3.12 to reduce build time
#33
cclauss
closed
7 months ago
2
Upgrade GitHub Action checkout to v4
#32
cclauss
closed
7 months ago
1
Cannot install on Python 3.11 in Ubuntu Docker
#31
bytearchive
closed
9 months ago
4
Type annotations
#30
heinrichreimer
opened
10 months ago
1
Random or Chunked Reading
#29
quasiben
closed
10 months ago
3
Resiliparse does not Compile under Ubuntu 18
#28
mam10eks
closed
1 year ago
3
fatal error: html.h: No such file or directory
#27
davidtbo
closed
3 months ago
3
CLI index: fix exception when determining the length of the last WARC record
#26
sebastian-nagel
closed
1 year ago
3
Trouble building in Python 3.11
#25
getorca
closed
1 year ago
6
resiliparse crashes in colab
#24
huu4ontocord
closed
1 year ago
8
Interesting Benchmarks running resiliparse 'HTML2text' sequentially vs parallel
#23
getorca
closed
1 year ago
28
Fix FastWARC bib entry
#22
niklasdeckers
closed
2 years ago
0
Fix ArchiveIterator min_content_length behavior
#21
jmfrees
closed
2 years ago
4
Fix Windows CI
#20
phoerious
closed
2 years ago
1
pipx run fastwarc check failed: binascii.Error: Non-base32 digit found
#19
MaxPeal
closed
2 years ago
9
pipx run resiliparse failed: ModuleNotFoundError: No module named 'joblib'
#18
MaxPeal
closed
2 years ago
5
Build ASAN config
#17
phoerious
closed
2 years ago
0
Installing fastwarc via `pip install` fails if compilation is required or requested
#16
sebastian-nagel
closed
2 years ago
3
Fix Windows DLL bundling
#15
phoerious
closed
2 years ago
0
yum install
#14
maximedb
closed
2 years ago
3
Fastwarc: CLI may index gzipped WARC records with erroneous length 0
#13
sebastian-nagel
closed
2 years ago
3
Fix code coverage reporting
#12
phoerious
closed
2 years ago
1
FastWARC: command-line tools to index and extract WARC records
#11
sebastian-nagel
closed
2 years ago
8
CI Test
#10
phoerious
closed
2 years ago
0
Bug repro for https://github.com/pypa/cibuildwheel/issues/816
#9
phoerious
closed
2 years ago
0
New HTML parsing module
#8
phoerious
closed
2 years ago
0
Correct status code parsing
#7
Querela
closed
2 years ago
1
FastWARC: BufferedReader may hang up on truncated gzipped WARC file
#6
sebastian-nagel
closed
2 years ago
6
Fix typo in write() causing failure if unparsed data is written
#5
sebastian-nagel
closed
2 years ago
0
Fix status code parsing
#4
Querela
closed
2 years ago
0
Fix HTTP status code parsing (reason phrase may contain spaces)
#3
Querela
closed
2 years ago
2
Caching of HTTP Headers
#2
mam10eks
closed
2 years ago
0
Problem with LZ4F_cctx and LZ4G_dctx
#1
rvbarreto
closed
2 years ago
1