chimbori/crux
Crux offers a flexible plugin-based API & implementation to extract interesting information from Web pages.
Apache License 2.0
239 stars, 43 forks
issues
#30: KMP (LaatonWalaBhoot, opened 8 months ago, 1 comment)
#29: Multiplatform support (LaatonWalaBhoot, opened 10 months ago, 6 comments)
#28: Add plugin for rewriting relative URIs to absolute URIs in image tags (kjeller, closed 1 year ago, 8 comments)
#27: Add Java sample code to README (sigpwned, closed 1 year ago, 1 comment)
#26: Reordered default plugins so HtmlMetadataExtractor overrides WebAppManifestParser. (ciferkey, closed 2 years ago, 1 comment)
#25: Crux replaces page title with site title. (ciferkey, opened 2 years ago, 1 comment)
#24: Unable to write custom plugin since Plugin interface is sealed (evuki, closed 2 years ago, 2 comments)
#23: Added created and modified dates to metadata extractor. (ciferkey, closed 2 years ago, 6 comments)
#22: Why not use Uri.parse or String instead of OkHttp? (piaci, closed 3 years ago, 5 comments)
#21: Use default values with Article data class (EmpowrCo, closed 3 years ago, 4 comments)
#20: Article publication date (kowalkr, opened 3 years ago, 0 comments)
#19: New Release? (raharrison, closed 4 years ago, 3 comments)
#18: page the fails badly (johngray1965, opened 4 years ago, 1 comment)
#17: Update gradle dependency name in README.md (XiangRongLin, closed 4 years ago, 1 comment)
#16: Improve crux-keep performance (sigpwned, closed 5 years ago, 4 comments)
#15: Mark certain nodes to be kept: `crux-keep` (chimbori, opened 5 years ago, 1 comment)
#14: Add hint to keep specific elements (sigpwned, closed 5 years ago, 4 comments)
#13: [NEWYORKER] Span tags are replaced with a paragraph tag (PawanHegde, opened 5 years ago, 1 comment)
#12: images, videos, iframes (piaci, opened 5 years ago, 1 comment)
#11: Exception in thread "main" java.lang.NoClassDefFoundError: com/chimbori/crux/articles/ArticleExtractor (Tony1952466760, closed 2 years ago, 1 comment)
#10: Images now listed in Crux output (platelminto, closed 5 years ago, 1 comment)
#9: [NYT] Content after ad is not extracted (anhtuan23, opened 5 years ago, 0 comments)
#8: Don't strip forms in pre-processing (GomiGuchi, closed 5 years ago, 0 comments)
#7: Don't strip forms in pre-processing (GomiGuchi, closed 5 years ago, 2 comments)
#6: Hidden popup chosen as article (GomiGuchi, closed 5 years ago, 5 comments)
#5: Missing content for NYT recipe (8enmann, opened 5 years ago, 0 comments)
#4: Preserve <br>? (8enmann, opened 5 years ago, 4 comments)
#3: the extracted content output not contain picture elements (wizos, opened 6 years ago, 8 comments)
#2: Release to Maven Central (sigpwned, closed 5 years ago, 11 comments)
#1: add some missing properties (sigpwned, closed 5 years ago, 4 comments)