ageitgey/node-unfluff
Automatically extract body content (and other cool stuff) from an HTML document
Apache License 2.0
2.15k stars
221 forks
issues
Handle bad languages by falling back to english and printing a warning
#14
ageitgey
closed
10 years ago
1
Get parts of content independently
#13
bradvogel
closed
10 years ago
7
Handle case where stopword file !exists
#12
mhuebert
closed
10 years ago
3
Turkish stopwords
#11
c0b41
closed
10 years ago
2
Fix for github pages with code blocks
#10
ageitgey
closed
10 years ago
0
Fix #8 - text getting dropped in wikipedia articles
#9
ageitgey
closed
10 years ago
1
Handle Asian scripts better
#8
thethomaseffect
closed
10 years ago
5
shouldn't remove code blocks
#7
joeybaker
closed
10 years ago
5
Fix pages with junk line breaks
#6
ageitgey
closed
10 years ago
0
Consecutive newlines in HTML text should be converted to spaces instead of '\n\n'
#5
JohnAllen
closed
10 years ago
4
Add basic image extraction from meta tags
#4
ageitgey
closed
10 years ago
0
Adding changelog
#3
ageitgey
closed
10 years ago
0
Add support for extracting embedded videos from web pages
#2
ageitgey
closed
10 years ago
0
Include image url extraction?
#1
kelvinkoko
closed
10 years ago
3