ageitgey node-unfluff issues - Githubissues

ageitgey / node-unfluff

Automatically extract body content (and other cool stuff) from an HTML document

Apache License 2.0

2.15k stars 221 forks

issues

Handle bad languages by falling back to english and printing a warning

#14 ageitgey closed 10 years ago
1
Get parts of content independently

#13 bradvogel closed 10 years ago
7
Handle case where stopword file !exists

#12 mhuebert closed 10 years ago
3
Turkish stopwords

#11 c0b41 closed 10 years ago
2
Fix for github pages with code blocks

#10 ageitgey closed 10 years ago
0
Fix #8 - text getting dropped in wikipedia articles

#9 ageitgey closed 10 years ago
1
Handle Asian scripts better

#8 thethomaseffect closed 10 years ago
5
shouldn't remove code blocks

#7 joeybaker closed 10 years ago
5
Fix pages with junk line breaks

#6 ageitgey closed 10 years ago
0
Consecutive newlines in HTML text should be converted to spaces instead of '\n\n'

#5 JohnAllen closed 10 years ago
4
Add basic image extraction from meta tags

#4 ageitgey closed 10 years ago
0
Adding changelog

#3 ageitgey closed 10 years ago
0
Add support for extracting embedded videos from web pages

#2 ageitgey closed 10 years ago
0
Include image url extraction?

#1 kelvinkoko closed 10 years ago
3
