issues
ageitgey/node-unfluff
Automatically extract body content (and other cool stuff) from an HTML document
Apache License 2.0
2.15k stars
221 forks
Frontend support
#64
johipsum
closed
7 years ago
1
Replace "null" to empty string
#63
tykarol
closed
7 years ago
5
Convert to front-end friendly, remove 'fs'
#62
knod
opened
7 years ago
13
try to take div itemprop="articleBody" into account
#61
hecmec
opened
7 years ago
0
Doesn't seem to work for sites that use <div> tags instead of <p>
#60
iannshan
opened
7 years ago
2
Add stopwords for Bulgarian Language
#59
kolarski
closed
8 years ago
0
Deprecated modules
#58
riyaznet
closed
6 years ago
2
Add links array to extraction
#57
joshbeckman
closed
8 years ago
6
How can manage this case ?
#56
christophebe
closed
7 years ago
1
Extract text with line breaks
#55
adrianparr
closed
8 years ago
1
What coffee does unfluff drink?
#54
bennyk
closed
7 years ago
1
How can I restore the content with the extracted metadata
#53
goddyZhao
closed
8 years ago
2
do you support open graph
#52
yawhide
closed
8 years ago
4
Moves metadata selectors to external file
#51
philgooch
closed
8 years ago
0
Incorrect video extractions
#50
snellingio
closed
8 years ago
3
Adds author, date, copyright extraction
#49
philgooch
closed
8 years ago
3
Extract author
#48
PetrKaleta
closed
8 years ago
2
Filter the empty string out from the stopwords array
#47
maxme
closed
8 years ago
0
why this didn't support Chinese, what's difficult part ?
#46
Pana
closed
9 years ago
1
Add czech stopwords file
#45
burningtree
closed
9 years ago
1
calculateBestNode claims no nodesWithText on facebook developer page
#44
cmkimerer
closed
9 years ago
1
Missed js update
#43
ageitgey
closed
9 years ago
0
Support og description tags
#42
ageitgey
closed
9 years ago
0
Fix a bug where unrelated words were joined together that were…
#41
ageitgey
closed
9 years ago
0
Description should try to get meta[property="og:description"]
#40
joanamelo
closed
9 years ago
2
Fix issue where an SVG title in the page will get concatenated with t…
#39
bradvogel
closed
9 years ago
1
Text missing
#38
akreienbring
closed
9 years ago
5
Fix #34 - trim whitespace from tags found in page content
#37
ageitgey
closed
9 years ago
0
Updated Portuguese stopwords file
#36
lquadrosl
closed
9 years ago
2
Grammar fix
#35
falkirks
closed
9 years ago
1
Trim whitespace from tags?
#34
pdehaan
closed
9 years ago
1
Display --help output if no arguments passed to unfluff CLI
#33
pdehaan
closed
9 years ago
1
Typo in extractor#isHighlinkDensity ?
#32
dminkovsky
closed
9 years ago
1
Extract not all text
#31
yanosh-igor
closed
9 years ago
3
implement domainExtractor for image and title, with a single implementation wikipedia
#30
danielgranat
opened
9 years ago
1
Twitter status (tweet) as article?
#29
mattpal
closed
9 years ago
1
Fix an issue with USA today stories cleaning poorly
#28
ageitgey
closed
10 years ago
0
Ignore Social Buttons
#27
timcosta
closed
10 years ago
2
Don't drop uls and format them nicely in the output
#26
ageitgey
closed
10 years ago
1
Keeping unordered/ordered lists in extracted text
#25
joshbeckman
closed
10 years ago
1
Ignores unordered/ordered lists in body
#24
joshbeckman
closed
10 years ago
7
Upgrade Cheerio
#23
bradvogel
closed
10 years ago
4
Typo fix.
#22
bradvogel
closed
10 years ago
1
Fixed side effect from invocation of cleaner in unfluff.lazy
#21
franza
opened
10 years ago
3
Title fix
#20
bradvogel
closed
10 years ago
1
Title should prefer meta[property="og:title"]
#19
bradvogel
closed
10 years ago
2
Fixed example of usage in README.md
#18
franza
closed
10 years ago
1
Refactor to split out the unfluff interface from the extractor code
#17
ageitgey
closed
10 years ago
0
Issue #13 - Get parts of content independently
#16
franza
closed
10 years ago
6
Add Thai language stop words
#15
thangman22
closed
10 years ago
2