ageitgey / node-unfluff

Automatically extract body content (and other cool stuff) from an html document

Apache License 2.0

2.15k stars 221 forks

issues


Frontend support

#64 johipsum closed 7 years ago
1
Replace "null" to empty string

#63 tykarol closed 7 years ago
5
Convert to front-end friendly, remove 'fs'

#62 knod opened 7 years ago
13
try to take div itemprop="articleBody" into account

#61 hecmec opened 7 years ago
0
Doesn't seem to work for sites that use <div> tags instead of <p>

#60 iannshan opened 7 years ago
2
Add stopwords for Bulgarian Language

#59 kolarski closed 8 years ago
0
Deprecated modules

#58 riyaznet closed 6 years ago
2
Add links array to extraction

#57 joshbeckman closed 8 years ago
6
How can manage this case ?

#56 christophebe closed 7 years ago
1
Extract text with line breaks

#55 adrianparr closed 8 years ago
1
What coffee does unfluff drink?

#54 bennyk closed 7 years ago
1
How can I restore the content with the extracted metadata

#53 goddyZhao closed 8 years ago
2
do you support open graph

#52 yawhide closed 8 years ago
4
Moves metadata selectors to external file

#51 philgooch closed 8 years ago
0
Incorrect video extractions

#50 snellingio closed 8 years ago
3
Adds author, date, copyright extraction

#49 philgooch closed 8 years ago
3
Extract author

#48 PetrKaleta closed 8 years ago
2
Filter the empty string out from the stopwords array

#47 maxme closed 8 years ago
0
why this didn't support Chinese, what's difficult part ?

#46 Pana closed 9 years ago
1
Add czech stopwords file

#45 burningtree closed 9 years ago
1
calculateBestNode claims no nodesWithText on facebook developer page

#44 cmkimerer closed 9 years ago
1
Missed js update

#43 ageitgey closed 9 years ago
0
Support og description tags

#42 ageitgey closed 9 years ago
0
Fix a bug where unrelated words were joined together that were…

#41 ageitgey closed 9 years ago
0
Description should try to get meta[property="og:description"]

#40 joanamelo closed 9 years ago
2
Fix issue where an SVG title in the page will get concatenated with t…

#39 bradvogel closed 9 years ago
1
Text missing

#38 akreienbring closed 9 years ago
5
Fix #34 - trim whitespace from tags found in page content

#37 ageitgey closed 9 years ago
0
Updated Portuguese stopwords file

#36 lquadrosl closed 9 years ago
2
Grammar fix

#35 falkirks closed 9 years ago
1
Trim whitespace from tags?

#34 pdehaan closed 9 years ago
1
Display --help output if no arguments passed to unfluff CLI

#33 pdehaan closed 9 years ago
1
Typo in extractor#isHighlinkDensity ?

#32 dminkovsky closed 9 years ago
1
Extract not all text

#31 yanosh-igor closed 9 years ago
3
implement domainExtractor for image and title, with a single implementation wikipedia

#30 danielgranat opened 9 years ago
1
Twitter status (tweet) as article?

#29 mattpal closed 9 years ago
1
Fix an issue with USA today stories cleaning poorly

#28 ageitgey closed 10 years ago
0
Ignore Social Buttons

#27 timcosta closed 10 years ago
2
Don't drop uls and format them nicely in the output

#26 ageitgey closed 10 years ago
1
Keeping unordered/ordered lists in extracted text

#25 joshbeckman closed 10 years ago
1
Ignores unordered/ordered lists in body

#24 joshbeckman closed 10 years ago
7
Upgrade Cheerio

#23 bradvogel closed 10 years ago
4
Typo fix.

#22 bradvogel closed 10 years ago
1
Fixed side effect from invocation of cleaner in unfluff.lazy

#21 franza opened 10 years ago
3
Title fix

#20 bradvogel closed 10 years ago
1
Title should prefer meta[property="og:title"]

#19 bradvogel closed 10 years ago
2
Fixed example of usage in README.md

#18 franza closed 10 years ago
1
Refactor to split out the unfluff interface from the extractor code

#17 ageitgey closed 10 years ago
0
Issue #13 - Get parts of content independently

#16 franza closed 10 years ago
6
Add Thai language stop words

#15 thangman22 closed 10 years ago
2
