issues
Tjatse/node-readability
Scrape/crawl articles from any site automatically. Make any web page readable, whether Chinese or English.
342 stars, 36 forks
how to handle redirect
#51
farhad-arjmand
closed
4 years ago
1
regexps for images - add webp and remove $
#50
natocTo
opened
6 years ago
2
Youtube embedded videos are removed
#49
anthony-foulfoin
opened
6 years ago
1
h1 h2 h3 tags are removed
#48
anthony-foulfoin
opened
6 years ago
1
Handling child text nodes of div
#47
RaviBolla
opened
7 years ago
1
Handling child text nodes of div
#46
RaviBolla
closed
7 years ago
0
what is download size limit of the data readability can scrape?
#45
sferoze
closed
6 years ago
1
Handling child text nodes of div
#44
RaviBolla
closed
7 years ago
0
Text display broken for economictimes articles
#43
RaviBolla
opened
7 years ago
0
Having trouble reading NYTimes
#42
leonpanjtar
closed
6 years ago
0
Scraping Douban is not very accurate
#41
rupertqin
closed
6 years ago
2
regex ignored?
#40
Leidanya
closed
7 years ago
2
not perfect when crawling this url
#39
doctorxiao
closed
7 years ago
1
Improve documentation
#38
midudev
closed
7 years ago
4
[Discussion] About using node.score instead node.data() method
#37
midudev
closed
2 years ago
3
Cleaning reader
#36
midudev
closed
7 years ago
0
Improving readability of the code and refactor methods
#35
midudev
closed
6 years ago
1
Update packages, add Node.js 4 and 6 for Travis and remove unnecessary escape characters
#34
midudev
closed
7 years ago
0
Consider adding promises
#33
midudev
closed
8 years ago
7
[feature] customize before, after operations
#32
Tjatse
opened
8 years ago
0
Fix positive pattern, remove incorrect char.
#31
entertainyou
closed
8 years ago
1
Do not fallback to parent when it's article
#30
entertainyou
closed
8 years ago
1
Add some debug log.
#29
entertainyou
closed
8 years ago
1
Consider img as article parts too, add score for img elements.
#28
entertainyou
closed
8 years ago
0
Update dependency
#27
entertainyou
closed
8 years ago
0
Output some logs to understand what's going on?
#26
entertainyou
closed
8 years ago
1
Extra score for items with articleBody itemprop (no matter if article or div)
#25
midudev
closed
2 years ago
3
Use JavaScript Standard Style [enhancement]
#24
midudev
closed
8 years ago
4
Some titles are broken
#23
midudev
closed
8 years ago
3
Redirect correctly pages from feedproxy to final url
#22
midudev
closed
6 years ago
0
Append if the direct child is of type img, object and embed.
#21
entertainyou
closed
8 years ago
1
Call fix link when use selectors to extract content.
#20
entertainyou
closed
8 years ago
0
forceDecode option
#19
gnujeremie
closed
6 years ago
0
Make imgFallback if provided as a function can override img src value.
#18
entertainyou
closed
8 years ago
5
Crash when the input is a binary APK file.
#17
entertainyou
closed
7 years ago
0
Filter children of `topCandidate` if choose the parent of it as Article Object.
#16
Tjatse
closed
8 years ago
0
Incorrect and maybe dangerous result
#15
entertainyou
closed
8 years ago
11
Make images regexp extendable.
#14
entertainyou
closed
8 years ago
0
Add an option imgFallback which uses img element's data-src attribute when
#13
entertainyou
closed
8 years ago
4
The result is not perfect on http://news.sohu.com/20151228/n432833902.shtml
#12
entertainyou
closed
8 years ago
5
[feature] customize title, content selectors
#11
Tjatse
closed
8 years ago
1
<blockquote> is stripped off
#10
mrgodhani
closed
8 years ago
1
Sync
#9
Tjatse
closed
8 years ago
0
Merge pull request #2 from Tjatse/master
#8
Tjatse
closed
8 years ago
0
Add cheerio output type to direct return the cheerio object.
#7
entertainyou
closed
8 years ago
1
tidyAttrs also removes src from images
#6
midudev
closed
8 years ago
2
Links are removed if <p> is missing
#5
midudev
closed
8 years ago
4
Detect and use charset if not set
#4
midudev
closed
8 years ago
3
Remove inline styles
#3
midudev
closed
8 years ago
7
Merge from master
#2
Tjatse
closed
9 years ago
0