alan-turing-institute/ReadabiliPy
A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-python mode.
MIT License
230 stars
36 forks
issues
Add benchmarking
#61
edwardchalstrey1
closed
5 years ago
0
Update publication date extraction
#60
edwardchalstrey1
closed
5 years ago
2
Title updates
#59
edwardchalstrey1
closed
5 years ago
0
Date extraction
#58
edwardchalstrey1
closed
5 years ago
17
Title extraction
#57
edwardchalstrey1
closed
5 years ago
3
extruct for structured data
#56
westurner
opened
5 years ago
1
Unnecessary <div> elements
#55
jemrobinson
opened
5 years ago
1
Update BeautifulSoup version
#54
jemrobinson
closed
5 years ago
0
Use correct name for beautifulsoup
#53
jemrobinson
closed
5 years ago
0
BeautifulSoup hanging on find_all
#52
jemrobinson
closed
5 years ago
0
Replace single <br> with space
#51
jemrobinson
closed
5 years ago
0
Clarify rule for single <br>
#50
jemrobinson
closed
5 years ago
0
Fix CData behaviour and improve test coverage
#49
jemrobinson
closed
5 years ago
0
Update README and restructure
#48
jemrobinson
closed
5 years ago
1
New method of whitespace joining
#47
jemrobinson
closed
5 years ago
2
Dealing with white space
#46
jemrobinson
closed
5 years ago
0
Deal with tags inside words
#45
jemrobinson
closed
5 years ago
0
Added additional comment type
#44
jemrobinson
closed
5 years ago
0
Add use Readability option to commandline tool and README
#43
martintoreilly
closed
5 years ago
4
Fix comments inside tags
#42
jemrobinson
closed
5 years ago
0
Comments inside tags
#41
jemrobinson
closed
5 years ago
1
Fix erroneous whitespace
#40
jemrobinson
closed
5 years ago
1
Erroneous whitespace
#39
jemrobinson
closed
5 years ago
0
Fix rogue unescaped span
#38
jemrobinson
closed
5 years ago
0
ReadabiliPy has not removed a span element from plain content and plain text
#37
sgibson91
closed
5 years ago
0
ImportError: No module named 'ReadabiliPy'
#36
kochkinaelena
closed
5 years ago
8
Fix extra div element wrapping
#35
jemrobinson
closed
5 years ago
0
Extra div element wrapping
#34
sgibson91
closed
5 years ago
0
Define explicit handling rules for HTML 4 elements
#33
martintoreilly
opened
5 years ago
0
How should CDATA be dealt with?
#32
jemrobinson
closed
5 years ago
2
Define handling rules for <iframe>
#31
jemrobinson
opened
5 years ago
2
Non-HTML5 element
#30
jemrobinson
closed
5 years ago
0
FileNotFoundError: [WinError 2] at parse
#29
orange391224
closed
5 years ago
3
Replaced readability with pure-python parser
#28
jemrobinson
closed
5 years ago
1
Add travis support
#27
jemrobinson
closed
5 years ago
2
[ABANDONED] Reverted history rewrite
#26
jemrobinson
closed
5 years ago
2
Add Travis support
#25
jemrobinson
closed
5 years ago
0
Updated Readability.js
#24
jemrobinson
closed
5 years ago
0
Inconsistent output when changing from Node 10.13 to Node 11.1
#23
jemrobinson
closed
5 years ago
0
Add unit tests for HTML elements
#22
jemrobinson
closed
5 years ago
1
Make plain-content generation more robust
#21
jemrobinson
closed
5 years ago
0
Update README to correct errors
#20
martintoreilly
closed
6 years ago
0
Add node index to plain_text output when generated for plain_content
#19
martintoreilly
closed
6 years ago
0
Add option to tag plain_content with node indexes
#18
martintoreilly
closed
6 years ago
0
Fix test for case when no article returned to include null plain_text
#17
martintoreilly
closed
6 years ago
0
Ensure plain_text field always returned
#16
martintoreilly
closed
6 years ago
0
Revise plain content extraction to handle lists
#15
martintoreilly
closed
6 years ago
0
Revise plain content extraction to handle lists
#14
martintoreilly
closed
6 years ago
2
Add command line script
#13
martintoreilly
closed
6 years ago
0
Add python command line script
#12
martintoreilly
closed
6 years ago
1