alan-turing-institute ReadabiliPy issues

alan-turing-institute / ReadabiliPy

A simple HTML content extractor in Python. It can be run as a wrapper for Mozilla's Readability.js package or in pure-Python mode.

MIT License

230 stars, 36 forks

issues

Add benchmarking

#61 edwardchalstrey1 closed 5 years ago
0
Update publication date extraction

#60 edwardchalstrey1 closed 5 years ago
2
Title updates

#59 edwardchalstrey1 closed 5 years ago
0
Date extraction

#58 edwardchalstrey1 closed 5 years ago
17
Title extraction

#57 edwardchalstrey1 closed 5 years ago
3
extruct for structured data

#56 westurner opened 5 years ago
1
Unnecessary <div> elements

#55 jemrobinson opened 5 years ago
1
Update BeautifulSoup version

#54 jemrobinson closed 5 years ago
0
Use correct name for beautifulsoup

#53 jemrobinson closed 5 years ago
0
BeautifulSoup hanging on find_all

#52 jemrobinson closed 5 years ago
0
Replace single <br> with space

#51 jemrobinson closed 5 years ago
0
Clarify rule for single <br>

#50 jemrobinson closed 5 years ago
0
Fix CData behaviour and improve test coverage

#49 jemrobinson closed 5 years ago
0
Update README and restructure

#48 jemrobinson closed 5 years ago
1
New method of whitespace joining

#47 jemrobinson closed 5 years ago
2
Dealing with white space

#46 jemrobinson closed 5 years ago
0
Deal with tags inside words

#45 jemrobinson closed 5 years ago
0
Added additional comment type

#44 jemrobinson closed 5 years ago
0
Add use Readability option to commandline tool and README

#43 martintoreilly closed 5 years ago
4
Fix comments inside tags

#42 jemrobinson closed 5 years ago
0
Comments inside tags

#41 jemrobinson closed 5 years ago
1
Fix erroneous whitespace

#40 jemrobinson closed 5 years ago
1
Erroneous whitespace

#39 jemrobinson closed 5 years ago
0
Fix rogue unescaped span

#38 jemrobinson closed 5 years ago
0
ReadabiliPy has not removed a span element from plain content and plain text

#37 sgibson91 closed 5 years ago
0
ImportError: No module named 'ReadabiliPy'

#36 kochkinaelena closed 5 years ago
8
Fix extra div element wrapping

#35 jemrobinson closed 5 years ago
0
Extra div element wrapping

#34 sgibson91 closed 5 years ago
0
Define explicit handling rules for HTML 4 elements

#33 martintoreilly opened 5 years ago
0
How should CDATA be dealt with?

#32 jemrobinson closed 5 years ago
2
Define handling rules for <iframe>

#31 jemrobinson opened 5 years ago
2
Non-HTML5 element

#30 jemrobinson closed 5 years ago
0
FileNotFoundError: [WinError 2] at parse

#29 orange391224 closed 5 years ago
3
Replaced readability with pure-python parser

#28 jemrobinson closed 5 years ago
1
Add travis support

#27 jemrobinson closed 5 years ago
2
[ABANDONED] Reverted history rewrite

#26 jemrobinson closed 5 years ago
2
Add Travis support

#25 jemrobinson closed 5 years ago
0
Updated Readability.js

#24 jemrobinson closed 5 years ago
0
Inconsistent output when changing from Node 10.13 to Node 11.1

#23 jemrobinson closed 5 years ago
0
Add unit tests for HTML elements

#22 jemrobinson closed 5 years ago
1
Make plain-content generation more robust

#21 jemrobinson closed 5 years ago
0
Update README to correct errors

#20 martintoreilly closed 6 years ago
0
Add node index to plain_text output when generated for plain_content

#19 martintoreilly closed 6 years ago
0
Add option to tag plain_content with node indexes

#18 martintoreilly closed 6 years ago
0
Fix test for case when no article returned to include null plain_text

#17 martintoreilly closed 6 years ago
0
Ensure plain_text field always returned

#16 martintoreilly closed 6 years ago
0
Revise plain content extraction to handle lists

#15 martintoreilly closed 6 years ago
0
Revise plain content extraction to handle lists

#14 martintoreilly closed 6 years ago
2
Add command line script

#13 martintoreilly closed 6 years ago
0
Add python command line script

#12 martintoreilly closed 6 years ago
1
