bangdv56/boilerpipe
Automatically exported from code.google.com/p/boilerpipe
issues
Word counting code does not account for & being a special HTML symbol.
#35
GoogleCodeExporter
opened
9 years ago
0
Add 'getInstance' accessor for ImageExtractor
#34
GoogleCodeExporter
closed
9 years ago
2
Bad xml format in html output from Web API
#33
GoogleCodeExporter
opened
9 years ago
1
Documentation - How to output html extract fragment instead of text?
#32
GoogleCodeExporter
closed
9 years ago
4
Support HTML5 elements
#31
GoogleCodeExporter
opened
9 years ago
2
Outputs html instead of plain text for certain urls
#30
GoogleCodeExporter
closed
9 years ago
2
boilerpipe crash
#29
GoogleCodeExporter
closed
9 years ago
1
UTF characters are not handled correctly
#28
GoogleCodeExporter
closed
9 years ago
3
Add 1.2.0 release to maven repository
#27
GoogleCodeExporter
closed
9 years ago
1
Tags missing in output html
#26
GoogleCodeExporter
closed
9 years ago
4
Feature Request - api to return character offsets of non-boilerplate text
#25
GoogleCodeExporter
closed
9 years ago
3
Boilerpipe fails (but not web api edition)
#24
GoogleCodeExporter
closed
9 years ago
4
Encoding problem (input is interpreted as Latin-1)
#23
GoogleCodeExporter
closed
9 years ago
2
Page not being parsed correctly <li> the issue.
#22
GoogleCodeExporter
closed
9 years ago
9
Included NekoHTML 1.9.9 missing LostText class
#21
GoogleCodeExporter
closed
9 years ago
2
Feature request: Run boilerpipe as a command line tool
#20
GoogleCodeExporter
opened
9 years ago
3
Code for Google app-engine?
#19
GoogleCodeExporter
opened
9 years ago
8
Description of different extractors?
#18
GoogleCodeExporter
closed
9 years ago
3
Precursory header tags missing
#17
GoogleCodeExporter
closed
9 years ago
3
Better support for non-english pages
#16
GoogleCodeExporter
opened
9 years ago
3
Title empty when parsing with TagSoup
#15
GoogleCodeExporter
opened
9 years ago
0
boilerpipe-web: Charset encoding problem
#14
GoogleCodeExporter
closed
9 years ago
3
Missing Maven dependency
#13
GoogleCodeExporter
opened
9 years ago
11
Possible improvement to TerminatingBlocksFinder
#12
GoogleCodeExporter
closed
9 years ago
1
Unconventional operator used for boolean logic
#11
GoogleCodeExporter
closed
9 years ago
3
Links on boilerpipe homepage are broken
#10
GoogleCodeExporter
closed
9 years ago
1
Add clone method to TextBlock
#9
GoogleCodeExporter
closed
9 years ago
2
Can you fix or promote the bug fix of NekoHTML (#2909310) ?
#8
GoogleCodeExporter
closed
9 years ago
2
Exclude Script tags
#7
GoogleCodeExporter
closed
9 years ago
3
2 to 3 mins taken for some URLs
#6
GoogleCodeExporter
closed
9 years ago
1
INSTALL.txt in src directory
#5
GoogleCodeExporter
closed
9 years ago
1
Ability to keep inline HTML in extracted content
#4
GoogleCodeExporter
closed
9 years ago
6
IDN <-> ACE Domain Names
#3
GoogleCodeExporter
closed
9 years ago
1
Encoding problem? – Strange garbage introduced
#2
GoogleCodeExporter
closed
9 years ago
4
DefaultExtractor.INSTANCE.getText(html): Removes leading special character when it is coded in ascii
#1
GoogleCodeExporter
closed
9 years ago
7