dfinlen / boilerpipe

Automatically exported from code.google.com/p/boilerpipe

Hotpatched nekohtml classes cause library incompatibilities #62

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Attempt to use boilerpipe & RichFaces 3.x together.

What is the expected output? What do you see instead?

The first problem you're likely to encounter is a NoClassDefFoundError like 
the following when you attempt to start your webapp:
java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError:
org/cyberneko/html/LostText

What version of the product are you using? On what operating system?

Using Apache Tika 0.9, which includes boilerpipe 1.1.0 as a transitive 
dependency. RichFaces 3.1 uses nekohtml 0.9.5; RichFaces 3.3.3 depends on 
nekohtml 1.9.6 (tried both with boilerpipe, both fail). 
Tried on Red Hat Linux 5 and Windows 7, not that it's relevant here. 

Please provide any additional information below.

The core problem is one of conflicting dependencies. RichFaces 3.x depends on 
older versions of nekohtml than the hot-patched 1.9.13 that's bundled into 
boilerpipe 1.1.0 (and 1.2.0, it seems). I've seen that you've attempted to get 
fixes included in nekohtml without much traction. 

But the bundling of patched classes is causing a core incompatibility with 
RichFaces, or anything else depending on nekohtml. In issue #13 you suggested 
splitting the nekohtml hotfix out into another jar. What happened with this?

Original issue reported on code.google.com by osb...@gmail.com on 19 Mar 2013 at 11:20

GoogleCodeExporter commented 9 years ago
I'm wondering whether, as a workaround, we can have an unpatched boilerpipe jar 
and configure boilerpipe to use tagsoup in place of nekohtml? I'm struggling to 
find documentation on how this could be achieved. 

Also wondering if 1.2.0 is API compatible with 1.1.0? 1.2.0 seems to have had 
some tagsoup patches. I'm new to boilerpipe; it has suddenly become a 
transitive dependency on a project, where it causes stop-ship issues. This 
incompatibility completely breaks RichFaces components, and RichFaces' 
dependency on specific nekohtml versions cannot be altered (I've tried and 
failed).

Original comment by osb...@gmail.com on 19 Mar 2013 at 11:25

GoogleCodeExporter commented 9 years ago
Also filed here: https://code.google.com/p/boilerpipe/issues/detail?id=81

Original comment by benjamin...@gmail.com on 19 Nov 2014 at 9:38

GoogleCodeExporter commented 9 years ago
The hot patch is a workaround for this issue: 
http://sourceforge.net/p/nekohtml/bugs/98/

Original comment by benjamin...@gmail.com on 19 Nov 2014 at 9:39

GoogleCodeExporter commented 9 years ago
What you should do when you want to hot patch a library is to repackage the 
entire library source under a different package name. There are build tools 
that can help you do this. Please fix this issue, because it can cause errors 
that are very hard to explain: as a user's classpath ordering changes, their 
project will inexplicably break and unbreak.
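
To check whether a project is affected by this, a minimal sketch along the following lines may help. It asks the class loader for every location that provides a given class file; more than one hit means the winner depends on classpath order. The class name `ClasspathDuplicates` and its methods are made up for illustration, and the default class name `org.cyberneko.html.HTMLConfiguration` is only an assumption about a class that both the stock nekohtml jar and the patched copy provide; pass whichever class you suspect as the first argument. In a webapp it should run inside the webapp, so that the context class loader sees the same jars.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of boilerpipe or nekohtml.
public class ClasspathDuplicates {

    // Convert a fully qualified class name to its class-file resource path.
    public static String toResourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Return every classpath location that provides the given class.
    public static List<URL> findCopies(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(toResourceName(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed default; replace with the class you suspect is duplicated.
        String name = args.length > 0 ? args[0] : "org.cyberneko.html.HTMLConfiguration";
        List<URL> copies = findCopies(name);
        System.out.println(copies.size() + " location(s) provide " + name);
        for (URL url : copies) {
            System.out.println("  " + url);
        }
        if (copies.size() > 1) {
            System.out.println("Duplicate class: which copy is loaded depends on classpath order.");
        }
    }
}
```

If two jars are listed, one of them is presumably the library with the bundled patched classes and the other the regular nekohtml jar.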

Original comment by benjamin...@gmail.com on 19 Nov 2014 at 9:41

GoogleCodeExporter commented 9 years ago
Seems this library is no longer maintained... can we fix this by contributing 
to this project?

Original comment by jacky...@gmail.com on 20 Nov 2014 at 1:58

GoogleCodeExporter commented 9 years ago
Please raise this issue at nekohtml.

The project is not dead. Given the limitations of Google Code and SVN (and the 
limitations of my spare time), I just can't/won't fix it for every use case.

A port to GitHub is underway, with changes that will eventually solve your 
issues.

Original comment by ckkohl79 on 20 Nov 2014 at 2:47