*What steps will reproduce the problem?
1. Process a page that contains malformed image markup of the kind
described below.
*What is the expected output? What do you see instead?
In the generateImageParams() method, the code checks whether a parameter
contains a '=' character. If so, it splits the parameter on that same '='
separator. The documentation of String#split() specifies that trailing empty
strings are not included in the resulting array. Therefore, if the image
parameter is malformed and nothing follows the '=' symbol (or only white space
that is stripped before the split), the resulting array contains only one
element. When the method then tries to access the second element, it raises the
above-mentioned exception (an ArrayIndexOutOfBoundsException for index 1).
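The split behaviour can be reproduced in isolation with the minimal sketch below. The class name SplitDemo, the helper splitParam() and the sample parameter "alt=" are made up for illustration and are not taken from the project. Note that Java drops trailing empty strings, so a white-space-only value is only lost if the parameter is trimmed before the split; whether generateImageParams() does that is an assumption.

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: splits a parameter on '=' the same way the report describes.
    static String[] splitParam(String param) {
        return param.split("=");
    }

    public static void main(String[] args) {
        // Well-formed parameter: key and value are both present.
        System.out.println(Arrays.toString(splitParam("alt=Some text")));

        // Nothing after '=': the trailing empty string is dropped, one element remains.
        System.out.println(splitParam("alt=").length);

        // White space after '=' survives the split unless the input was trimmed first.
        System.out.println(splitParam("alt= ").length);
        System.out.println(splitParam("alt= ".trim()).length);

        // Accessing index 1 of a one-element result is what triggers the exception.
        try {
            String value = splitParam("alt=")[1];
            System.out.println(value);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getClass().getSimpleName());
        }
    }
}
```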
The attached patch file solves this issue by checking the length of the
resulting array: if it equals 2, the normal treatment is performed; otherwise
an empty string is supplied as the second argument.
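For readers without access to the attachment, the length check could look roughly like the sketch below. This is not the attached patch: the class ImageParamSplit and the method splitKeyValue() are hypothetical names, and the extra guard for an empty result (for example when the parameter is just "=") is an addition that the report does not mention.

```java
class ImageParamSplit {
    // Hypothetical helper: returns {key, value}; the value is "" when it is missing.
    static String[] splitKeyValue(String param) {
        String[] parts = param.split("=");

        if (parts.length == 2) {
            // Normal case: exactly one key and one value.
            return new String[] { parts[0].trim(), parts[1].trim() };
        }

        // Malformed case: fall back to an empty string as the second argument.
        // The length check on parts guards against an empty array (input "="),
        // which is an assumption beyond what the report describes.
        String key = parts.length > 0 ? parts[0].trim() : "";
        return new String[] { key, "" };
    }

    public static void main(String[] args) {
        String[] ok = splitKeyValue("alt=Some text");
        String[] malformed = splitKeyValue("alt=");
        System.out.println(ok[0] + " -> [" + ok[1] + "]");
        System.out.println(malformed[0] + " -> [" + malformed[1] + "]");
    }
}
```

With this shape, a parameter containing more than one '=' also falls into the malformed branch and gets an empty value, which matches the "otherwise" wording of the report.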
*What version of the product are you using? On what operating system?
We use the latest version checked out from the SVN repository, on Mac OS X
Snow Leopard with the 1.6 JVM.
*Please provide any additional information below.
The input data is an extract of the French Wikipedia export collected through
MWDumper. We use the WEM component of your project to generate a CAS data
structure that is then supplied to the Apache UIMA framework.
Original issue reported on code.google.com by Maxime.B...@gmail.com on 4 Jun 2010 at 3:16