retrek closed 8 years ago
I have the same issue with my project using a plain MediaWiki. I traced the problem further: it originates in XmlConverter.getRootElementWithErrorOpt(...) at line 71 (version 3.1.0), where
try{
    Document doc = builder.build(new ByteArrayInputStream(xml.getBytes(Charsets.UTF_8)));
    root = doc.getRootElement();
}...
fails, but the underlying exception is swallowed and neither logged nor reported. The XML string passed to getRootElementWithErrorOpt is never trimmed, yet it starts with multiple line breaks before the first non-empty line. Maybe the SAXBuilder reads the empty lines as #text elements and gets confused when looking for the root. Because of this, <?xml version="1.0" encoding="utf-8"?> is also not at the very start of the document, which is afaik a violation of XML syntax (the XML declaration must not be preceded by anything, including whitespace). I suspect that the SAXBuilder fails to deduce a meaningful root object from the string.
I have confirmed this for 3.1.0 (from the 3.x branch). Calling .trim() on the XML string before passing it in fixes the conversion! I won't be able to spend time working out where the string should actually be trimmed (to do it right) for a couple of days. If anyone solves this in the meantime, I'd appreciate it. Edit: Yep, trim fixes it for me.
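For anyone who wants to reproduce the effect without access to the affected wiki, here is a minimal sketch. It is an assumption-laden stand-in: it uses the JDK's built-in DocumentBuilder rather than JDOM's SAXBuilder, and the class name LeadingWhitespaceDemo and the sample response are made up for illustration. It only shows that an XML declaration preceded by line breaks is rejected, while the trimmed string parses.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of JWBF.
final class LeadingWhitespaceDemo {

    /** Returns the root element name, or throws if the input is not well-formed. */
    static String rootName(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getNodeName();
    }

    /** True if the string can be parsed as a well-formed XML document. */
    static boolean parses(String xml) {
        try {
            rootName(xml);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up response body, shaped like a MediaWiki API reply.
        String body = "<?xml version=\"1.0\" encoding=\"utf-8\"?><api><query/></api>";
        String untrimmed = "\n\n" + body; // leading line breaks, as described above

        System.out.println("untrimmed parses: " + parses(untrimmed));
        System.out.println("trimmed parses:   " + parses(untrimmed.trim()));
        System.out.println("root element:     " + rootName(untrimmed.trim()));
    }
}
```

Whether JDOM reports the failure in exactly the same way is not verified here; the sketch only illustrates the well-formedness rule behind the observed behaviour.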
@TreffnonX, do you have a sample XML file (valid or invalid) from your MediaWiki installation that demonstrates the problem in XmlConverter?
Sadly, it's a company-internal wiki. I cannot copy any information or give an example. What I can do is give you a pruned representation of the logged exception:
Exception in thread "main" java.lang.IllegalArgumentException: Invalid XML:
<?xml version="1.0" encoding="utf-8"?><api><query><general mainpage="Hauptseite" base="http://wiki/mediawiki/index.php/Hauptseite" sitename="xxxxxxxx" generator="MediaWiki 1.12.0" case="first-letter" rights="" lang="de" /></query></api>
at net.sourceforge.jwbf.core.Optionals.getOrThrow(Optionals.java:29)
at net.sourceforge.jwbf.mapper.XmlConverter.getRootElementWithError(XmlConverter.java:70)
at net.sourceforge.jwbf.mediawiki.actions.meta.GetVersion.parse(GetVersion.java:72)
at net.sourceforge.jwbf.mediawiki.actions.meta.GetVersion.processAllReturningText(GetVersion.java:82)
at net.sourceforge.jwbf.mediawiki.actions.util.MWAction.processReturningText(MWAction.java:76)
at net.sourceforge.jwbf.core.actions.HttpActionClient.executeAndProcess(HttpActionClient.java:247)
at net.sourceforge.jwbf.core.actions.HttpActionClient.get(HttpActionClient.java:224)
at net.sourceforge.jwbf.core.actions.HttpActionClient.processAction(HttpActionClient.java:160)
at net.sourceforge.jwbf.core.actions.HttpActionClient.performAction(HttpActionClient.java:139)
at net.sourceforge.jwbf.core.bots.HttpBot.performAction(HttpBot.java:57)
at net.sourceforge.jwbf.mediawiki.bots.MediaWikiBot.performAction(MediaWikiBot.java:265)
at net.sourceforge.jwbf.mediawiki.bots.MediaWikiBot.getPerformedAction(MediaWikiBot.java:270)
at net.sourceforge.jwbf.mediawiki.bots.MediaWikiBot.getPerformedAction(MediaWikiBot.java:278)
at net.sourceforge.jwbf.mediawiki.bots.MediaWikiBot.getVersion(MediaWikiBot.java:299)
at net.sourceforge.jwbf.mediawiki.actions.queries.AllPageTitles.generateRequest(AllPageTitles.java:105)
at net.sourceforge.jwbf.mediawiki.actions.queries.AllPageTitles.prepareNextRequest(AllPageTitles.java:171)
at net.sourceforge.jwbf.mediawiki.actions.queries.BaseQuery.doCollection(BaseQuery.java:134)
at net.sourceforge.jwbf.mediawiki.actions.queries.BaseQuery.hasNext(BaseQuery.java:85)
at com.inni.rpwikicleanup.App.toFiles(App.java:94)
at com.inni.rpwikicleanup.App.main(App.java:86)
As is apparent, the newlines after "Invalid XML: " are the problem. I solved the entire problem by inserting ".trim()" in 4 places in the XmlConverter:
@Nonnull
public static XmlElement getRootElementWithError(final String xml)
{
    return Optionals.getOrThrow(getRootElementWithErrorOpt(xml.trim()),
        "Invalid XML: " + xml);
}

static Optional<XmlElement> getRootElementWithErrorOpt(String xml)
{
    Optional<String> xmlStringOpt = Optionals.absentIfEmpty(xml.trim());
    if (xmlStringOpt.isPresent())
    {
        SAXBuilder builder = new SAXBuilder();
        org.jdom2.Element root;
        try
        {
            Document doc = builder
                .build(
                    new ByteArrayInputStream(xml.trim().getBytes(Charsets.UTF_8)));
            root = doc.getRootElement();
        }
        catch (JDOMException e)
        {
            System.err.println(e.getMessage());
            log.error(xml.trim().replaceAll("\\>\\<", ">\n<"));
            return Optional.absent();
        }
Apparently, older MediaWiki installations return an untrimmed response to certain queries. This results in the exception shown above. A simpler solution might be to call xml = xml.trim(); once, before the first operation.
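A rough sketch of that simpler variant follows, under stated assumptions: it trims exactly once at the top, and it reports the parser's message instead of swallowing it. It is not the JWBF code. The class TrimOnceSketch and the method rootOf are hypothetical names, java.util.Optional stands in for Guava's Optional, and the JDK's DocumentBuilder stands in for JDOM's SAXBuilder so that the example runs without extra dependencies.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; names are illustrative and not taken from JWBF.
final class TrimOnceSketch {

    /** Parses the given string and returns its root element, or empty on failure. */
    static Optional<Element> rootOf(String xml) {
        if (xml == null) {
            return Optional.empty();
        }
        // Trim once, up front, so every later step sees the same cleaned string.
        String trimmed = xml.trim();
        if (trimmed.isEmpty()) {
            return Optional.empty();
        }
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow fatal parse errors instead of letting the parser print them.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(
                new ByteArrayInputStream(trimmed.getBytes(StandardCharsets.UTF_8)));
            return Optional.of(doc.getDocumentElement());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Surface the underlying cause rather than dropping it silently.
            System.err.println("Could not parse XML: " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String response = "\n\n<?xml version=\"1.0\" encoding=\"utf-8\"?><api><query/></api>";
        rootOf(response).ifPresent(root -> System.out.println("root: " + root.getNodeName()));
    }
}
```

Trimming in one place keeps the cleaned string and the parsed string identical, so the fix does not depend on remembering .trim() at each call site.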
I remember having this problem some years ago, caused by a customised template with leading newlines.
But I'll add a patch to repair invalid XML.
I'm inclined to close this issue with that change. OK?
Sure. My issue is already solved. I just reported the issue, so it would not obstruct others ;)
Does not support Wikia. When using a page from a Wikia wiki, the exception "java.lang.IllegalArgumentException: Invalid XML" occurs.