Invalid latex document if "= Heading =" is used (\chapter for documentclass article)

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?
1. Create a wiki page with = =, == == and === === headings
2. Try to export LaTeX directly to PDF (or export to .tex and run latex manually)

What is the expected output? What do you see instead?
I expected a compilable .tex document, but the generated "article" contained
\chapter commands, which are invalid in the article class.

What version of Wiki2LaTeX are you using? On which Mediawiki version?
latest version (0.10) of wiki2latex, MW 1.15

Original issue reported on code.google.com by philipp....@googlemail.com on 31 May 2010 at 12:52

GoogleCodeExporter commented 8 years ago

I'm using the same versions, and I can confirm this behavior.  Specifically:

It seems that in scrbook/scrreport mode:
= single equal = gets converted to \part{single equal}
== double equal == gets converted to \chapter{double equal}

and in scrartcl mode:
= single equal = gets converted to \chapter{single equal}
== double equal == gets converted to \section{double equal}

On the one hand, this is clearly a bug: \chapter can be generated in article
mode, but the article class does not define \chapter, so LaTeX rejects it.

On the other hand, I didn't even know that single equals were possible! I use
the Wikipedia approach of making == == the highest level.

http://en.wikipedia.org/wiki/Help:Wiki_markup#Layout

However, Mediawiki generally seems to allow single equals.

http://www.mediawiki.org/wiki/Help:Formatting

A simple fix would be to stop using = single equal = and make == double == your 
highest level.

Original comment by brian.ku...@gmail.com on 7 Jun 2010 at 8:47

GoogleCodeExporter commented 8 years ago

Bug avoidance is not a fix. Making == == the highest level throughout the whole
wiki would be a huge amount of work. Furthermore, it just looks ugly. The
simplest fix would be a config menu where you could set the mapping.
Unfortunately, I couldn't find the function in the code where == is mapped to
\section.

Original comment by philipp....@googlemail.com on 7 Jun 2010 at 8:54

GoogleCodeExporter commented 8 years ago

The function is located here: 

http://code.google.com/p/wiki2latex/source/browse/trunk/w2lParser.php#812

The point is that single-equal headings are not part of the official MediaWiki
syntax (though they work). You can implement a simple function that corrects
the behaviour for your wiki (using the 'w2lHeadings' hook).

Maybe a smarter approach can be found that removes the assumption that == == is
the highest heading level. This assumption is what causes the bug. It should be
fixed by checking whether or not a single-equal heading occurs in the document.
Adding another config variable seems inappropriate to me: I frequently forget
to check some boxes when generating PDFs, which means I have to generate them
again.
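One possible reading of the "smarter way" above, sketched in PHP: scan the wikitext for the shallowest heading level actually used and map each level to a sectioning command relative to it. In an article, the top level then becomes \section whether it is written "= Heading =" or "== Heading ==". The function names topHeadingLevel and articleCommand are hypothetical and are not part of Wiki2LaTeX. The sketch also works on raw wikitext rather than on the HTML that w2lParser.php processes, and it ignores edge cases such as unbalanced runs of "=".

```php
<?php
// Hypothetical sketch (names are illustrative, not Wiki2LaTeX API):
// find the shallowest heading level in the wikitext, then map heading
// levels to article-class sectioning commands relative to that level.

// Returns 1 for "= Title =", 2 for "== Title ==", ...; 0 if there is no heading.
function topHeadingLevel(string $wikitext): int
{
    // A heading line starts and ends with the same run of 1-6 "=" signs.
    if (!preg_match_all('/^(={1,6}).+?\1[ \t]*$/m', $wikitext, $m)) {
        return 0;
    }
    return min(array_map('strlen', $m[1]));
}

// Maps a heading level to a command that exists in the article class.
function articleCommand(int $level, int $topLevel): string
{
    $commands = ['section', 'subsection', 'subsubsection', 'paragraph', 'subparagraph'];
    // Clamp so that very deep headings reuse the last available command.
    $index = min(max($level - $topLevel, 0), count($commands) - 1);
    return '\\' . $commands[$index];
}

$text = "= Intro =\ntext\n== Details ==\nmore\n";
$top = topHeadingLevel($text);        // 1, because "= Intro =" is present
echo articleCommand(1, $top), "\n";   // \section
echo articleCommand(2, $top), "\n";   // \subsection
```

On a page that only uses "== ==" and deeper, topHeadingLevel returns 2. In that case "==" still maps to \section, which matches the behaviour users of the Wikipedia convention already expect.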

Original comment by hansgeorg.kluge@gmail.com on 9 Jun 2010 at 7:50

Added labels: Component-MwIntegration

GoogleCodeExporter commented 8 years ago

I think that Hans-Georg's "smarter way" makes the most sense.

WARNING: (1) I know little about how the parser actually works, (2) I know
little about regex or PHP, and (3) I know nothing about the w2lHeadings hook.
But the following code, added after line 821 in w2lParser.php, implements the
suggestion, at least on my wiki:

    // Demote headings only if a level-1 heading ("= Heading =") appears on any line.
    if ( preg_match("!^<h1>(.+)</h1>!m", $str)) {
        // Work from the deepest level upward so that no heading is demoted twice.
        $str = preg_replace("!^<h5>(.+)</h5>\\s*$!m", "<h6>\\1</h6>", $str);
        $str = preg_replace("!^<h4>(.+)</h4>\\s*$!m", "<h5>\\1</h5>", $str);
        $str = preg_replace("!^<h3>(.+)</h3>\\s*$!m", "<h4>\\1</h4>", $str);
        $str = preg_replace("!^<h2>(.+)</h2>\\s*$!m", "<h3>\\1</h3>", $str);
        $str = preg_replace("!^<h1>(.+)</h1>\\s*$!m", "<h2>\\1</h2>", $str);
    }

http://code.google.com/p/wiki2latex/source/browse/trunk/w2lParser.php#812

Interestingly, w2lParser.php does a Wiki -> HTML -> LaTeX conversion. The above
modification works after the HTML conversion. If <h1> is detected, then <h(x)>
is demoted to <h(x-1)>. This way, <h1> never appears, which is closer to the
behavior assumed in the original implementation (i.e. that "= single equals ="
headings never appear).

This will fail if, for some reason, doHeadings gets called in multiple passes.  
It might detect <h1> in one pass, but not another, and the demotions would not 
be uniformly applied.

Use at your own risk, but it worked on my little test.
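As an alternative to the chain of five preg_replace calls above, here is a hedged PHP sketch that demotes every heading tag in a single preg_replace_callback pass. Because it works in one pass, the result does not depend on the order of the replacements. It caps at <h6>, so an existing <h6> and a demoted <h5> end up at the same level. The function name demoteHeadingsIfH1 is hypothetical. The attribute handling is only there in case the HTML carries attributes on heading tags. This sketch does not address the multi-pass caveat above.

```php
<?php
// Hypothetical helper (not part of Wiki2LaTeX): if the HTML contains an <h1>,
// shift every heading tag <hN>/</hN> down one level in a single pass, capped at 6.
function demoteHeadingsIfH1(string $html): string
{
    // Only demote when a level-1 heading ("= Heading =") is present anywhere.
    if (!preg_match('/<h1[\s>]/i', $html)) {
        return $html;
    }
    // Matches "<h2>", "<h2 id=\"x\">" and "</h2>"; does not match "<hr>" or "<head>".
    return preg_replace_callback(
        '/<(\/?)h([1-6])(\s[^>]*)?>/i',
        function (array $m): string {
            $level = min((int)$m[2] + 1, 6);   // demote by one, never beyond h6
            $attrs = $m[3] ?? '';              // unmatched trailing group may be absent
            return '<' . $m[1] . 'h' . $level . $attrs . '>';
        },
        $html
    );
}

// Prints <h2>Intro</h2> and <h3>Details</h3> on separate lines.
echo demoteHeadingsIfH1("<h1>Intro</h1>\n<h2>Details</h2>\n");
```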

Original comment by brian.ku...@gmail.com on 14 Jun 2010 at 9:19

GoogleCodeExporter commented 8 years ago

Additional comments:

(1) That should be <h(x)> is demoted to <h(x+1)>

(2) My use of ! as the regex delimiter was hurried; I don't know what the
usual alternative to / is.

Original comment by brian.ku...@gmail.com on 14 Jun 2010 at 9:51

GoogleCodeExporter commented 8 years ago

Wherever possible, the parser uses the original functions of MediaWiki's
parser, so going Wiki -> HTML -> LaTeX is sometimes the best way.

Your regex seems fine. I will include it into W2L.

Thank you very much.

Original comment by hansgeorg.kluge@gmail.com on 18 Jun 2010 at 10:37

Changed state: Started

GoogleCodeExporter commented 8 years ago

I added this change. I used "/" as the delimiter and escaped the "/" in the
closing HTML tags. Hope this works :)

Original comment by hansgeorg.kluge@gmail.com on 26 Aug 2010 at 9:08

Changed state: Fixed

aiyuyun2015 / wiki2latex

Invalid latex document if "= Heading =" is used (\chapter for documentclass article) #56