WikiModel should support wiki syntax in Link labels

GoogleCodeExporter commented 8 years ago

For example this should be allowed:

[[**bold**>reference]]
[[{{macro/}}>reference]]

This is pretty obvious, since in lots of syntaxes (XHTML, LaTeX, etc.) it's
possible to create links around any type of element: images, formatted
text, etc.

Note: This is also supported by Doxia

Original issue reported on code.google.com by vmas...@gmail.com on 6 Oct 2008 at 8:27

GoogleCodeExporter commented 8 years ago

I have written a parser for Moinmoin syntax, but integrating it into
XWiki is quite ugly because of the way link label parsing has
been implemented.  (Ironically, Moinmoin syntax does not support
formatting in link labels.)

So, I'm considering how to fix this issue cleanly:
http://code.google.com/p/wikimodel/issues/detail?id=87.  The
trouble comes from syntaxes (such as XWiki) where the
label comes before the actual link address.

I can see two possibilities for changing the IWemListener interface:

1) beginReference(WikiReference ref)/endReference()

   Here the parser itself must take care of the problem by skipping
   ahead to the address and then going back to parse the label.

2) beginReference()/endReference(WikiReference ref)

   Leave it to the listener to wait for the endReference call before
   filling in the address.  This is probably very easy when building a
   DOM, but might be problematic for a streaming listener.
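
To make the trade-off between the two options concrete, here is a minimal Java sketch. It is not the real IWemListener: the interface names, the onWord event, and the use of a plain String in place of WikiReference are all assumptions made for illustration, and HTML escaping is left out.

```java
public class ReferenceListenerSketch {

    /** Option 1 (hypothetical shape): the address is known when the reference opens. */
    interface AddressFirstListener {
        void beginReference(String address);
        void onWord(String word);
        void endReference();
    }

    /** Option 2 (hypothetical shape): the address only arrives when the reference closes. */
    interface AddressLastListener {
        void beginReference();
        void onWord(String word);
        void endReference(String address);
    }

    /** Streaming printer for option 1: the opening tag can be written immediately. */
    static class StreamingHtml implements AddressFirstListener {
        final StringBuilder out = new StringBuilder();

        public void beginReference(String address) {
            out.append("<a href=\"").append(address).append("\">");
        }

        public void onWord(String word) {
            out.append(word);
        }

        public void endReference() {
            out.append("</a>");
        }
    }

    /** Printer for option 2: the label has to be buffered until the address arrives. */
    static class BufferingHtml implements AddressLastListener {
        final StringBuilder out = new StringBuilder();
        private StringBuilder label; // non-null only while inside a reference

        public void beginReference() {
            label = new StringBuilder();
        }

        public void onWord(String word) {
            // Inside a reference the text goes to the buffer, otherwise straight out.
            (label != null ? label : out).append(word);
        }

        public void endReference(String address) {
            out.append("<a href=\"").append(address).append("\">")
               .append(label).append("</a>");
            label = null;
        }
    }

    public static void main(String[] args) {
        StreamingHtml first = new StreamingHtml();
        first.beginReference("Main.Home");
        first.onWord("home");
        first.endReference();

        BufferingHtml last = new BufferingHtml();
        last.beginReference();
        last.onWord("home");
        last.endReference("Main.Home");

        System.out.println(first.out);
        System.out.println(last.out);
    }
}
```

Both printers produce the same markup, but the second one has to hold the whole label in memory, which is the cost a streaming listener would pay under option 2.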

I think that 1) is the nicer solution.  However, I think the easiest and
most efficient way to solve this with JavaCC is to back up the
input stream and reparse with a different scanner context.  But I'm a
bit concerned about the fact that the JavaCC documentation forbids
modifying the input stream:

https://javacc.dev.java.net/doc/tokenmanager.html

In the current implementation it seems to be safe to back up to the
beginning of the current token before switching scanner contexts.  (A
feature which greatly simplifies the Moinmoin grammar, by the way.)
So the question is also whether we dare to rely on this.  The patch below
illustrates the idea.
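
Independently of JavaCC, the scan, back up, rescan idea can be sketched in plain Java. This is only a toy under stated assumptions: the class, method, and event names are hypothetical, the separator is ">>" as in the patch, only "**" is recognised as a format symbol, and "~" escapes are ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class LabelRescanSketch {

    /** Returns the events for the first [[...]] reference found in the input. */
    static List<String> parseReference(String input) {
        List<String> events = new ArrayList<>();
        int start = input.indexOf("[[");
        if (start < 0) {
            return events;
        }
        int end = input.indexOf("]]", start + 2);
        if (end < 0) {
            return events;
        }
        String body = input.substring(start + 2, end);
        int sep = body.indexOf(">>");
        if (sep < 0) {
            // No label part: report the reference as a single event.
            events.add("onReference(" + body + ")");
            return events;
        }

        // First pass: skip ahead over the label and pick up the address.
        String address = body.substring(sep + 2);
        events.add("beginReference(" + address + ")");

        // Second pass: go back and tokenize the label as inline content.
        String label = body.substring(0, sep);
        int i = 0;
        while (i < label.length()) {
            if (label.startsWith("**", i)) {
                events.add("onFormat(**)");
                i += 2;
            } else {
                int next = label.indexOf("**", i);
                if (next < 0) {
                    next = label.length();
                }
                events.add("onWord(" + label.substring(i, next) + ")");
                i = next;
            }
        }
        events.add("endReference()");
        return events;
    }

    public static void main(String[] args) {
        for (String event : parseReference("see [[**bold**>>Main.Home]]")) {
            System.out.println(event);
        }
    }
}
```

The listener sees the address before any label event, which is what option 1 asks for; the price is that the label characters are read twice.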

What do you people think about this?  If you think that this is an OK
solution, I'd be happy to go ahead and fix this for all syntaxes.

Index: javacc/XWikiScanner.jj
===================================================================
--- javacc/XWikiScanner.jj  (revision 431)
+++ javacc/XWikiScanner.jj  (working copy)
@@ -128,7 +128,7 @@
     | <INTERNAL_MACRO_CONTENT: <MACRO_CONTENT> > : MACRO_CONTEXT
 }

-<DEFAULT, INITIAL_CONTEXT> TOKEN:
+<DEFAULT, INITIAL_CONTEXT, REFERENCE_LABEL> TOKEN:
 {
       <#LI: (<SPACE>)* ( ("*")+ (":" | ";")* | ( "1" | "*" )+ "." (":" | ";")* | (":" | ";")+ ) (<SPACE>) >
     | <#HEADER: (<SPACE>)* ("=")+ >
@@ -150,7 +150,8 @@
     | <#MACRO_END: "{{/" <MACRO_NAME> (<SPACE>)* "}}" > 
     | <#MACRO_CONTENT: ( <XWIKI_CHAR> | <SPACE> |<NEW_LINE> | <XWIKI_SPECIAL_SYMBOL> ) >
     | <#REFERENCE_IMAGE: "[[image:" ("~" ~[] | ~["]"] | "]" ~["]"] )* "]]" >
-    | <#REFERENCE:  ( "[[" ( <REFERENCE_IMAGE> | "~" ~[] | ~["]"] | "]" ~["]"] )* "]]" ) >
+    | <#REFERENCE_CONTENT: "~" ~[] | ~["]",">"] | "]" ~["]"] | ">" ~[">"]>
+    | <#REFERENCE:  ( "[[" (<REFERENCE_CONTENT>|<REFERENCE_IMAGE>)* "]]" ) >
     | <#HORLINE: "---" ("-")+ >
     | <#PARAMS:   "(%" ( "~" ~[] | ~["%"] | ["%"] ~[")"] )* "%)" >
     | <#CELL: ( "|=" | "|" | "!=" | "!!" ) (<PARAMS>)? >
@@ -248,6 +249,37 @@
 // case when it's possible to have a block element without a NEW_LINE before and that when the block element is
 // located at the start of the document.

+
+<INITIAL_CONTEXT, DEFAULT> TOKEN:
+{
+      <REFERENCE_WITH_LABEL_START: "[[" (<REFERENCE_CONTENT>|<REFERENCE_IMAGE>)* ">>" ( <REFERENCE_CONTENT> | ">>")* "]]"> 
+      { 
+          int matchedCharacters = image.length();
+          int referenceStart = image.indexOf(">>") + ">>".length();
+          String reference = image.substring(referenceStart, 
+                                             image.length()-"]]".length());
+
+          matchedToken.image = reference;
+
+          // Go back and parse the label
+
+          input_stream.backup(matchedCharacters-"[[".length());          
+      }  : REFERENCE_LABEL_CONTEXT
+}
+
+<REFERENCE_LABEL_CONTEXT> TOKEN:
+{
+      <REFERENCE_WITH_LABEL_END: ">>" ( <REFERENCE_CONTENT> | ">>")* "]]"> : DEFAULT
+    | <RL_FORMAT_SYMBOL : <FORMAT_SYMBOL> >
+    | <RL_IMAGE : <IMAGE> > 
+    | <RL_BR : <BR> > : 
+
+    // "Standard" tokens. They are the same for all wikis.
+    | <RL_NL: <NEW_LINE> >
+    | <RL_WORD : ( <XWIKI_CHAR> )+ >
+    | <RL_SPECIAL_SYMBOL : <XWIKI_SPECIAL_SYMBOL> >
+}
+
 <INITIAL_CONTEXT> TOKEN:
 {
 // <initial-context>
@@ -331,10 +363,10 @@
     Token getVERBATIM_START(): {Token t=null;} {(t=<I_VERBATIM_START>|t=<D_VERBATIM_START>){return t;}}
     Token getMACRO_EMPTY(): {Token t=null;} {(t=<I_MACRO_EMPTY>|t=<D_MACRO_EMPTY>){return t;}}
     Token getMACRO_START(): {Token t=null;} {(t=<I_MACRO_START>|t=<D_MACRO_START>){return t;}}
-    Token getFORMAT_SYMBOL(): {Token t=null;} {(t=<I_FORMAT_SYMBOL>|t=<D_FORMAT_SYMBOL>){return t;}}
-    Token getIMAGE(): {Token t=null;} {(t=<I_IMAGE>|t=<D_IMAGE>){return t;}}
+    Token getFORMAT_SYMBOL(): {Token t=null;} {(t=<I_FORMAT_SYMBOL>|t=<D_FORMAT_SYMBOL>|t=<RL_FORMAT_SYMBOL>){return t;}}
+    Token getIMAGE(): {Token t=null;} {(t=<I_IMAGE>|t=<D_IMAGE>|t=<RL_IMAGE>){return t;}}
     Token getATTACH(): {Token t=null;} {(t=<I_ATTACH>|t=<D_ATTACH>){return t;}}
-    Token getBR(): {Token t=null;} {(t=<I_BR>|t=<D_BR>){return t;}}
+    Token getBR(): {Token t=null;} {(t=<I_BR>|t=<D_BR>|t=<RL_BR>){return t;}}
     Token getBLOCK_PARAMS(): {Token t=null;} {(t=<I_BLOCK_PARAMS>|t=<D_BLOCK_PARAMS>){return t;}}
     Token getINLINE_PARAMS(): {Token t=null;} {(t=<I_INLINE_PARAMS>|t=<D_INLINE_PARAMS>){return t;}}
     Token getQUOT_LINE(): {Token t=null;} {(t=<I_QUOT_LINE>|t=<D_QUOT_LINE>){return t;}}
@@ -342,9 +374,9 @@
     Token getXWIKI_SPACE(): {Token t=null;} {(t=<I_XWIKI_SPACE>|t=<D_XWIKI_SPACE>){return t;}}

     // "Standard" tokens. They are the same for all wikis.
-    Token getNL(): {Token t=null;} {(t=<I_NL>|t=<D_NL>){return t;}}
-    Token getWORD(): {Token t=null;} {(t=<I_WORD>|t=<D_WORD>){return t;}}
-    Token getSPECIAL_SYMBOL(): {Token t=null;} {(t=<I_SPECIAL_SYMBOL>|t=<D_SPECIAL_SYMBOL>){return t;}}
+    Token getNL(): {Token t=null;} {(t=<I_NL>|t=<D_NL>|t=<RL_NL>){return t;}}
+    Token getWORD(): {Token t=null;} {(t=<I_WORD>|t=<D_WORD>|t=<RL_WORD>){return t;}}
+    Token getSPECIAL_SYMBOL(): {Token t=null;} {(t=<I_SPECIAL_SYMBOL>|t=<D_SPECIAL_SYMBOL>|t=<RL_SPECIAL_SYMBOL>){return t;}}
 // </getters>

@@ -985,6 +1017,15 @@
     )+
 }

+void reference_with_label():
+{
+}
+{
+    <REFERENCE_WITH_LABEL_START> {/* fContext.beginReference(token.image); */ }
+     ( inline() )*
+    <REFERENCE_WITH_LABEL_END> { /* fContext.endReference(); */ }
+}
+
 // inline element
 void inline():
 {
@@ -1094,6 +1135,8 @@
             fContext.onReference(ref);
         }
         |
+        reference_with_label()
+        |
         t = getTABLE_CELL()
         {
             if (fContext.isInTable()) {

Original comment by AndreasZ...@gmail.com on 11 Jan 2010 at 9:30

GoogleCodeExporter commented 8 years ago

I think you forgot some things in the REFERENCE_LABEL_CONTEXT:
- macros
- verbatim
- attach
and maybe others.

The problem with this approach is that you have to keep two ways of
parsing inline content synchronized (and a link label supports any inline
content except links).

Original comment by thomas.m...@gmail.com on 12 Jan 2010 at 10:06

GoogleCodeExporter commented 8 years ago


Also,
( inline() )*
will never send "new line" events. I think you have to do something like
(
  inline()
  |
  newLine()
)*

Original comment by thomas.m...@gmail.com on 12 Jan 2010 at 10:07

GoogleCodeExporter commented 8 years ago

Thanks for the comments, Thomas!  The patch is incomplete and I have not
run any unit tests yet.  I just verified that the idea of
scanning -> backup -> rescanning works.

I have found out that the danger of using the backup-method is when
the parser performs lookahead.  Thus, it is critical that the
reference_with_label production does not occur in a context where the
parser might use lookahead.  
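
The hazard can be illustrated with a toy lexer in plain Java. This does not reproduce JavaCC's generated code; TinyLexer and its two modes are hypothetical, and the only point is that a token fetched for lookahead before the context switch was scanned under the old rules and is still handed to the parser afterwards.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class LookaheadHazardSketch {

    /** Toy lexer: whole words in the default mode, single characters in label mode. */
    static class TinyLexer {
        private final String input;
        private int pos = 0;
        private boolean labelMode = false;
        // Tokens already scanned on behalf of parser lookahead.
        private final Deque<String> buffered = new ArrayDeque<>();

        TinyLexer(String input) {
            this.input = input;
        }

        /** Scans one token from the characters, using the current mode. */
        private String scan() {
            while (pos < input.length() && input.charAt(pos) == ' ') {
                pos++;
            }
            if (pos >= input.length()) {
                return "EOF";
            }
            if (labelMode) {
                return "CHAR:" + input.charAt(pos++);
            }
            int start = pos;
            while (pos < input.length() && input.charAt(pos) != ' ') {
                pos++;
            }
            return "WORD:" + input.substring(start, pos);
        }

        /** Lookahead: scans the next token early and keeps it for later. */
        String peek() {
            if (buffered.isEmpty()) {
                buffered.add(scan());
            }
            return buffered.peekFirst();
        }

        String next() {
            return buffered.isEmpty() ? scan() : buffered.pollFirst();
        }

        /** Mimics backup plus a context switch; buffered tokens are NOT rescanned. */
        void backUpAndSwitch(int newPos) {
            pos = newPos;
            labelMode = true;
        }
    }

    public static void main(String[] args) {
        TinyLexer safe = new TinyLexer("ab cd");
        safe.next();                      // consumes the word "ab"
        safe.backUpAndSwitch(0);
        System.out.println(safe.next());  // rescans from position 0 in label mode

        TinyLexer unsafe = new TinyLexer("ab cd");
        unsafe.next();
        unsafe.peek();                    // lookahead scans "cd" in the old mode
        unsafe.backUpAndSwitch(0);
        System.out.println(unsafe.next()); // stale token from before the switch
    }
}
```

In the second run the parser never sees the rescanned characters first, which is why the production must not be reachable from a point where lookahead has already run past the "[[".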

But I notice that lookahead is used extensively in the XWiki grammar.
I don't see why this would be necessary, and it probably hurts performance.

A quick fix is to let the scanner return the "[["-token separately
just as a buffer, which I think is sufficient as the inline()
production only seems to occur where the lookahead is 2.  But it would
probably be beneficial to try to reduce the use of lookahead.

Original comment by AndreasZ...@gmail.com on 12 Jan 2010 at 2:07
