eXist-db / documentation

Documentation of eXist

An experimental Markdown edition #91

Closed joewiz closed 4 years ago

joewiz commented 7 years ago

One of the ideas kicking around for making eXist's documentation easier to contribute to and maintain is to turn it into Markdown. For recent discussion of this, see https://github.com/eXist-db/documentation/issues/34.

I've started a proof of concept at https://github.com/eXist-db/documentation/pull/90. You can clone my markdown branch at https://github.com/joewiz/documentation.git, and try browsing through the Markdown in your web browser at https://github.com/joewiz/documentation/blob/markdown/data/documentation.md. It's a very early step. It's just an experiment to help us think through whether this route seems useful to pursue further.

But besides actual code experiments, this issue is about the general topic of pros and cons of moving to Markdown. I'll start us off:

Pros:

Cons:

Ultimately, moving from DocBook to Markdown is just technological musical chairs. It doesn't magically improve the content by itself. That's where the real work is needed. The question is: Is this a better starting point from which to begin improving the content? And will it encourage enough contributions to be worth the effort?

adamretter commented 7 years ago

After creating the Versioning Scheme document, I would like to add some cons that I experienced:

1) No standard definition of Markdown. There are lots of different flavours.
2) Rendering Markdown is inconsistent between editing tools and/or stylesheet processors.
3) Visual editing tools for Markdown are still terrible.

For (2) and (3) I tried out the following tools - Mou, MacDown, Visual Studio Code and GitHub.

Something else that occurred to me is that collaborative editing and commenting were desired when working on the Versioning Scheme document. I am not sure if this is possible yet, but we could consider switching from DocBook to DITA and asking EasyDita to grant a free license for editing the eXist-db documentation perhaps? That might be a nice dogfood'ing solution as EasyDita uses eXist-db.

caseydawsonjordan commented 7 years ago

We would be happy to provide this to the eXist-db community (easyDITA) if there is interest. Let us know if there is anything we can do to help. Cheers!

joewiz commented 7 years ago

@adamretter GitHub-Flavored Markdown (GFM) would be the pragmatic choice of flavor given the proposed benefits of the GitHub-based ecosystem for lowering barriers to contributing. I've used Marked (see its documentation on GFM support) and recently oXygen 18.1 for decent local previews of GFM. Pandoc excels at converting Markdown to other formats - PDF, Word, etc.

@caseydawsonjordan Wow, that's great, thanks for the offer to the community! I've given some thought to migrating eXist's documentation to DITA too (my experiments are here: exist-documentation-dita.zip.) From your perspective as someone who has surely seen projects migrate from other document-based formats like DocBook to topic-based DITA, I'd be really interested if you had any opinions/suggestions/assessments about what decisions would be made to make a bare-bones transition from the current documentation to DITA. My initial experiments (with oXygen's DITA support) yielded mixed results; I think oXygen's DITA Maps Manager is a great way to get an overview of the project, but my conversion to either plain old DITA or markdown-esque DITA didn't really and conversions from these formats to outputs like HTML or PDF were lacking. It felt to me like it would be a lot of work just to migrate to DITA, and wondered if it might not be a better idea to "adapt" the content to a newly re-thought, topic-based structure rather than blindly "converting" it as is. Maybe (1) convert to Markdown in order to begin working on the content, with a topic-based approach in mind, and then (2) evaluate converting from the resulting Markdown to DITA?

dizzzz commented 7 years ago

I need to learn about DITA :-)

duncdrum commented 7 years ago

From a user perspective I really like mbostock's approach to documenting d3.js. Word and PDF support seem a red herring; getting the code base and documentation better aligned, and potentially version-specific, are the big ones to me.

ljo commented 7 years ago

@joewiz Just a small sidenote, DocBook also uses topics allowing different paths through the documentation without duplication of contents. :)

caseydawsonjordan commented 7 years ago

Joe,

Like anything with XML, there are a lot of ways to do a conversion from DocBook to DITA. The biggest thing is just creating a strategy for splitting content up into topics that makes sense for the community. I believe that there are some automated tools to do this, but we honestly don't do much conversion between DocBook and DITA.

Markdown certainly has the lowest barriers for contribution. The issue is that Markdown is very hard to convert to other formats if you need to in the future, or if you need to produce different versions of the documentation (e.g., Quickstart Guide, Security Guide, Full Guide, etc.) while keeping everything in sync. So if you go down the Markdown path and ever need to change to something else in the future, it may be a very painful process. With DITA or other XML formats, you can automate all those conversions or anything else you want. As we all know, XML is a powerful thing, be careful ditching it...

Also, I'll mention that DITA is a lot easier to search, especially in eXist because you have semantic XML to work with.

There are a lot more things to consider, but these are just a few that I will mention here. Markdown could be a great choice but it comes with some long term problems you should consider in depth before you make a decision on a format.

If you guys are having any upcoming telecons about this I would be happy to join to chat.

Cheers,

Casey

duncdrum commented 7 years ago

@joewiz something i was playing with might interest you. You can see the output here

joewiz commented 7 years ago

@duncdrum Beautiful work! I especially like how you dropped anchors to each function, e.g., https://github.com/duncdrum/cbdb-data/blob/nuseir/doc/function-doc.md#alias and how there's so much interlinking and cross-referencing within the page. This looks really useful! Thank you for sharing it!

joewiz commented 7 years ago

In my "markdown" branch I've just committed some new features:

The biggest problem I see with the app is that 7 articles have parse failures. I've listed these in the commit message. I was able to find some symptoms but not the cause: it seems to have to do with fenced code block parsing. For example, when trimming all but the java example from https://github.com/joewiz/documentation/blob/markdown/data/triggers.md#java, the page would trigger the error; removing these java blocks would get rid of the error.

In addition, many fenced code blocks have trouble with syntax highlighting. Old (left) vs. new (right):

[Screenshots, 2017-04-17: old rendering (left) and new rendering (right)]

I see two alternatives to this seemingly "showstopper" markdown parsing issue:

  1. Fix the markdown parser
  2. Ditch it for something like https://github.com/expkg-zone58/ex-markdown, based on https://github.com/vsch/flexmark-java

Thoughts?

duncdrum commented 7 years ago

@joewiz very nice. I'm SIB but I'll have a look when I'm up and running. The fenced code blocks seem fixable with some whitespace trickery. As for the unique headings, we could switch to alternating numeral + alphabetic, so ''4.2.1.4'' -> ''4b1d''. The dots in the heading number are prone to cause problems, but there are other workarounds.
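
To make the numeral + alphabetic idea concrete, here is a minimal Python sketch. The function name `heading_id` is made up for illustration, and the rule it applies (odd-numbered levels stay numeric, even-numbered levels become letters, 1 = a) is only one reading of the ''4.2.1.4'' -> ''4b1d'' example, not an agreed scheme.

```python
import string


def heading_id(number: str) -> str:
    """Turn a dotted heading number such as '4.2.1.4' into a dot-free id.

    Hypothetical scheme: levels 1, 3, 5, ... keep their number;
    levels 2, 4, 6, ... are written as a lowercase letter (1 -> a).
    """
    parts = []
    for position, piece in enumerate(number.split(".")):
        value = int(piece)  # raises ValueError on malformed input
        if position % 2 == 0:
            parts.append(str(value))
        else:
            if not 1 <= value <= 26:
                raise ValueError(f"level value {value} has no single-letter form")
            parts.append(string.ascii_lowercase[value - 1])
    return "".join(parts)
```

Because letters and digits alternate, multi-digit numbers stay unambiguous (for instance `10.1` becomes `10a`), but any lettered level above 26 would need a different workaround.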

duncdrum commented 7 years ago

@joewiz I'm not sure the markdown parser is to blame. Looking through your commit history, the initial output from pandoc seems poorer than I would have expected. No wonder the parser tripped. Still, switching to ex-markdown might be a smart move. What's the goal here? Find out why pandoc produced poor output (and submit PRs to them)? Change the DocBook so the conversion becomes easier? Or improve the markdown that is there now? P.S.: I've never tried Marked, but Atom's linter has proved quite useful so far.

joewiz commented 7 years ago

@duncdrum I don't disagree about the quality of the pandoc output - the markdown needs work and I'm open to either approach of tweaking pandoc or reformatting what we have - but the parser does trip up on a perfectly valid fenced code block, excerpted from https://github.com/joewiz/documentation/blob/markdown/data/triggers.md#java lines 265-420 (notice that in that link github is able to parse the java example).

Here is a test showing the error: test-md-parsing.zip. Unzip the archive, upload test-md-parsing.xq and test.md to /db, and run the .xq, which tries to parse the .md file but returns this error:

failed to parse /db/test.md: java:org.w3c.dom.DOMException: err:XQDY0025: element has more than one attribute '+' ( :)

I should post this as a bug to https://github.com/wolfgangmm/exist-markdown, but there is the larger question: do we want to take on the task of "supporting" GitHub Flavored Markdown via an XQuery library when there is a Java library that already does the job and could be wrapped as an XQuery module? Also, an XQuery module for https://github.com/vsch/flexmark-java like https://github.com/expkg-zone58/ex-markdown would support not just GFM but the whole CommonMark umbrella format (which GitHub has committed to; see https://githubengineering.com/a-formal-spec-for-github-markdown/). In addition, check out the CommonMark AST generated by FlexMark-Java: an XML representation of the source as CommonMark sees it. Wouldn't you rather work with this (the result of pasting test.md into http://spec.commonmark.org/dingus/ and selecting the AST tab) and transform it, rather than debug errors like the one above each time we trip up on some weird Markdown?

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
  <heading level="1">
    <text>Java</text>
  </heading>
  <code_block info="java">
import java.io.File;
import java.io.FileOutputStream;

import org.exist.collections.triggers.FilteringTrigger;
import org.exist.collections.triggers.TriggerException;
import org.exist.dom.DocumentImpl;
import org.exist.storage.DBBroker;
import org.exist.storage.txn.Txn;
import org.exist.xmldb.XmldbURI;
import org.exist.xquery.value.DateTimeValue;

/**
    A simple Java Trigger that
    logs all trigger events for which it is executed
    in the file triggersLog.xml in the systems temporary folder
*/

public class LoggingTrigger extends FilteringTrigger implements DocumentTrigger
{
    private final static String TEMPLATE = &quot;&lt;?xml version=\&quot;1.0\&quot;?&gt;&lt;events&gt;&lt;/events&gt;&quot;;

    private DocumentImpl doc;

    public void configure(DBBroker broker, org.exist.collections.Collection parent, Map&lt;String, List&lt;?&gt;&gt; parameters) throws CollectionConfigurationException {
        super.configure(broker, parent, parameters);
        XmldbURI docPath = XmldbURI.create(&quot;messages.xml&quot;);
        System.out.println(&quot;TestTrigger prepares&quot;);
        this.doc = parent.getDocument(broker, docPath);
        if (this.doc == null) {
            TransactionManager transactMgr = broker.getBrokerPool().getTransactionManager();
            Txn transaction = transactMgr.beginTransaction();
            try {
                getLogger().debug(&quot;creating new file for collection contents&quot;);

                // IMPORTANT: temporarily disable triggers on the collection.
                // We would end up in infinite recursion if we don't do that
                parent.setTriggersEnabled(false);
                IndexInfo info = parent.validateXMLResource(transaction, broker, docPath, TEMPLATE);

                parent.store(transaction, broker, info, TEMPLATE, false);
                this.doc = info.getDocument();

                transactMgr.commit(transaction);
            } catch (Exception e) {
                transactMgr.abort(transaction);
                throw new CollectionConfigurationException(e.getMessage(), e);
            } finally {
                parent.setTriggersEnabled(true);
            }
        }
    }

    @Deprecated
    public void prepare(int event, DBBroker broker, Txn transaction, XmldbURI documentPath, DocumentImpl existingDocument) throws TriggerException {
    }

    @Deprecated
    public void finish(int event, DBBroker broker, Txn transaction, XmldbURI documentPath, DocumentImpl document) {
    }

    private void addRecord(DBBroker broker, String xupdate) throws TriggerException {
        MutableDocumentSet docs = new DefaultDocumentSet();
        docs.add(doc);
        try {
            // IMPORTANT: temporarily disable triggers on the collection.
            // We would end up in infinite recursion if we don't do that
            getCollection().setTriggersEnabled(false);
            // create the XUpdate processor
            XUpdateProcessor processor = new XUpdateProcessor(broker, docs, AccessContext.TRIGGER);
            // process the XUpdate
            Modification modifications[] = processor.parse(new InputSource(new StringReader(xupdate)));
            for (int i = 0; i &lt; modifications.length; i++)
                modifications[i].process(null);
            broker.flush();
        } catch (Exception e) {
            e.printStackTrace();
            throw new TriggerException(e.getMessage(), e);
        } finally {
            // IMPORTANT: reenable trigger processing for the collection.
            getCollection().setTriggersEnabled(true);
        }

    }

    @Override
    public void beforeCreateDocument(DBBroker broker, Txn transaction, XmldbURI uri) throws TriggerException {
        String xupdate = &quot;&lt;?xml version=\&quot;1.0\&quot;?&gt;&quot; +
        &quot;&lt;xu:modifications version=\&quot;1.0\&quot; xmlns:xu=\&quot;&quot; + XUpdateProcessor.XUPDATE_NS + &quot;\&quot;&gt;&quot; +
        &quot;   &lt;xu:append select='/events'&gt;&quot; +
        &quot;       &lt;xu:element name='event'&gt;&quot; +
        &quot;           &lt;xu:attribute name='id'&gt;STORE-DOCUMENT&lt;/xu:attribute&gt;&quot; +
        &quot;           &lt;xu:attribute name='collection'&gt;&quot; + doc.getCollection().getURI() + &quot;&lt;/xu:attribute&gt;&quot; +
        &quot;       &lt;/xu:element&gt;&quot; +
        &quot;   &lt;/xu:append&gt;&quot; +
        &quot;&lt;/xu:modifications&gt;&quot;;

        addRecord(broker, xupdate);
    }

    @Override
    public void afterCreateDocument(DBBroker broker, Txn transaction, DocumentImpl document) {
        //ignore this event
    }

    @Override
    public void beforeUpdateDocument(DBBroker broker, Txn transaction, DocumentImpl document) throws TriggerException {
        //ignore this event
    }

    @Override
    public void afterUpdateDocument(DBBroker broker, Txn transaction, DocumentImpl document) {
        //ignore this event
    }

    @Override
    public void beforeCopyDocument(DBBroker broker, Txn transaction, DocumentImpl document, XmldbURI newUri) throws TriggerException {
        //ignore this event
    }

    @Override
    public void afterCopyDocument(DBBroker broker, Txn transaction, DocumentImpl document, XmldbURI newUri) {
        //ignore this event
    }

    @Override
    public void beforeMoveDocument(DBBroker broker, Txn transaction, DocumentImpl document, XmldbURI newUri) throws TriggerException {
        //ignore this event
    }

    @Override
    public void afterMoveDocument(DBBroker broker, Txn transaction, DocumentImpl document, XmldbURI newUri) {
        //ignore this event
    }

    @Override
    public void beforeDeleteDocument(DBBroker broker, Txn transaction, DocumentImpl document) throws TriggerException {
        String xupdate = &quot;&lt;?xml version=\&quot;1.0\&quot;?&gt;&quot; +
        &quot;&lt;xu:modifications version=\&quot;1.0\&quot; xmlns:xu=\&quot;&quot; + XUpdateProcessor.XUPDATE_NS + &quot;\&quot;&gt;&quot; +
        &quot;   &lt;xu:append select='/events'&gt;&quot; +
        &quot;       &lt;xu:element name='event'&gt;&quot; +
        &quot;           &lt;xu:attribute name='id'&gt;REMOVE-DOCUMENT&lt;/xu:attribute&gt;&quot; +
        &quot;           &lt;xu:attribute name='collection'&gt;&quot; + doc.getCollection().getURI() + &quot;&lt;/xu:attribute&gt;&quot; +
        &quot;       &lt;/xu:element&gt;&quot; +
        &quot;   &lt;/xu:append&gt;&quot; +
        &quot;&lt;/xu:modifications&gt;&quot;;

        addRecord(broker, xupdate);
    }

    @Override
    public void afterDeleteDocument(DBBroker broker, Txn transaction, XmldbURI uri) {
    }
}
</code_block>
</document>
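
As a rough illustration of transforming this AST instead of debugging raw Markdown, here is a minimal Python sketch that lists each code block with its language. It assumes only the namespace and the `code_block`/`info` names visible in the XML above. The `SAMPLE` document and the `code_blocks` helper are invented for the example and are not part of flexmark-java or ex-markdown.

```python
import xml.etree.ElementTree as ET

# Namespace used by the CommonMark XML output shown above.
NS = {"cm": "http://commonmark.org/xml/1.0"}

# A tiny stand-in for the AST above (hypothetical content).
SAMPLE = """<document xmlns="http://commonmark.org/xml/1.0">
  <heading level="1"><text>Java</text></heading>
  <code_block info="java">String s = &quot;hi&quot;;
</code_block>
</document>"""


def code_blocks(xml_text: str) -> list:
    """Return (info string, source text) pairs for every code block."""
    root = ET.fromstring(xml_text)
    return [
        (block.get("info", ""), block.text or "")
        for block in root.iterfind(".//cm:code_block", NS)
    ]
```

The XML parser has already decoded entities such as `&quot;` by the time the text is read, so the code arrives intact however odd the original Markdown indentation was.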

joewiz commented 7 years ago

Also, from the screenshot of the rendering of xquery.md above, compare the results of rendering this:

# XQuery 3.0 Support

eXist-db implements the following features of the ["XQuery 3.0"](http://www.w3.org/TR/xquery-30/) Working Draft

-   Higher Order Functions: eXist-db completely supports higher-order functions, including features like inline functions, closures, and partial function application. For more information, see the article on the eXist-db blog, [Higher-Order Functions in XQuery 3.0](http://atomic.exist-db.org/blogs/eXist/HoF)

-   Group by clause in FLWOR expressions: "group by" provides an efficient way to group the sequences generated in a FLWOR expression. For example,

    ``` xquery
    xquery version "3.0";
    for $speechBySpeaker in //SPEECH[ft:query(., "king")]
    group by $speaker := $speechBySpeaker/SPEAKER
    order by $speaker
    return
        <speaker name="{$speaker}">
        { $speechBySpeaker }
        </speaker>
    ```

queries the Shakespeare plays and groups the result by speaker.

exist-markdown

[Screenshot, 2017-04-20 12:20:46 pm: https://cloud.githubusercontent.com/assets/59118/25241396/d1d97526-25c3-11e7-82b4-924ab9f68aff.png]

```xml
<body>
    <section>
        <h1 id="xquery-3-0-support">XQuery 3.0 Support</h1>
        <p>eXist-db implements the following features of the <a href="http://www.w3.org/TR/xquery-30/">"XQuery 3.0"</a> Working Draft</p>
        <ul>
            <li>Higher Order Functions: eXist-db completely supports higher-order functions, including features like inline functions, closures, and partial function application. For more information, see the article on the eXist-db blog, <a href="http://atomic.exist-db.org/blogs/eXist/HoF">Higher-Order Functions in XQuery 3.0</a>
            </li>
            <li>Group by clause in FLWOR expressions: "group by" provides an efficient way to group the sequences generated in a FLWOR expression. For example,</li>
        </ul>
        <p>``` xquery xquery version "3.0"; for $speechBySpeaker in //SPEECH[ft:query(., "king")] group by $speaker := $speechBySpeaker/SPEAKER order by $speaker return &lt;speaker name="<span itemprop="$speaker">$speaker</span>"&gt; <span itemprop=" $speechBySpeaker "> $speechBySpeaker </span> &lt;/speaker&gt; ```</p>
        <p>queries the Shakespeare plays and groups the result by speaker.</p>
    </section>
</body>
```

commonmark ast

[Screenshot, 2017-04-20 12:21:47 pm]

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
  <heading level="1">
    <text>XQuery 3.0 Support</text>
  </heading>
  <paragraph>
    <text>eXist-db implements the following features of the </text>
    <link destination="http://www.w3.org/TR/xquery-30/" title="">
      <text>&quot;</text>
      <text>XQuery 3.0</text>
      <text>&quot;</text>
    </link>
    <text> Working Draft</text>
  </paragraph>
  <list type="bullet" tight="false">
    <item>
      <paragraph>
        <text>Higher Order Functions: eXist-db completely supports higher-order functions, including features like inline functions, closures, and partial function application. For more information, see the article on the eXist-db blog, </text>
        <link destination="http://atomic.exist-db.org/blogs/eXist/HoF" title="">
          <text>Higher-Order Functions in XQuery 3.0</text>
        </link>
      </paragraph>
    </item>
    <item>
      <paragraph>
        <text>Group by clause in FLWOR expressions: </text>
        <text>&quot;</text>
        <text>group by</text>
        <text>&quot;</text>
        <text> provides an efficient way to group the sequences generated in a FLWOR expression. For example,</text>
      </paragraph>
      <code_block info="xquery">xquery version &quot;3.0&quot;;
for $speechBySpeaker in //SPEECH[ft:query(., &quot;king&quot;)]
group by $speaker := $speechBySpeaker/SPEAKER
order by $speaker
return
    &lt;speaker name=&quot;{$speaker}&quot;&gt;
    { $speechBySpeaker }
    &lt;/speaker&gt;
</code_block>
      <paragraph>
        <text>queries the Shakespeare plays and groups the result by speaker.</text>
      </paragraph>
    </item>
  </list>
</document>

duncdrum commented 7 years ago

yes I see, the indenting (and sha 2dc79579b55f8de99c25d230e543b97ecc83cece) is really pushing things here on GH and also in the repo, lots of code blocks starting with like 8 whitespaces on the first line.

Just a note according to this there shouldn't be a white space behind the code block ticks.

here is my 2ct:

joewiz commented 7 years ago

there shouldn't be a white space behind the code block ticks

Sorry, let me make sure I understand. Are you saying there shouldn't be any spaces before the triple-backtick starting a fenced code block, as in the case above where the XQuery is indented inside the list? I think the spec link you included speaks to this though:

Fences can be indented. If the opening fence is indented, content lines will have equivalent opening indentation removed, if present... Four spaces indentation produces an indented code block:

....```
....aaa
....```

produces

<pre><code>```
aaa
```
</code></pre>

duncdrum commented 7 years ago

No I meant the infostring after the triple-ticks comes straight after the third tick. No ws.

The comment about 8 ws was about some files in the repo having codeblocks like this.

                    <root>
<a>blah</a>
</root>

joewiz commented 7 years ago

Ah, okay, I see. (I read "behind" as meaning "before", but I can see how you meant "behind" as "following"! Ah, spatial metaphors in language.)

I don't think whitespace following the code block ticks is a problem, as the CommonMark spec about fenced code blocks says this:

The line with the opening code fence may optionally contain some text following the code fence; this is trimmed of leading and trailing spaces and called the info string.

So any leading or trailing spaces in the text following the opening code fence are trimmed. It isn't wrong to include whitespace there; it'll just be trimmed off by the CommonMark processor.

That said, this would certainly be something I'd clean up if we were fixing the pandoc output.
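
The trimming rule quoted above can be sketched in a few lines of Python. This is a simplified reading of the spec text, not the code of any real CommonMark processor, and `info_string` is a made-up helper name. It also ignores list-item context, where indentation is measured relative to the item's content.

```python
import re
from typing import Optional

# Up to three spaces of indentation, a run of 3+ backticks or tildes,
# then whatever follows on the line.
_OPENING_FENCE = re.compile(r"^( {0,3})(`{3,}|~{3,})(.*)$")


def info_string(line: str) -> Optional[str]:
    """Return the trimmed info string of an opening fence, or None."""
    match = _OPENING_FENCE.match(line.rstrip("\n"))
    if match is None:
        return None  # e.g. four or more leading spaces: not a fence
    fence, rest = match.group(2), match.group(3)
    if fence.startswith("`") and "`" in rest:
        return None  # backtick fences cannot carry backticks in the info string
    return rest.strip()
```

Under this reading, "``` xquery" and "```xquery" give the same info string, while a line with four leading spaces is not treated as a fence at all.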

joewiz commented 6 years ago

For yet another option, we might consider Read the Docs (https://readthedocs.org/), used by many projects such as MongoDB (see live version at https://docs.mongodb.com/manual/ and source at https://github.com/mongodb/docs/blob/master/source/index.txt).

Benefits of readthedocs:

(I think better versioning support would be great - so eXist users know which version of eXist the docs are targeting.)

Pandoc can convert our DocBook to reStructuredText (the preferred format of Read the Docs) just as easily as it can output Markdown. Perhaps we should dip our toes in with a test, using Adam's much cleaned-up source in https://github.com/eXist-db/documentation/pull/135?
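
For anyone who wants to try such a test, here is a minimal Python sketch that builds the pandoc invocation for either target. The file name `triggers.xml` and both helper names are placeholders. The `gfm` writer only exists in newer pandoc releases (older ones call it `markdown_github`), so treat the format names as assumptions to check against your installed version.

```python
import subprocess
from pathlib import Path
from typing import List

# Output extension per pandoc writer (hypothetical mapping for this test).
EXTENSIONS = {"rst": ".rst", "gfm": ".md"}


def pandoc_command(source: Path, target_format: str) -> List[str]:
    """Build the pandoc command converting one DocBook file."""
    output = source.with_suffix(EXTENSIONS[target_format])
    return [
        "pandoc",
        "--from", "docbook",
        "--to", target_format,
        str(source),
        "--output", str(output),
    ]


def convert(source: Path, target_format: str) -> None:
    """Run pandoc; requires pandoc on PATH and raises if it fails."""
    subprocess.run(pandoc_command(source, target_format), check=True)
```

Calling `convert(Path("triggers.xml"), "rst")` would write `triggers.rst` next to the source, which makes it easy to compare the reStructuredText and Markdown outputs for the same article.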

Just another idea in advance of our discussion about documentation on Monday (https://docs.google.com/document/d/1aKBHnrYUQnMy2l2b8WUNg1SoN8nN3iLrgRKfggAiIQI/edit?usp=sharing).

adamretter commented 6 years ago

@joewiz If we are moving away from XML for our documentation, then I think this is an excellent idea. Past documentation attempts always focused on the technology rather than the content, and got nowhere. If we want to use reStructuredText to enable more authoring of content, then I support that. For delivery, I would like to see us write no code that we have to support, so I would welcome something like Read the Docs.

joewiz commented 6 years ago

In preparation for my agenda item under the Documentation discussion for Monday's Community Call, I've created a new experimental edition of the eXist documentation, this time as a GitHub Wiki. Check it out here: https://github.com/joewiz/exist/wiki.

Pros of GitHub Wiki:

Cons:

Notes on my experimental version:

Comments, questions, concerns welcome - either here and/or on Monday's call.

JoernT commented 6 years ago

@joewiz wow, I like that approach very much. And I'd like to throw in my point of view:

I'm very much for using something like the GitHub Wiki, Markdown or reStructuredText for their simplicity. The most prominent goal IMO must be to ease the process of contributing to the docs. If I don't need to fiddle around with new tools and languages, that will very much increase my motivation to contribute content.

That's of course a pragmatic decision. If we used the GitHub Wiki we would even have the option of adding other documents (e.g. PDF) for content with higher layout needs, if that ever comes up.

For me the most important points are:

Because of this last point, we once discussed internally an eXist-db app that pulls the repo, converts it to some sensible XML and indexes it for a better search interface. The results can even link back to the repo/wiki page. This essentially means using GitHub for editing and versioning/workflow, plus a separate search frontend and rendering app.

I don't know how much room there is, but if the Wiki produces HTML then additional CSS styling could help a lot to make it more attractive and readable.

duncdrum commented 4 years ago

Since there are no plans in the foreseeable future to move to Markdown, I'm going to close this. Thanks for the discussion everyone, we might reopen this if we ever fiddle around with it again.