elastic / elasticsearch-parent

Elasticsearch Parent POM
Apache License 2.0

Add asciidoctor-maven-plugin 1.5.2 #50

Closed dadoonet closed 9 years ago

dadoonet commented 9 years ago

We can define asciidoctor-maven-plugin in case some projects want to use it.

            <plugin>
                <groupId>org.asciidoctor</groupId>
                <artifactId>asciidoctor-maven-plugin</artifactId>
                <version>1.5.2</version>
            </plugin>
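If the version is declared in the parent (presumably under `pluginManagement`), a child project that wants the docs build would only need to reference the plugin without repeating the version. A minimal sketch; the surrounding `build` section is illustrative:

```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.asciidoctor</groupId>
            <artifactId>asciidoctor-maven-plugin</artifactId>
            <!-- version 1.5.2 inherited from the parent's pluginManagement -->
        </plugin>
    </plugins>
</build>
```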

Here is a sample configuration:

            <plugin>
                <groupId>org.asciidoctor</groupId>
                <artifactId>asciidoctor-maven-plugin</artifactId>
                <version>1.5.2</version>
                <executions>
                    <execution>
                        <id>output-asciidoc</id>
                        <phase>generate-resources</phase> 
                        <goals>
                            <goal>process-asciidoc</goal> 
                        </goals>
                    </execution>
                    <execution>
                        <id>output-html</id>
                        <phase>generate-resources</phase> 
                        <goals>
                            <goal>process-asciidoc</goal> 
                        </goals>
                        <configuration>
                            <sourceHighlighter>coderay</sourceHighlighter>
                            <backend>html</backend>
                            <attributes>
                                <toc/>
                                <linkcss>false</linkcss>
                            </attributes>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

If needed, we can also first filter the documentation, replacing all placeholders with their Maven values:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-resources-plugin</artifactId>
    <executions>
      <execution>
        <id>copy-resources</id>
        <phase>validate</phase>
        <goals>
          <goal>copy-resources</goal>
        </goals>
        <configuration>
          <outputDirectory>${project.build.directory}/docs</outputDirectory>
          <resources>          
            <resource>
              <directory>src/main/docs</directory>
              <filtering>true</filtering>
            </resource>
          </resources>              
        </configuration>            
      </execution>
    </executions>
</plugin>
<plugin>
    <groupId>org.asciidoctor</groupId>
    <artifactId>asciidoctor-maven-plugin</artifactId>
    <version>1.5.2</version>
    <executions>
        <execution>
            <id>output-asciidoc</id>
            <phase>generate-resources</phase> 
            <goals>
                <goal>process-asciidoc</goal> 
            </goals>
            <configuration>
                <sourceDirectory>${project.build.directory}/docs</sourceDirectory>
            </configuration>
        </execution>
        <execution>
            <id>output-html</id>
            <phase>generate-resources</phase> 
            <goals>
                <goal>process-asciidoc</goal> 
            </goals>
            <configuration>
                <sourceDirectory>${project.build.directory}/docs</sourceDirectory>
                <sourceHighlighter>coderay</sourceHighlighter>
                <backend>html</backend>
                <attributes>
                    <toc/>
                    <linkcss>false</linkcss>
                </attributes>
            </configuration>
        </execution>
    </executions>
</plugin>

In the above example, the plugin documentation source lives in src/main/docs.
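Conceptually, the filtering step is plain token substitution. A rough sketch of what maven-resources-plugin does to a doc source, using sed as a stand-in; the file name and version value are illustrative:

```shell
# Hypothetical doc source with a Maven placeholder, e.g. src/main/docs/index.adoc:
echo 'bin/plugin install elasticsearch-analysis-kuromoji/${project.version}' > index.adoc

# During the validate phase, maven-resources-plugin copies it to target/docs with
# ${project.version} replaced by the value from the POM; sed simulates that here:
sed 's/\${project\.version}/2.5.0/' index.adoc
# → bin/plugin install elasticsearch-analysis-kuromoji/2.5.0
```

The asciidoctor executions then read from the filtered copy via `sourceDirectory`, so the rendered docs always carry the current version.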

clintongormley commented 9 years ago

Has somebody asked for asciidoctor? I'm cautious about it, because it is different from asciidoc, so relying on asciidoctor output will not help with our docs build process.

dadoonet commented 9 years ago

@clintongormley I was trying to find a way to generate plugin documentation.

The test I ran so far produces something like:

<?xml version="1.0" encoding="UTF-8"?>
<?asciidoc-toc?>
<?asciidoc-numbered?>
<article xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en">
<info>
<title>Japanese (kuromoji) Analysis for Elasticsearch</title>
<date>2015-06-02</date>
</info>
<simpara>The Japanese (kuromoji) Analysis plugin integrates Lucene kuromoji analysis module into elasticsearch.</simpara>
<simpara>In order to install the plugin, run:</simpara>
<programlisting language="shell" linenumbering="unnumbered">bin/plugin install elasticsearch/elasticsearch-analysis-kuromoji/2.5.0</programlisting>
<simpara>To build a <literal>SNAPSHOT</literal> version, you need to build it with Maven:</simpara>
<programlisting language="shell" linenumbering="unnumbered">cd plugins/analysis-kuromoji
mvn clean install
plugin --install analysis-kuromoji \
       --url file:target/releases/elasticsearch-analysis-kuromoji-3.0.0-SNAPSHOT.zip</programlisting>
<analysis-kuromoji-content xml:id="_provided_analysis_features">
<title>Provided analysis features</title>
<table frame="all" rowsep="1" colsep="1">
<title>Available analyzer, tokenizer, tokenfilter and charfilter</title>
<?dbhtml table-width="100%"?>
<?dbfo table-width="100%"?>
<?dblatex table-width="100%"?>
<tgroup cols="2">
<colspec colname="col_1" colwidth="302*"/>
<colspec colname="col_2" colwidth="119*"/>
<thead>
<row>
<entry align="left" valign="top">Name</entry>
<entry align="center" valign="top">Type</entry>
</row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><simpara><literal>kuromoji</literal></simpara></entry>
<entry align="center" valign="top"><simpara><literal>analyzer</literal></simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>kuromoji_tokenizer</literal></simpara></entry>
<entry align="center" valign="top"><simpara><literal>tokenizer</literal></simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>kuromoji_baseform</literal></simpara></entry>
<entry align="center" valign="top"><simpara><literal>tokenfilter</literal></simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>kuromoji_part_of_speech</literal></simpara></entry>
<entry align="center" valign="top"><simpara><literal>tokenfilter</literal></simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>kuromoji_readingform</literal></simpara></entry>
<entry align="center" valign="top"><simpara><literal>tokenfilter</literal></simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>kuromoji_stemmer</literal></simpara></entry>
<entry align="center" valign="top"><simpara><literal>tokenfilter</literal></simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>ja_stop</literal></simpara></entry>
<entry align="center" valign="top"><simpara><literal>tokenfilter</literal></simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>kuromoji_iteration_mark</literal></simpara></entry>
<entry align="center" valign="top"><simpara><literal>charfilter</literal></simpara></entry>
</row>
</tbody>
</tgroup>
</table>
</analysis-kuromoji-content>
<analysis-kuromoji-usage xml:id="_usage">
<title>Usage</title>
<section xml:id="_analyzer_kuromoji">
<title>Analyzer : kuromoji</title>
<simpara>An analyzer of type <literal>kuromoji</literal>.
This analyzer uses the following tokenizer and tokenfilter combination.</simpara>
<itemizedlist>
<listitem>
<simpara><literal>kuromoji_tokenizer</literal> : Kuromoji Tokenizer</simpara>
</listitem>
<listitem>
<simpara><literal>kuromoji_baseform</literal> : Kuromoji BaseFormFilter (TokenFilter)</simpara>
</listitem>
<listitem>
<simpara><literal>kuromoji_part_of_speech</literal> : Kuromoji Part of Speech Stop Filter (TokenFilter)</simpara>
</listitem>
<listitem>
<simpara><literal>cjk_width</literal> : CJK Width Filter (TokenFilter)</simpara>
</listitem>
<listitem>
<simpara><literal>stop</literal> : Stop Filter (TokenFilter)</simpara>
</listitem>
<listitem>
<simpara><literal>kuromoji_stemmer</literal> : Kuromoji Katakana Stemmer Filter (TokenFilter)</simpara>
</listitem>
<listitem>
<simpara><literal>lowercase</literal> : LowerCase Filter (TokenFilter)</simpara>
</listitem>
</itemizedlist>
</section>
<section xml:id="_charfilter_kuromoji_iteration_mark">
<title>CharFilter : kuromoji_iteration_mark</title>
<simpara>A charfilter of type <literal>kuromoji_iteration_mark</literal>.
This charfilter normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.</simpara>
<simpara>The following are the settings that can be set for a <literal>kuromoji_iteration_mark</literal> charfilter type:</simpara>
<table frame="all" rowsep="1" colsep="1">
<title>Available settings</title>
<?dbhtml table-width="100%"?>
<?dbfo table-width="100%"?>
<?dblatex table-width="100%"?>
<tgroup cols="3">
<colspec colname="col_1" colwidth="72*"/>
<colspec colname="col_2" colwidth="246*"/>
<colspec colname="col_3" colwidth="98*"/>
<thead>
<row>
<entry align="left" valign="top">Setting</entry>
<entry align="left" valign="top">Description</entry>
<entry align="center" valign="top">Default value</entry>
</row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><simpara><literal>normalize_kanji</literal></simpara></entry>
<entry align="left" valign="top"><simpara>indicates whether kanji iteration marks should be normalized</simpara></entry>
<entry align="center" valign="top"><simpara><literal>true</literal></simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>normalize_kana</literal></simpara></entry>
<entry align="left" valign="top"><simpara>indicates whether kana iteration marks should be normalized</simpara></entry>
<entry align="center" valign="top"><simpara><literal>true</literal></simpara></entry>
</row>
</tbody>
</tgroup>
</table>
</section>
<section xml:id="_tokenizer_kuromoji_tokenizer">
<title>Tokenizer : kuromoji_tokenizer</title>
<simpara>A tokenizer of type <literal>kuromoji_tokenizer</literal>.</simpara>
<simpara>The following are settings that can be set for a <literal>kuromoji_tokenizer</literal> tokenizer type:</simpara>
<table frame="all" rowsep="1" colsep="1">
<title>Available settings</title>
<?dbhtml table-width="100%"?>
<?dbfo table-width="100%"?>
<?dblatex table-width="100%"?>
<tgroup cols="3">
<colspec colname="col_1" colwidth="72*"/>
<colspec colname="col_2" colwidth="246*"/>
<colspec colname="col_3" colwidth="98*"/>
<thead>
<row>
<entry align="left" valign="top">Setting</entry>
<entry align="left" valign="top">Description</entry>
<entry align="center" valign="top">Default value</entry>
</row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><simpara><literal>mode</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Tokenization mode: this determines how the tokenizer handles compound and unknown words. One of <literal>normal</literal>, <literal>search</literal> or <literal>extended</literal></simpara></entry>
<entry align="center" valign="top"><simpara><literal>search</literal></simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>discard_punctuation</literal></simpara></entry>
<entry align="left" valign="top"><simpara><literal>true</literal> if punctuation tokens should be dropped from the output.</simpara></entry>
<entry align="center" valign="top"><simpara><literal>true</literal></simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>user_dictionary</literal></simpara></entry>
<entry align="left" valign="top"><simpara>Path to a User Dictionary file</simpara></entry>
<entry align="center" valign="top"></entry>
</row>
</tbody>
</tgroup>
</table>
<section xml:id="_tokenization_mode">
<title>Tokenization mode</title>
<simpara>There are three tokenization modes:</simpara>
<itemizedlist>
<listitem>
<simpara><literal>normal</literal> : Ordinary segmentation: no decomposition for compounds</simpara>
</listitem>
<listitem>
<simpara><literal>search</literal> : Segmentation geared towards search: this includes a decompounding process for long nouns, also including the full compound token as a synonym.</simpara>
</listitem>
<listitem>
<simpara><literal>extended</literal> : Extended mode outputs unigrams for unknown words.</simpara>
</listitem>
</itemizedlist>
<simpara>The different tokenization modes produce the following outputs:</simpara>
<simpara>Input text is <literal>関西国際空港</literal> and <literal>アブラカダブラ</literal>.</simpara>
<table frame="all" rowsep="1" colsep="1">
<title>Tokenization mode examples</title>
<?dbhtml table-width="100%"?>
<?dbfo table-width="100%"?>
<?dblatex table-width="100%"?>
<tgroup cols="3">
<colspec colname="col_1" colwidth="85*"/>
<colspec colname="col_2" colwidth="170*"/>
<colspec colname="col_3" colwidth="170*"/>
<thead>
<row>
<entry align="left" valign="top">mode</entry>
<entry align="left" valign="top"><literal>関西国際空港</literal></entry>
<entry align="left" valign="top"><literal>アブラカダブラ</literal></entry>
</row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><simpara><literal>normal</literal></simpara></entry>
<entry align="left" valign="top"><simpara><literal>関西国際空港</literal></simpara></entry>
<entry align="left" valign="top"><simpara><literal>アブラカダブラ</literal></simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>search</literal></simpara></entry>
<entry align="left" valign="top"><simpara><literal>関西</literal> <literal>関西国際空港</literal> <literal>国際</literal> <literal>空港</literal></simpara></entry>
<entry align="left" valign="top"><simpara><literal>アブラカダブラ</literal></simpara></entry>
</row>
<row>
<entry align="left" valign="top"><simpara><literal>extended</literal></simpara></entry>
<entry align="left" valign="top"><simpara><literal>関西</literal> <literal>国際</literal> <literal>空港</literal></simpara></entry>
<entry align="left" valign="top"><simpara><literal>ア</literal> <literal>ブ</literal> <literal>ラ</literal> <literal>カ</literal> <literal>ダ</literal> <literal>ブ</literal> <literal>ラ</literal></simpara></entry>
</row>
</tbody>
</tgroup>
</table>
</section>
<section xml:id="_user_dictionary">
<title>User Dictionary</title>
<simpara>The Kuromoji tokenizer uses the MeCab-IPADIC dictionary by default.
Additional entries can be supplied by the user in a User Dictionary.
User Dictionary entries are defined using the following CSV format:</simpara>
<screen>&lt;text&gt;,&lt;token 1&gt; ... &lt;token n&gt;,&lt;reading 1&gt; ... &lt;reading n&gt;,&lt;part-of-speech tag&gt;</screen>
<simpara>Dictionary Example:</simpara>
<screen>東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞</screen>
<simpara>To use a User Dictionary, set its file path in the <literal>user_dictionary</literal> setting.
The User Dictionary file must be placed in the <literal>ES_HOME/config</literal> directory.</simpara>
</section>
<section xml:id="_example">
<title>Example</title>
<simpara><emphasis>Example Settings:</emphasis></simpara>
<programlisting language="shell" linenumbering="unnumbered">curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
    "settings": {
        "index":{
            "analysis":{
                "tokenizer" : {
                    "kuromoji_user_dict" : {
                       "type" : "kuromoji_tokenizer",
                       "mode" : "extended",
                       "discard_punctuation" : "false",
                       "user_dictionary" : "userdict_ja.txt"
                    }
                },
                "analyzer" : {
                    "my_analyzer" : {
                        "type" : "custom",
                        "tokenizer" : "kuromoji_user_dict"
                    }
                }

            }
        }
    }
}
'</programlisting>
<simpara><emphasis>Example Request using <literal>_analyze</literal> API :</emphasis></simpara>
<programlisting language="shell" linenumbering="unnumbered">curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&amp;pretty' -d '東京スカイツリー'</programlisting>
<simpara><emphasis>Response :</emphasis></simpara>
<programlisting language="js" linenumbering="unnumbered">{
  "tokens" : [ {
    "token" : "東京",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "スカイツリー",
    "start_offset" : 2,
    "end_offset" : 8,
    "type" : "word",
    "position" : 2
  } ]
}</programlisting>
</section>
</section>
<section xml:id="_tokenfilter_kuromoji_baseform">
<title>TokenFilter : kuromoji_baseform</title>
<simpara>A token filter of type <literal>kuromoji_baseform</literal> that replaces the term text with its base form (BaseFormAttribute).
This acts as a lemmatizer for verbs and adjectives.</simpara>
<simpara>Example:</simpara>
<simpara><emphasis>Example Settings:</emphasis></simpara>
<programlisting language="shell" linenumbering="unnumbered">curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
    "settings": {
        "index":{
            "analysis":{
                "analyzer" : {
                    "my_analyzer" : {
                        "tokenizer" : "kuromoji_tokenizer",
                        "filter" : ["kuromoji_baseform"]
                    }
                }
            }
        }
    }
}
'</programlisting>
<simpara><emphasis>Example Request using <literal>_analyze</literal> API :</emphasis></simpara>
<programlisting language="shell" linenumbering="unnumbered">curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&amp;pretty' -d '飲み'</programlisting>
<simpara><emphasis>Response :</emphasis></simpara>
<programlisting language="js" linenumbering="unnumbered">{
  "tokens" : [ {
    "token" : "飲む",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  } ]
}</programlisting>
</section>
<section xml:id="_tokenfilter_kuromoji_part_of_speech">
<title>TokenFilter : kuromoji_part_of_speech</title>
<simpara>A token filter of type <literal>kuromoji_part_of_speech</literal> that removes tokens that match a set of part-of-speech tags.</simpara>
<simpara>The following are the settings that can be set for a <literal>kuromoji_part_of_speech</literal> token filter type:</simpara>
<table frame="all" rowsep="1" colsep="1">
<title>Available settings</title>
<?dbhtml table-width="100%"?>
<?dbfo table-width="100%"?>
<?dblatex table-width="100%"?>
<tgroup cols="3">
<colspec colname="col_1" colwidth="72*"/>
<colspec colname="col_2" colwidth="246*"/>
<colspec colname="col_3" colwidth="98*"/>
<thead>
<row>
<entry align="left" valign="top">Setting</entry>
<entry align="left" valign="top">Description</entry>
<entry align="center" valign="top">Default value</entry>
</row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><simpara><literal>stoptags</literal></simpara></entry>
<entry align="left" valign="top"><simpara>A list of part-of-speech tags that should be removed</simpara></entry>
<entry align="center" valign="top"></entry>
</row>
</tbody>
</tgroup>
</table>
<simpara>Note that the default <literal>stoptags</literal> are read from <literal>stoptags.txt</literal>, which is included in lucene-analyzers-kuromoji.jar.</simpara>
<simpara>Example</simpara>
<simpara><emphasis>Example Settings:</emphasis></simpara>
<programlisting language="shell" linenumbering="unnumbered">curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
    "settings": {
        "index":{
            "analysis":{
                "analyzer" : {
                    "my_analyzer" : {
                        "tokenizer" : "kuromoji_tokenizer",
                        "filter" : ["my_posfilter"]
                    }
                },
                "filter" : {
                    "my_posfilter" : {
                        "type" : "kuromoji_part_of_speech",
                        "stoptags" : [
                            "助詞-格助詞-一般",
                            "助詞-終助詞"
                        ]
                    }
                }
            }
        }
    }
}
'</programlisting>
<simpara><emphasis>Example Request using <literal>_analyze</literal> API :</emphasis></simpara>
<programlisting language="shell" linenumbering="unnumbered">curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&amp;pretty' -d '寿司がおいしいね'</programlisting>
<simpara><emphasis>Response :</emphasis></simpara>
<programlisting language="js" linenumbering="unnumbered">{
  "tokens" : [ {
    "token" : "寿司",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "おいしい",
    "start_offset" : 3,
    "end_offset" : 7,
    "type" : "word",
    "position" : 3
  } ]
}</programlisting>
</section>
<section xml:id="_tokenfilter_kuromoji_readingform">
<title>TokenFilter : kuromoji_readingform</title>
<simpara>A token filter of type <literal>kuromoji_readingform</literal> that replaces the term attribute with the reading of a token in either katakana or romaji form.
The default reading form is katakana.</simpara>
<simpara>The following are settings that can be set for a <literal>kuromoji_readingform</literal> token filter type:</simpara>
<table frame="all" rowsep="1" colsep="1">
<title>Available settings</title>
<?dbhtml table-width="100%"?>
<?dbfo table-width="100%"?>
<?dblatex table-width="100%"?>
<tgroup cols="3">
<colspec colname="col_1" colwidth="72*"/>
<colspec colname="col_2" colwidth="246*"/>
<colspec colname="col_3" colwidth="98*"/>
<thead>
<row>
<entry align="left" valign="top">Setting</entry>
<entry align="left" valign="top">Description</entry>
<entry align="center" valign="top">Default value</entry>
</row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><simpara><literal>use_romaji</literal></simpara></entry>
<entry align="left" valign="top"><simpara><literal>true</literal> if the reading form should be output in romaji instead of katakana.</simpara></entry>
<entry align="center" valign="top"><simpara><literal>false</literal></simpara></entry>
</row>
</tbody>
</tgroup>
</table>
<simpara>Note that the <literal>kuromoji_readingform</literal> filter built into elasticsearch-analysis-kuromoji sets the <literal>use_romaji</literal> setting to <literal>true</literal> by default.</simpara>
<simpara>Example</simpara>
<simpara><emphasis>Example Settings:</emphasis></simpara>
<programlisting language="shell" linenumbering="unnumbered">curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
    "settings": {
        "index":{
            "analysis":{
                "analyzer" : {
                    "romaji_analyzer" : {
                        "tokenizer" : "kuromoji_tokenizer",
                        "filter" : ["romaji_readingform"]
                    },
                    "katakana_analyzer" : {
                        "tokenizer" : "kuromoji_tokenizer",
                        "filter" : ["katakana_readingform"]
                    }
                },
                "filter" : {
                    "romaji_readingform" : {
                        "type" : "kuromoji_readingform",
                        "use_romaji" : true
                    },
                    "katakana_readingform" : {
                        "type" : "kuromoji_readingform",
                        "use_romaji" : false
                    }
                }
            }
        }
    }
}
'</programlisting>
<simpara><emphasis>Example Request using <literal>_analyze</literal> API :</emphasis></simpara>
<programlisting language="shell" linenumbering="unnumbered">curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=katakana_analyzer&amp;pretty' -d '寿司'</programlisting>
<simpara><emphasis>Response :</emphasis></simpara>
<programlisting language="js" linenumbering="unnumbered">{
  "tokens" : [ {
    "token" : "スシ",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  } ]
}</programlisting>
<simpara><emphasis>Example Request using <literal>_analyze</literal> API :</emphasis></simpara>
<programlisting language="shell" linenumbering="unnumbered">curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=romaji_analyzer&amp;pretty' -d '寿司'</programlisting>
<simpara><emphasis>Response :</emphasis></simpara>
<programlisting language="js" linenumbering="unnumbered">{
  "tokens" : [ {
    "token" : "sushi",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  } ]
}</programlisting>
</section>
<section xml:id="_tokenfilter_kuromoji_stemmer">
<title>TokenFilter : kuromoji_stemmer</title>
<simpara>A token filter of type <literal>kuromoji_stemmer</literal> that normalizes common katakana spelling variations ending in a long sound character by removing this character (U+30FC).
Only katakana words longer than a minimum length are stemmed (default is four).</simpara>
<simpara>Note that only full-width katakana characters are supported.</simpara>
<simpara>The following are settings that can be set for a <literal>kuromoji_stemmer</literal> token filter type:</simpara>
<table frame="all" rowsep="1" colsep="1">
<title>Available settings</title>
<?dbhtml table-width="100%"?>
<?dbfo table-width="100%"?>
<?dblatex table-width="100%"?>
<tgroup cols="3">
<colspec colname="col_1" colwidth="72*"/>
<colspec colname="col_2" colwidth="246*"/>
<colspec colname="col_3" colwidth="98*"/>
<thead>
<row>
<entry align="left" valign="top">Setting</entry>
<entry align="left" valign="top">Description</entry>
<entry align="center" valign="top">Default value</entry>
</row>
</thead>
<tbody>
<row>
<entry align="left" valign="top"><simpara><literal>minimum_length</literal></simpara></entry>
<entry align="left" valign="top"><simpara>The minimum length to stem</simpara></entry>
<entry align="center" valign="top"><simpara><literal>4</literal></simpara></entry>
</row>
</tbody>
</tgroup>
</table>
<simpara>Example</simpara>
<simpara><emphasis>Example Settings:</emphasis></simpara>
<programlisting language="shell" linenumbering="unnumbered">curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
    "settings": {
        "index":{
            "analysis":{
                "analyzer" : {
                    "my_analyzer" : {
                        "tokenizer" : "kuromoji_tokenizer",
                        "filter" : ["my_katakana_stemmer"]
                    }
                },
                "filter" : {
                    "my_katakana_stemmer" : {
                        "type" : "kuromoji_stemmer",
                        "minimum_length" : 4
                    }
                }
            }
        }
    }
}
'</programlisting>
<simpara><emphasis>Example Request using <literal>_analyze</literal> API :</emphasis></simpara>
<programlisting language="shell" linenumbering="unnumbered">curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&amp;pretty' -d 'コピー'</programlisting>
<simpara><emphasis>Response :</emphasis></simpara>
<programlisting language="js" linenumbering="unnumbered">{
  "tokens" : [ {
    "token" : "コピー",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 1
  } ]
}</programlisting>
<simpara><emphasis>Example Request using <literal>_analyze</literal> API :</emphasis></simpara>
<programlisting language="shell" linenumbering="unnumbered">curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=my_analyzer&amp;pretty' -d 'サーバー'</programlisting>
<simpara><emphasis>Response :</emphasis></simpara>
<programlisting language="js" linenumbering="unnumbered">{
  "tokens" : [ {
    "token" : "サーバ",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 1
  } ]
}</programlisting>
</section>
<section xml:id="_tokenfilter_ja_stop">
<title>TokenFilter : ja_stop</title>
<simpara>A token filter of type <literal>ja_stop</literal> that provides a predefined "<emphasis>japanese</emphasis>" stop word list.
<emphasis role="strong">Note: only the "<emphasis>japanese</emphasis>" list is provided. To use other predefined stop word lists, use the <literal>stop</literal> token filter.</emphasis></simpara>
<simpara>Example:</simpara>
<simpara><emphasis>Example Settings:</emphasis></simpara>
<programlisting language="shell" linenumbering="unnumbered">curl -XPUT 'http://localhost:9200/kuromoji_sample/' -d'
{
    "settings": {
        "index":{
            "analysis":{
                "analyzer" : {
                    "analyzer_with_ja_stop" : {
                        "tokenizer" : "kuromoji_tokenizer",
                        "filter" : ["ja_stop"]
                    }
                },
                "filter" : {
                    "ja_stop" : {
                        "type" : "ja_stop",
                        "stopwords" : ["_japanese_", "ストップ"]
                    }
                }
            }
        }
    }
}'</programlisting>
<simpara><emphasis>Example Request using <literal>_analyze</literal> API :</emphasis></simpara>
<programlisting language="shell" linenumbering="unnumbered">curl -XPOST 'http://localhost:9200/kuromoji_sample/_analyze?analyzer=analyzer_with_ja_stop&amp;pretty' -d 'ストップは消える'</programlisting>
<simpara><emphasis>Response :</emphasis></simpara>
<programlisting language="js" linenumbering="unnumbered">{
  "tokens" : [ {
    "token" : "消える",
    "start_offset" : 5,
    "end_offset" : 8,
    "type" : "word",
    "position" : 3
  } ]
}</programlisting>
</section>
</analysis-kuromoji-usage>
</article>

Could this easily be added to our documentation build process, or is it useless?

clintongormley commented 9 years ago

Not useful. Without rewriting the docs build process to use asciidoctor instead, it wouldn't integrate. If we move the plugins into the core repo, then it becomes easy to include them in our build process.