# IBSurgeon Full Text Search UDR
IBSurgeon FTS UDR library implements full text search procedures and functions for Firebird SQL, to run them in SQL queries, using the powerful capabilities of Lucene search engine. In this UDR we use Lucene++, it is the C++ implementation of Lucene search engine, to achieve the fastest search with true full text search capabilities. This UDR is 100% free and open source, with LGPL license.
Windows and Linux versions are available: for Windows we have ready-to-use binaries and for Linux it is necessary to build UDR to work on the specific distribution (we have easy building instruction).
The library is developed by the grant from IBSurgeon www.ib-aid.com.
To install Lucene UDR, you need:
plugins/udr
directoryYou can download ready-made builds for Windows OS using the links:
Under Linux, you can compile the library yourself.
Download the demo database, for which the examples are prepared, using the following links:
Documentation in English and Russian is available at the links:
Lucene UDR is based on Lucene++.
In some Linux distributions, you can install lucene++
and lucene++-contrib
from their repositories.
If there are no libraries in the repositories, then you will need to download and build them from the source.
git clone https://github.com/luceneplusplus/LucenePlusPlus.git
cd LucenePlusPlus
mkdir build; cd build
cmake ..
make
sudo cmake --install . --prefix /usr
The building of the lucene++ library is described in more detail in BUILDING.md.
Now you can start building UDR Lucene.
git clone https://github.com/IBSurgeon/lucene_udr.git
cd lucene_udr
mkdir build
cmake -S . -B ./build
cmake --build ./build -- -j12
sudo cmake --install ./build
In the process of executing cmake ..
the following error may occur
CMake Error at /usr/lib64/cmake/liblucene++/liblucene++Config.cmake:41 (message):
File or directory /usr/lib64/usr/include/lucene++/ referenced by variable
liblucene++_INCLUDE_DIRS does not exist !
Call Stack (most recent call first):
/usr/lib64/cmake/liblucene++/liblucene++Config.cmake:47 (set_and_check)
CMakeLists.txt:78 (find_package)
To fix it, you need to fix the files liblucene++Config.cmake
and liblucene++-contrib Config.cmake
, where
to replace the line
get_filename_component(PACKAGE_PREFIX_DIR "${CMAKE_CURRENT_LIST_DIR}/../../usr" ABSOLUTE)
with
get_filename_component(PACKAGE_PREFIX_DIR "${CMAKE_CURRENT_LIST_DIR}/../../.." ABSOLUTE)
Before using full-text search in your database, you need to make a preliminary configuration.
The Lucene UDR settings are in the file $(root)\fts.ini
. If this file does not exist, then create it yourself.
Where $(root)
is the root directory of the Firebird installation.
This file specifies the path to the directory where full-text indexes for the specified database will be created.
The full path to the database or alias must be set as the section name of the ini file
(depending on the value of the DatabaseAccess
parameter in firebird.conf
).
The path to the full-text index directory is specified in the ftsDirectory
key.
[fts_demo]
ftsDirectory=f:\fbdata\3.0\fts\fts_demo
or
[f:\fbdata\3.0\fts_demo.fdb]
ftsDirectory=f:\fbdata\3.0\fts\fts_demo
On Linux, the section name is case-sensitive. It must exactly match the value of the query result:
select mon$attachment_name
from mon$attachments
where mon$attachment_id = current_connection;
If your connection can occur both through an alias and with the path to the database, you can write both sections to the ini file at once.
[f:\fbdata\3.0\fts_demo.fdb]
ftsDirectory=f:\fbdata\3.0\fts\fts_demo
[fts_demo]
ftsDirectory=f:\fbdata\3.0\fts\fts_demo
Important: The user or group under which the Firebird service is running must have read and write permissions for the directory with full-text indexes.
You can get the directory location for full-text indexes using a query:
SELECT FTS$MANAGEMENT.FTS$GET_DIRECTORY() AS DIR_NAME
FROM RDB$DATABASE
Analysis is the transformation of known text into smaller, more precise units for easier retrieval.
The text goes through various operations to extract keywords, remove common words and punctuation marks, convert words to lower case, and so on.
The list of available analyzers can be obtained using the FTS$MANAGEMENT.FTS$ALL_ANALYZERS
procedure.
The following analyzers are available by default:
Let's consider the most commonly used analyzers.
The standard analyzer splits text into words, numbers, URLs and emails. It converts the text to lowercase, after which the stop-word filter for English is applied to the resulting terms.
StopAnalyzer splits text into non-letter characters. It converts the text to lowercase, after which the stop-word filter for English is applied to the resulting terms.
Unlike StandardAnalyzer, StopAnalyzer is not capable of recognizing URLs and e-mails.
SimpleAnalyzer separates text into non-letter characters. It converts text to lowercase. SimpleAnalyzer does not apply a stop-word filter, and it is not capable of recognizing URLs and e-mails.
WhitespaceAnalyzer - splits text by whitespace characters.
KeywordAnalyzer - Represents text as one single term. KeywordAnalyzer is useful for fields like id and zip codes.
There are also special analyzers for different languages, such as EnglishAnalyzer, FrenchAnalyzer, RussianAnalyzer and others.
Such analyzers split text into words, numbers, URLs and EMAILs. It converts the text to lower case, after which the stop word filter for the specified language is applied to the resulting terms.
After filtering, the stemming algorithm is applied. Stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root. Stemming is necessary so that the search can be carried out not only by the word itself, but also by its forms.
Analyzers that use stemming algorithms from the "Snowball" project.
The IBSurgeon FTS UDR library allows you to create custom analyzers. Through the SQL language it is not possible to change the algorithms for splitting text into terms and stemming algorithms, however, you can specify a list of your own stop words.
You can create custom analyzer based on one of the built-in analyzers. In order to be able to create custom analyzer based on the base analyzer, the base analyzer must support the stop-word filter. A new analyzer is created with an empty list of stop words.
To create custom analyzer, call the FTS$MANAGEMENT.FTS$CREATE_ANALYZER
procedure. The first parameter specifies the name of the new analyzer,
the second - the name of the base analyzer, the third, optional parameter, you can specify the description of the analyzer.
After creating the analyzer, you can add the necessary stop words using the FTS$MANAGEMENT.FTS$ADD_STOP_WORD
procedure.
An example of creating your own analyzer:
execute procedure FTS$MANAGEMENT.FTS$CREATE_ANALYZER('FARMNAME_EN', 'ENGLISH');
commit;
execute procedure FTS$MANAGEMENT.FTS$ADD_STOP_WORD('FARMNAME_EN', 'farm');
execute procedure FTS$MANAGEMENT.FTS$ADD_STOP_WORD('FARMNAME_EN', 'owner');
commit;
Note:
If you add or remove stop words from the analyzer, on the basis of which there are already built indexes, then these indexes change their status to 'U' - Updated metadata, that is, they require rebuilding.
To create a full-text index, you need to perform three steps sequentially:
FTS$MANAGEMENT.FTS$CREATE_INDEX
;FTS$MANAGEMENT.FTS$ADD_INDEX_FIELD
;FTS$MANAGEMENT.FTS$REBUILD_INDEX
.To create a full-text index for a table, call the procedure FTS$MANAGEMENT.FTS$CREATE_INDEX
.
The first parameter specifies the name of the full-text index, the second - the name of the indexed table. The remaining parameters are optional.
The third parameter specifies the name of the analyzer. The analyzer specifies for which language the indexed fields will be analyzed. If the parameter is omitted, the STANDARD analyzer (for English) will be used.
The fourth parameter specifies the name of the table field that will be returned as a search result.
This is usually a primary or unique key field. Setting a special pseudo field RDB$DB_KEY
is also supported.
The value of only one field of one of the types can be returned:
SMALLINT
, INTEGER
, BIGINT
- fields of these types are often used as artificial primary key based on generators (sequences);
CHAR(16) CHARACTER SET OCTETS
or BINARY(16)
- fields of these types are used as an artificial primary key based on GUID, that is, generated using GEN_UUID()
;
the RDB$DB_KEY
field of type CHAR(8) CHARACTER SET OCTETS
.
If this parameter is not set (NULL value), then an attempt will be made to find the field in the primary key for permanent tables and GTT.
This attempt will be successful if the key is not composite and the field on which it is built has one of the data types described above.
If the primary key does not exist, the pseudo field RDB$DB_KEY
will be used.
The fifth parameter can be set to describe the field.
For examples, a table with the following structure is used:
CREATE TABLE PRODUCTS (
PRODUCT_ID BIGINT GENERATED BY DEFAULT AS IDENTITY,
PRODUCT_UUID CHAR(16) CHARACTER SET OCTETS NOT NULL,
PRODUCT_NAME VARCHAR(200) NOT NULL,
UPC_EAN_CODE VARCHAR(150),
SELLING_PRICE VARCHAR(400),
MODEL_NUMBER VARCHAR(45),
ABOUT_PRODUCT BLOB SUB_TYPE TEXT,
PRODUCT_SPECIFICATION BLOB SUB_TYPE TEXT,
TECHNICAL_DETAILS BLOB SUB_TYPE TEXT,
SHIPPING_WEIGHT VARCHAR(15),
PRODUCT_DIMENSIONS VARCHAR(50),
VARIANTS BLOB SUB_TYPE TEXT,
PRODUCT_URL VARCHAR(255) NOT NULL,
IS_AMAZON_SELLER BOOLEAN,
CONSTRAINT PK_PRODUCT PRIMARY KEY (PRODUCT_ID),
CONSTRAINT UNQ_PRODUCT_UUID UNIQUE (PRODUCT_UUID)
);
The example below creates an index IDX_PRODUCT_NAME
for the PRODUCTS
table using the STANDARD
analyzer.
The PRODUCT_ID
field is returned. Its name was automatically extracted from the primary key of the PRODUCTS
table.
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$CREATE_INDEX('IDX_PRODUCT_NAME', 'PRODUCTS');
COMMIT;
The following example will create an index IDX_PRODUCT_NAME_EN
using the analyzer ENGLISH
.
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$CREATE_INDEX('IDX_PRODUCT_NAME_EN', 'PRODUCTS', 'ENGLISH');
COMMIT;
You can specify the name of the field that will be returned as a search result.
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$CREATE_INDEX('IDX_PRODUCT_ID_2_EN', 'PRODUCTS', 'ENGLISH', 'PRODUCT_ID');
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$CREATE_INDEX('IDX_PRODUCT_UUID_EN', 'PRODUCTS', 'ENGLISH', 'PRODUCT_UUID');
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$CREATE_INDEX('IDX_PRODUCT_DBKEY_EN', 'PRODUCTS', 'ENGLISH', 'RDB$DB_KEY');
COMMIT;
After creating the index, you need to add fields that will be searched using the procedure FTS$MANAGEMENT.FTS$ADD_INDEX_FIELD
.
The first parameter specifies the index name, the second the name of the field to be added.
The third optional parameter can specify the significance multiplier for the field.
By default, the significance of all index fields is the same and equal to 1.
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$ADD_INDEX_FIELD('IDX_PRODUCT_NAME_EN', 'PRODUCT_NAME');
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$ADD_INDEX_FIELD('IDX_PRODUCT_DBKEY_EN', 'PRODUCT_NAME');
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$ADD_INDEX_FIELD('IDX_PRODUCT_UUID_EN', 'PRODUCT_NAME');
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$ADD_INDEX_FIELD('IDX_PRODUCT_ID_2_EN', 'PRODUCT_NAME');
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$ADD_INDEX_FIELD('IDX_PRODUCT_ID_2_EN', 'ABOUT_PRODUCT');
COMMIT;
In the indexes IDX_PRODUCT_NAME_EN
, IDX_PRODUCT_DBKEY_EN
and IDX_PRODUCT_UUID_EN
one field PRODUCT_NAME
is processed,
and in the index IDX_PRODUCT_ID_2_EN
two fields PRODUCT_NAME
and ABOUT_PRODUCT
are processed.
The following example shows the creation of an index with two fields PRODUCT_NAME
and ABOUT_PRODUCT
.
The significance of the PRODUCT_NAME
field is 4 times higher than the significance of the ABOUT_PRODUCT
field.
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$CREATE_INDEX('IDX_PRODUCT_ID_2X_EN', 'PRODUCTS', 'ENGLISH', 'PRODUCT_ID');
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$ADD_INDEX_FIELD('IDX_PRODUCT_ID_2X_EN', 'PRODUCT_NAME', 4);
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$ADD_INDEX_FIELD('IDX_PRODUCT_ID_2X_EN', 'ABOUT_PRODUCT');
COMMIT;
To build the index, the procedure FTS$MANAGEMENT.FTS$REBUILD_INDEX
is used. The name of the full-text index must be specified as an input parameter.
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$REBUILD_INDEX('IDX_PRODUCT_NAME_EN');
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$REBUILD_INDEX('IDX_PRODUCT_DBKEY_EN');
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$REBUILD_INDEX('IDX_PRODUCT_UUID_EN');
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$REBUILD_INDEX('IDX_PRODUCT_ID_2_EN');
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$REBUILD_INDEX('IDX_PRODUCT_ID_2X_EN');
COMMIT;
At the stage of building for the index, a corresponding folder of the same name is created in the directory for full-text indexes. These folders contain the Lucene index files. This part of the process happens outside of transaction control, so ROLLBACK will not remove the index files.
In addition, in case of a successful build, the status of the index changes to 'C' (Complete). Status changes occur in the current transaction.
The description of the indexes is stored in the service table FTS$INDEXES
.
The FTS$INDEX_STATUS
field stores the index status. The index can have 4 statuses:
FTS$UPDATE_INDEXES
procedure.FTS$UPDATE_INDEXES
.
The index enters this state only after a complete build or rebuild.The FTS$SEARCH
procedure is used to search the full-text index.
The first parameter specifies the name of the index with which the search will be performed, and the second parameter specifies the search phrase. The third optional parameter sets a limit on the number of records returned, by default 1000. The fourth parameter allows you to enable the search results explanation mode, FALSE by default.
Search example:
SELECT
FTS$RELATION_NAME
, FTS$KEY_FIELD_NAME
, FTS$DB_KEY
, FTS$ID
, FTS$UUID
, FTS$SCORE
, FTS$EXPLANATION
FROM
FTS$SEARCH('IDX_PRODUCT_NAME_EN', 'Transformers Bumblebee')
Output parameters:
RDB$DB_KEY
;BIGINT
or INTEGER
;BINARY(16)
. This type is used to store the GUID;The query result will be available in one of the fields FTS$DB_KEY
, FTS$ID
, FTS$UUID
, depending on which resulting field was specified when creating the index.
To extract data from the target table, it is enough to simply make a join with it, the condition of which depends on how the index was created.
Here are examples of different join options:
SELECT
FTS.FTS$SCORE,
P.PRODUCT_ID,
P.PRODUCT_NAME
FROM
FTS$SEARCH('IDX_PRODUCT_NAME_EN', 'Transformers Bumblebee') FTS
JOIN PRODUCTS P ON P.PRODUCT_ID = FTS.FTS$ID;
SELECT
FTS.FTS$SCORE,
P.PRODUCT_UUID,
P.PRODUCT_NAME
FROM
FTS$SEARCH('IDX_PRODUCT_UUID_EN', 'Transformers Bumblebee') FTS
JOIN PRODUCTS P ON P.PRODUCT_UUID = FTS.FTS$UUID;
SELECT
FTS.FTS$SCORE,
P.RDB$DB_KEY,
P.PRODUCT_ID,
P.PRODUCT_NAME
FROM
FTS$SEARCH('IDX_PRODUCT_DBKEY_EN', 'Transformers Bumblebee') FTS
JOIN PRODUCTS P ON P.RDB$DB_KEY = FTS.FTS$DB_KEY;
To search for two fields at once, we use the index IDX_PRODUCT_ID_2_EN
, in which the fields PRODUCT_NAME
and ABOUT_PRODUCT
were specified during creation.
SELECT
FTS.FTS$SCORE,
P.PRODUCT_ID,
P.PRODUCT_NAME,
P.ABOUT_PRODUCT
FROM
FTS$SEARCH('IDX_PRODUCT_ID_2_EN', 'Transformers Bumblebee') FTS
JOIN PRODUCTS P ON P.PRODUCT_ID = FTS.FTS$ID;
To explain the search results, set the last parameter to TRUE.
SELECT
FTS.FTS$SCORE,
P.PRODUCT_ID,
P.PRODUCT_NAME,
P.ABOUT_PRODUCT,
FTS.FTS$EXPLANATION
FROM
FTS$SEARCH('IDX_PRODUCT_ID_2_EN', 'Transformers Bumblebee', 5, TRUE) FTS
JOIN PRODUCTS P ON P.PRODUCT_ID = FTS.FTS$ID;
The FTS$EXPLANATION
field will contain an explanation of the result.
4.12074 = (MATCH) sum of:
1.7817 = (MATCH) sum of:
1.16911 = (MATCH) weight(PRODUCT_NAME:transformers in 3329), product of:
0.455576 = queryWeight(PRODUCT_NAME:transformers), product of:
6.84324 = idf(docFreq=28, maxDocs=10002)
0.0665732 = queryNorm
2.56622 = (MATCH) fieldWeight(PRODUCT_NAME:transformers in 3329), product of:
1 = tf(termFreq(PRODUCT_NAME:transformers)=1)
6.84324 = idf(docFreq=28, maxDocs=10002)
0.375 = fieldNorm(field=PRODUCT_NAME, doc=3329)
0.612596 = (MATCH) weight(ABOUT_PRODUCT:transformers in 3329), product of:
0.480313 = queryWeight(ABOUT_PRODUCT:transformers), product of:
7.21481 = idf(docFreq=19, maxDocs=10002)
0.0665732 = queryNorm
1.27541 = (MATCH) fieldWeight(ABOUT_PRODUCT:transformers in 3329), product of:
1.41421 = tf(termFreq(ABOUT_PRODUCT:transformers)=2)
7.21481 = idf(docFreq=19, maxDocs=10002)
0.125 = fieldNorm(field=ABOUT_PRODUCT, doc=3329)
2.33904 = (MATCH) sum of:
1.60308 = (MATCH) weight(PRODUCT_NAME:bumblebee in 3329), product of:
0.533472 = queryWeight(PRODUCT_NAME:bumblebee), product of:
8.01332 = idf(docFreq=8, maxDocs=10002)
0.0665732 = queryNorm
3.00499 = (MATCH) fieldWeight(PRODUCT_NAME:bumblebee in 3329), product of:
1 = tf(termFreq(PRODUCT_NAME:bumblebee)=1)
8.01332 = idf(docFreq=8, maxDocs=10002)
0.375 = fieldNorm(field=PRODUCT_NAME, doc=3329)
0.735957 = (MATCH) weight(ABOUT_PRODUCT:bumblebee in 3329), product of:
0.526458 = queryWeight(ABOUT_PRODUCT:bumblebee), product of:
7.90796 = idf(docFreq=9, maxDocs=10002)
0.0665732 = queryNorm
1.39794 = (MATCH) fieldWeight(ABOUT_PRODUCT:bumblebee in 3329), product of:
1.41421 = tf(termFreq(ABOUT_PRODUCT:bumblebee)=2)
7.90796 = idf(docFreq=9, maxDocs=10002)
0.125 = fieldNorm(field=ABOUT_PRODUCT, doc=3329)
For comparison, an explanation of the index search results with fields that have a different significance coefficient is shown.
SELECT
FTS.FTS$SCORE,
P.PRODUCT_ID,
P.PRODUCT_NAME,
P.ABOUT_PRODUCT,
FTS.FTS$EXPLANATION
FROM
FTS$SEARCH('IDX_PRODUCT_ID_2X_EN', 'Transformers Bumblebee', 5, TRUE) FTS
JOIN PRODUCTS P ON P.PRODUCT_ID = FTS.FTS$ID;
13.7448 = (MATCH) sum of:
4.67643 = (MATCH) sum of:
4.67643 = (MATCH) weight(PRODUCT_NAME:transformers in 166), product of:
0.455576 = queryWeight(PRODUCT_NAME:transformers), product of:
6.84324 = idf(docFreq=28, maxDocs=10002)
0.0665732 = queryNorm
10.2649 = (MATCH) fieldWeight(PRODUCT_NAME:transformers in 166), product of:
1 = tf(termFreq(PRODUCT_NAME:transformers)=1)
6.84324 = idf(docFreq=28, maxDocs=10002)
1.5 = fieldNorm(field=PRODUCT_NAME, doc=166)
9.06839 = (MATCH) sum of:
9.06839 = (MATCH) weight(PRODUCT_NAME:bumblebee in 166), product of:
0.533472 = queryWeight(PRODUCT_NAME:bumblebee), product of:
8.01332 = idf(docFreq=8, maxDocs=10002)
0.0665732 = queryNorm
16.9988 = (MATCH) fieldWeight(PRODUCT_NAME:bumblebee in 166), product of:
1.41421 = tf(termFreq(PRODUCT_NAME:bumblebee)=2)
8.01332 = idf(docFreq=8, maxDocs=10002)
1.5 = fieldNorm(field=PRODUCT_NAME, doc=166)
Search queries (search phrases) consist of terms and operators. Lucene supports simple and complex terms. Simple terms consist of one word, complex terms consist of several. The first of them are ordinary words, for example, "Hello", "world". The second type of terms is a group of words, for example, "Hello world". Several terms can be linked together using logical operators.
Lucene supports multi-field search. By default, the search is performed in all fields of the full-text index,
the expression for each field is repeated and connected by the OR
operator. For example, if you have an index containing
the fields PRODUCT_NAME
and ABOUT_PRODUCT
, then the query
Transformers Bumblebee
will be equivalent to the query
(PRODUCT_NAME: Transformers Bumblebee) OR (ABOUT_PRODUCT: Transformers Bumblebee)
You can specify which field you want to search by, to do this, specify the field name, the colon symbol ":" in the request, and then the search phrase for this field.
Example of searching for the word "Polyester" in the ABOUT_PRODUCT
field and the words "Transformers Bumblebee" in the PRODUCT_NAME
field:
SELECT
FTS.FTS$SCORE,
P.PRODUCT_ID,
P.PRODUCT_NAME,
P.ABOUT_PRODUCT,
FTS.FTS$EXPLANATION
FROM
FTS$SEARCH('IDX_PRODUCT_ID_2_EN', '(PRODUCT_NAME: Transformers Bumblebee) AND (ABOUT_PRODUCT: Polyester)', 5, TRUE) FTS
JOIN PRODUCTS P ON P.PRODUCT_ID = FTS.FTS$ID;
Note: Lucene, like Firebird, supports delimited fields. It is strongly discouraged to use spaces and other special characters in field names, as this will make it much more difficult to write search queries. If your field contains a space or other special character, it must be escaped using the "\" character.
For example, if you have an index for two fields "Product Name" and "Product Specification" and you want to find the word "Weight" in the specification, then the query should look like this:
Product\ Specification: Weight
Lucene supports single and multiple character wildcard searches within single terms (not within phrase queries).
To perform a single character wildcard search use the "?" symbol.
To perform a multiple character wildcard search use the "*" symbol.
The single character wildcard search looks for terms that match that with the single character replaced. For example, to search for "text" or "test" you can use the search:
te?t
Multiple character wildcard searches looks for 0 or more characters. For example, to search for "test", "tests" or "tester", you can use the search:
test*
You can also use the wildcard searches in the middle of a term.
te*t
Note: You cannot start a search query with the characters "?" or "*".
Lucene supports fuzzy searches based on the Levenshtein Distance, or Edit Distance algorithm. To do a fuzzy search use the tilde, "~", symbol at the end of a Single word Term. For example to search for a term similar in spelling to "roam" use the fuzzy search:
roam~
This search will find terms like "foam" and "roams".
An additional (optional) parameter can specify the required similarity. The value is between 0 and 1, with a value closer to 1 only terms with a higher similarity will be matched. For example:
roam~0.8
The default that is used if the parameter is not given is 0.5.
Lucene supports finding words are a within a specific distance away. To do a proximity search use the tilde, "~", symbol at the end of a Phrase. For example to search for "apache" and "jakarta" within 10 words of each other in a document use the search:
"jakarta apache"~10
Range Queries allow one to match documents whose field(s) values are between the lower and upper bound specified by the Range Query. Range Queries can be inclusive or exclusive of the upper and lower bounds. Sorting is done lexicographically.
BYDATE:[20020101 TO 20030101]
This will find documents whose BYDATE fields have values between 20020101 and 20030101, inclusive. Note that Range Queries are not reserved for date fields.
You could also use range queries with non-date fields:
TITLE:{Aida TO Carmen}
This will find all documents whose titles are between "Aida" and "Carmen", but not including "Aida" and "Carmen".
Inclusive range queries are denoted by square brackets. Exclusive range queries are denoted by curly brackets.
Lucene provides the relevance level of matching documents based on the terms found. To boost a term use the caret, "^", symbol with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be.
Boosting allows you to control the relevance of a document by boosting its term. For example, if you are searching for
jakarta apache
and you want the term "jakarta" to be more relevant boost it using the ^ symbol along with the boost factor next to the term. You would type:
jakarta^4 apache
This will make documents with the term jakarta appear more relevant. You can also boost Phrase Terms as in the example:
"jakarta apache"^4 "Apache Lucene"
By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2)
Boolean operators allow you to use logical constructions when setting
search conditions, and allow you to combine several terms.
Lucene supports the following logical operators: AND
, +
, OR
, NOT
, -
.
Boolean operators must be specified in capital letters.
OR
is the default logical operator, which means that if
no other logical operator is specified between the two terms of the search phrase, then the OR
operator is substituted. In this case, the search system finds
the document if one of the terms specified in the search phrase is present in it.
An alternative notation for the OR
operator is ||
.
"Hello world" "world"
Equivalent to:
"Hello world" OR "world"
The AND
operator indicates that all search terms combined by the operator must be present in the text.
An alternative notation of the operator is &&
.
"Hello" AND "world"
The +
operator indicates that the word following it must necessarily be present in the text.
For example, to search for records that must contain the word "hello" and may
contain the word "world", the search phrase may look like:
+Hello world
The NOT
operator allows you to exclude from the search results those in which the term following the operator occurs. Instead of the word NOT
,
the symbol "!" can be used. For example, to search for records that should contain the word "hello" and should not contain the word "world",
the search phrase may look like:
"Hello" NOT "world"
Note: The NOT
operator cannot be used with only one term. For example, a search with this condition will not return results:
NOT "world"
This operator is analogous to the NOT
operator. Usage example:
"Hello" -"world"
Lucene supports using parentheses to group clauses to form sub queries. This can be very useful if you want to control the boolean logic for a query.
To search for either "jakarta" or "apache" and "website" use the query:
(jakarta OR apache) AND website
This eliminates any confusion and makes sure you that "website" must exist and either term "jakarta" or "apache" may exist.
Lucene supports using parentheses to group multiple clauses to a single field.
To search for a title that contains both the word "return" and the phrase "pink panther" use the query:
TITLE:(+return +"pink panther")
Lucene supports escaping special characters that are part of the query syntax. The current list special characters are
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
To escape these character use the "\" before the character. For example to search for "(1 + 1) : 2" use the query:
\( 1 \+ 1 \) \: 2
To escape special characters, you can use the FTS$ESCAPE_QUERY
function.
FTS$ESCAPE_QUERY('(1 + 1) : 2')
A more detailed English-language description of the syntax is available on the official website Lucene: https://lucene.apache.org/core/3_0_3/queryparsersyntax.html.
You can index not only permanent tables, but also complex views.
In order to index a view, one requirement must be met: there must be a field in the view by which you can uniquely identify the record.
Let's say you have a view V_PRODUCT_CATEGORIES
, where PRODUCT_UUID
is the unique identifier of the PRODUCTS
table:
CREATE TABLE CATEGORIES (
ID BIGINT GENERATED BY DEFAULT AS IDENTITY,
CATEGORY_NAME VARCHAR(80) NOT NULL,
CONSTRAINT PK_CATEGORY PRIMARY KEY (ID),
CONSTRAINT UNQ_CATEGORY_NAME UNIQUE (CATEGORY_NAME)
);
CREATE TABLE PRODUCT_CATEGORIES (
ID BIGINT GENERATED BY DEFAULT AS IDENTITY,
PRODUCT_UUID CHAR(16) CHARACTER SET OCTETS NOT NULL,
CATEGORY_ID BIGINT NOT NULL,
CONSTRAINT PK_PRODUCT_CATEGORIES PRIMARY KEY (ID),
CONSTRAINT UNQ_PRODUCT_CATEGORIES UNIQUE (PRODUCT_UUID, CATEGORY_ID),
CONSTRAINT FK_PRODUCT_CAT_REF_CATEGORY FOREIGN KEY (CATEGORY_ID) REFERENCES CATEGORIES (ID),
CONSTRAINT FK_PRODUCT_CAT_REF_PRODUCT FOREIGN KEY (PRODUCT_UUID) REFERENCES PRODUCTS (PRODUCT_UUID)
);
CREATE OR ALTER VIEW V_PRODUCT_CATEGORIES(
PRODUCT_UUID,
CATEGORIES)
AS
SELECT
PC.PRODUCT_UUID
, LIST(C.CATEGORY_NAME, ' | ') AS CATEGORIES
FROM
PRODUCT_CATEGORIES PC
JOIN CATEGORIES C ON C.ID = PC.CATEGORY_ID
GROUP BY 1
;
You want to search for products of a category, but the name of the category is in the reference table and one product can have several categories. In this case, you can create the following full-text index:
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$CREATE_INDEX('IDX_PRODUCT_CATEGORIES', 'V_PRODUCT_CATEGORIES', 'ENGLISH', 'PRODUCT_UUID');
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$ADD_INDEX_FIELD('IDX_PRODUCT_CATEGORIES', 'CATEGORIES');
COMMIT;
EXECUTE PROCEDURE FTS$MANAGEMENT.FTS$REBUILD_INDEX('IDX_PRODUCT_CATEGORIES');
COMMIT;
The search for a product by its category looks like this:
SELECT
FTS.FTS$SCORE,
P.PRODUCT_UUID,
P.PRODUCT_NAME,
PC.CATEGORIES,
FTS.FTS$EXPLANATION
FROM
FTS$SEARCH('IDX_PRODUCT_CATEGORIES', '"Toys & Games"') FTS
JOIN V_PRODUCT_CATEGORIES PC ON PC.PRODUCT_UUID = FTS.FTS$UUID
JOIN PRODUCTS P ON P.PRODUCT_UUID = PC.PRODUCT_UUID;
The result of the search depends on which analyzer was used to create the index. The analyzer performs the following functions: splits the text into separate words, converts words to lowercase, removal of stop words, other filters, stemming. As a result of text analysis for indexing, terms will be selected from the text, which fall into the index.
In order to find out which terms are included in the index, you can use the stored procedure FTS$ANALYZE
.
PROCEDURE FTS$ANALYZE (
FTS$TEXT BLOB SUB_TYPE TEXT CHARACTER SET UTF8,
FTS$ANALYZER VARCHAR(63) CHARACTER SET UTF8 NOT NULL DEFAULT 'STANDARD'
)
RETURNS (
FTS$TERM VARCHAR(8191) CHARACTER SET UTF8
)
The first parameter specifies the text to be analyzed, and the second parameter is the name of the analyzer.
Usage example:
SELECT FTS$TERM
FROM FTS$ANALYZE('IBSurgeon FTS UDR library implements full text search procedures and functions for Firebird SQL', 'STANDARD')
It is often necessary not only to find documents on request, but also to highlight what was found.
To highlight the found terms in a text fragment, the package FTS$HIGHLIGHTER
is used. The package contains:
FTS$HIGHLIGHTER.FTS$BEST_FRAGMENT
to highlight the found terms in a text fragment;FTS$HIGHLIGHTER.FTS$BEST_FRAGMENTS
returns several fragments of text with the highlight of terms in the fragment.The function FTS$HIGHLIGHTER.FTS$BEST_FRAGMENT
returns the best text fragment in which the found terms are highlighted with tags.
The function is described as
FUNCTION FTS$BEST_FRAGMENT (
FTS$TEXT BLOB SUB_TYPE TEXT CHARACTER SET UTF8,
FTS$QUERY VARCHAR(8191) CHARACTER SET UTF8,
FTS$ANALYZER VARCHAR(63) CHARACTER SET UTF8 NOT NULL DEFAULT 'STANDARD',
FTS$FIELD_NAME VARCHAR(63) CHARACTER SET UTF8 DEFAULT NULL,
FTS$FRAGMENT_SIZE SMALLINT NOT NULL DEFAULT 512,
FTS$LEFT_TAG VARCHAR(50) CHARACTER SET UTF8 NOT NULL DEFAULT '<b>',
FTS$RIGHT_TAG VARCHAR(50) CHARACTER SET UTF8 NOT NULL DEFAULT '</b>')
RETURNS VARCHAR(8191) CHARACTER SET UTF8;
The FTS$TEXT
parameter specifies the text in which fragments are searched and selected.
The FTS$QUERY
parameter specifies the search phrase.
The third optional parameter FTS$ANALYZER
specifies the name of the analyzer with which the terms are allocated.
The FTS$FIELD_NAME
parameter specifies the name of the field being searched for. It must be specified if the search query explicitly contains several fields,
otherwise the parameter can be omitted or set as NULL.
The FTS$FRAGMENT_SIZE
parameter specifies a limit on the length of the returned fragment.
Please note that the actual length of the returned text may be longer. The returned fragment usually does not break the words,
in addition, it does not take into account the length of the tags themselves for selection.
The FTS$LEFT_TAG
parameter specifies the tag that is added to the found term on the left.
The FTS$RIGHT_TAG
parameter specifies the tag that is added to the found fragment on the right.
The simplest example of use:
SELECT
FTS$HIGHLIGHTER.FTS$BEST_FRAGMENT(
q'!!Go to your orders and start the return Select the ship method Ship it! |
Go to your orders and start the return Select the ship method Ship it! |
show up to 2 reviews by default A shiny Pewter key ring with a 3D element
of a rotating golf ball made of a PVC material. This makes a great accessory
for your sports bag. | 1.12 ounces (View shipping rates and policies)!!',
'A shiny Pewter',
'English',
NULL
) AS TEXT_FRAGMENT
FROM RDB$DATABASE
Now let's combine the search itself and the selection of the found terms:
EXECUTE BLOCK (
FTS$QUERY VARCHAR(8191) CHARACTER SET UTF8 = :FTS_QUERY
)
RETURNS (
FTS$SCORE DOUBLE PRECISION,
PRODUCT_ID TYPE OF COLUMN PRODUCTS.PRODUCT_ID,
PRODUCT_NAME TYPE OF COLUMN PRODUCTS.PRODUCT_NAME,
ABOUT_PRODUCT TYPE OF COLUMN PRODUCTS.ABOUT_PRODUCT,
HIGHTLIGHT_PRODUCT_NAME VARCHAR(8191) CHARACTER SET UTF8,
HIGHTLIGHT_ABOUT_PRODUCT VARCHAR(8191) CHARACTER SET UTF8
)
AS
BEGIN
FOR
SELECT
FTS.FTS$SCORE,
PRODUCTS.PRODUCT_ID,
PRODUCTS.PRODUCT_NAME,
PRODUCTS.ABOUT_PRODUCT,
FTS$HIGHLIGHTER.FTS$BEST_FRAGMENT(PRODUCTS.PRODUCT_NAME, :FTS$QUERY, 'ENGLISH', 'PRODUCT_NAME') AS HIGHTLIGHT_PRODUCT_NAME,
FTS$HIGHLIGHTER.FTS$BEST_FRAGMENT(PRODUCTS.ABOUT_PRODUCT, :FTS$QUERY, 'ENGLISH', 'ABOUT_PRODUCT') AS HIGHTLIGHT_ABOUT_PRODUCT
FROM FTS$SEARCH('IDX_PRODUCT_ID_2_EN', :FTS$QUERY, 25) FTS
JOIN PRODUCTS ON PRODUCTS.PRODUCT_ID = FTS.FTS$ID
INTO
FTS$SCORE,
PRODUCT_ID,
PRODUCT_NAME,
ABOUT_PRODUCT,
HIGHTLIGHT_PRODUCT_NAME,
HIGHTLIGHT_ABOUT_PRODUCT
DO
SUSPEND;
END
The procedure FTS$HIGHLIGHTER.FTS$BEST_FRAGMENTS
returns several fragments of text in which the found terms are marked with tags.
The procedure is described as
PROCEDURE FTS$BEST_FRAGMENTS (
FTS$TEXT BLOB SUB_TYPE TEXT CHARACTER SET UTF8,
FTS$QUERY VARCHAR(8191) CHARACTER SET UTF8,
FTS$ANALYZER VARCHAR(63) CHARACTER SET UTF8 NOT NULL DEFAULT 'STANDARD',
FTS$FIELD_NAME VARCHAR(63) CHARACTER SET UTF8 DEFAULT NULL,
FTS$FRAGMENT_SIZE SMALLINT NOT NULL DEFAULT 512,
FTS$MAX_NUM_FRAGMENTS INTEGER NOT NULL DEFAULT 10,
FTS$LEFT_TAG VARCHAR(50) CHARACTER SET UTF8 NOT NULL DEFAULT '<b>',
FTS$RIGHT_TAG VARCHAR(50) CHARACTER SET UTF8 NOT NULL DEFAULT '</b>'
)
RETURNS (
FTS$FRAGMENT VARCHAR(8191) CHARACTER SET UTF8
);
The input parameters of the procedure FTS$HIGHLIGHTER.FTS$BEST_FRAGMENTS
are identical to the parameters of
the function FTS$HIGHLIGHTER.FTS$BEST_FRAGMENT
, but there is one additional parameter FTS$MAX_NUM_FRAGMENTS
,
which limits the number of fragments returned.
The text of the found fragments with selected occurrences of terms is returned to the output parameter FTS$FRAGMENT
.
This procedure should be applied in one document already found.
Usage example:
SELECT
BOOKS.TITLE
, BOOKS.CONTENT
, F.FTS$FRAGMENT
FROM
BOOKS
LEFT JOIN FTS$HIGHLIGHTER.FTS$BEST_FRAGMENTS(
BOOKS.CONTENT,
'friendly',
'ENGLISH'
) F ON TRUE
WHERE BOOKS.ID = 8
There are several ways to keep full-text indexes up-to- date:
Periodically call the procedure FTS$MANAGEMENT.FTS$REBUILD_INDEX
for the specified index.
This method completely rebuilds the full-text index. In this case, all records of the table or view are read
for which the index was created.
You can maintain full-text indexes using triggers and calling one of the FTS$LOG_BY_ID
procedures inside them,
FTS$LOG_BY_UUID
or FTS$LOG_BY_DBKEY
. Which of the procedures to call
depends on which type of field is selected as the key (integer, UUID (GIUD) or RDB$DB_KEY
).
When calling these procedures, the change record is added to a special table FTS$LOG
(change log).
Changes from the log are transferred to full-text indexes by calling the procedure FTS$UPDATE_INDEXES
.
The call to this procedure must be done in a separate script, which can be placed in the task scheduler (Windows)
or cron (Linux) with some frequency, for example 5 minutes.
Delayed updating of full-text indexes, using FirebirdStreaming technology. In this case, a special service reads the replication logs and extracts from them the information necessary to update the full-text indexes. (under development).
To maintain the relevance of full-text indexes, it is necessary to create triggers that, when changing
any of the fields included in the full-text index, writes information about the record change to a special table
FTS$LOG
(log).
Rules for writing triggers to support full-text indexes:
In the trigger, it is necessary to check all fields that participate in the full-text index.
The field validation conditions must be combined via OR
.
For the INSERT
operation, it is necessary to check all fields included in full-text indexes whose value is different
from NULL
. If this condition is met, then one of the procedures must be performed
FTS$LOG_BY_DBKEY('<table name>', NEW.RDB$DB_KEY, 'I');
or FTS$LOG_BY_ID('<table name>', NEW.<key field>, 'I')
or FTS$LOG_BY_UUID('<table name>', NEW.<key field>, 'I')
.
For the UPDATE
operation, it is necessary to check all fields included in full-text indexes whose value has changed.
If this condition is met, then the procedure FTS$LOG_BY_DBKEY('<table name>', OLD.RDB$DB_KEY, 'U');
or FTS$LOG_BY_ID('<table name>', OLD.<key field>, 'U')
or FTS$LOG_BY_UUID('<table name>', OLD.<key field>, 'U')
.
For the DELETE
operation, it is necessary to check all fields included in full-text indexes whose value is different
from NULL
. If this condition is met, then it is necessary to perform the procedure
FTS$LOG_CHANGE('<table name>', OLD.RDB$DB_KEY, 'D');
.
To facilitate the task of writing such triggers, there is a special package FTS$TRIGGER_HELPER
, which
contains procedures for generating trigger source texts. So for example, in order to generate triggers
to support full-text indexes created for the PRODUCTS
table, you need to run the following query:
SELECT
FTS$TRIGGER_SCRIPT
FROM
FTS$TRIGGER_HELPER.FTS$MAKE_TRIGGERS('PRODUCTS', TRUE)
This query will return the following trigger text for all created FTS indexes on the PRODUCTS
table:
CREATE OR ALTER TRIGGER "FTS$PRODUCTS_AIUD" FOR "PRODUCTS"
ACTIVE AFTER INSERT OR UPDATE OR DELETE
POSITION 100
AS
BEGIN
/* Block for key PRODUCT_ID */
IF (INSERTING AND (NEW."ABOUT_PRODUCT" IS NOT NULL
OR NEW."PRODUCT_NAME" IS NOT NULL)) THEN
EXECUTE PROCEDURE FTS$LOG_BY_ID('PRODUCTS', NEW."PRODUCT_ID", 'I');
IF (UPDATING AND (NEW."ABOUT_PRODUCT" IS DISTINCT FROM OLD."ABOUT_PRODUCT"
OR NEW."PRODUCT_NAME" IS DISTINCT FROM OLD."PRODUCT_NAME")) THEN
EXECUTE PROCEDURE FTS$LOG_BY_ID('PRODUCTS', OLD."PRODUCT_ID", 'U');
IF (DELETING AND (OLD."ABOUT_PRODUCT" IS NOT NULL
OR OLD."PRODUCT_NAME" IS NOT NULL)) THEN
EXECUTE PROCEDURE FTS$LOG_BY_ID('PRODUCTS', OLD."PRODUCT_ID", 'D');
/* Block for key PRODUCT_UUID */
IF (INSERTING AND (NEW."PRODUCT_NAME" IS NOT NULL)) THEN
EXECUTE PROCEDURE FTS$LOG_BY_UUID('PRODUCTS', NEW."PRODUCT_UUID", 'I');
IF (UPDATING AND (NEW."PRODUCT_NAME" IS DISTINCT FROM OLD."PRODUCT_NAME")) THEN
EXECUTE PROCEDURE FTS$LOG_BY_UUID('PRODUCTS', OLD."PRODUCT_UUID", 'U');
IF (DELETING AND (OLD."PRODUCT_NAME" IS NOT NULL)) THEN
EXECUTE PROCEDURE FTS$LOG_BY_UUID('PRODUCTS', OLD."PRODUCT_UUID", 'D');
/* Block for key RDB$DB_KEY */
IF (INSERTING AND (NEW."PRODUCT_NAME" IS NOT NULL)) THEN
EXECUTE PROCEDURE FTS$LOG_BY_DBKEY('PRODUCTS', NEW.RDB$DB_KEY, 'I');
IF (UPDATING AND (NEW."PRODUCT_NAME" IS DISTINCT FROM OLD."PRODUCT_NAME")) THEN
EXECUTE PROCEDURE FTS$LOG_BY_DBKEY('PRODUCTS', OLD.RDB$DB_KEY, 'U');
IF (DELETING AND (OLD."PRODUCT_NAME" IS NOT NULL)) THEN
EXECUTE PROCEDURE FTS$LOG_BY_DBKEY('PRODUCTS', OLD.RDB$DB_KEY, 'D');
END
Updating all full-text indexes, you need to create an SQL script fts$update.sql
EXECUTE PROCEDURE FTS$UPDATE_INDEXES;
Then a script to call the SQL script via ISQL, something like the following
isql -user SYSDBA -pas masterkey -i fts$update.sql inet://localhost/mydatabase
Pay attention! The package FTS$TRIGGER_HELPER
helps to generate triggers to support full-text indexes
only for regular tables. If you want to maintain a full-text index on the view, then you need
to develop such triggers for the base tables of the view yourself.
Below is an example that supports a full-text index of triggers for a view V_PRODUCT_CATEGORIES
.
SET TERM ^;
-- Field PRODUCT_UUID and CATEGORY_ID from table PRODUCT_CATEGORIES
CREATE OR ALTER TRIGGER FTS$PRODUCT_CATEGORIES_AIUD FOR PRODUCT_CATEGORIES
ACTIVE AFTER INSERT OR UPDATE OR DELETE
POSITION 100
AS
BEGIN
IF (INSERTING) THEN
EXECUTE PROCEDURE FTS$LOG_BY_UUID('V_PRODUCT_CATEGORIES', NEW.PRODUCT_UUID, 'I');
IF (UPDATING AND (NEW.PRODUCT_UUID <> OLD.PRODUCT_UUID
OR NEW.CATEGORY_ID <> OLD.CATEGORY_ID)) THEN
BEGIN
EXECUTE PROCEDURE FTS$LOG_BY_UUID('V_PRODUCT_CATEGORIES', OLD.PRODUCT_UUID, 'D');
EXECUTE PROCEDURE FTS$LOG_BY_UUID('V_PRODUCT_CATEGORIES', NEW.PRODUCT_UUID, 'I');
END
IF (DELETING) THEN
EXECUTE PROCEDURE FTS$LOG_BY_UUID('V_PRODUCT_CATEGORIES', OLD.PRODUCT_UUID, 'D');
END
^
-- Change from table CATEGORIES
CREATE OR ALTER TRIGGER FTS$CATEGORIES_AU FOR CATEGORIES
ACTIVE AFTER UPDATE
POSITION 100
AS
DECLARE PRODUCT_UUID TYPE OF COLUMN PRODUCT_CATEGORIES.PRODUCT_UUID;
BEGIN
IF (NEW.CATEGORY_NAME <> OLD.CATEGORY_NAME) THEN
BEGIN
SELECT MAX(PRODUCT_CATEGORIES.PRODUCT_UUID)
FROM PRODUCT_CATEGORIES
JOIN CATEGORIES ON CATEGORIES.ID = PRODUCT_CATEGORIES.CATEGORY_ID
WHERE CATEGORIES.CATEGORY_NAME = OLD.CATEGORY_NAME
INTO PRODUCT_UUID;
EXECUTE PROCEDURE FTS$LOG_BY_UUID('V_PRODUCT_CATEGORIES', :PRODUCT_UUID, 'U');
END
END
END
^
SET TERM ;^
The FTS$MANAGEMENT
package contains procedures and functions for managing full-text indexes. This package is intended
for database administrators.
The function FTS$MANAGEMENT.FTS$GET_DIRECTORY
returns the directory where the files and folders of full-text indexes for
the current database are located.
FUNCTION FTS$GET_DIRECTORY ()
RETURNS VARCHAR(255) CHARACTER SET UTF8
DETERMINISTIC;
The procedure FTS$MANAGEMENT.FTS$ALL_ANALYZERS
returns a list of available analyzers.
PROCEDURE FTS$ALL_ANALYZERS
RETURNS (
FTS$ANALYZER VARCHAR(63) CHARACTER SET UTF8,
FTS$BASE_ANALYZER VARCHAR(63) CHARACTER SET UTF8,
FTS$STOP_WORDS_SUPPORTED BOOLEAN,
FTS$SYSTEM_FLAG BOOLEAN
);
Output parameters:
The FTS$MANAGEMENT.FTS$CREATE_ANALYZER
procedure creates a new analyzer based on the base one. One of the built-in analyzers can be used as a base one.
PROCEDURE FTS$CREATE_ANALYZER (
FTS$ANALYZER VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$BASE_ANALYZER VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$DESCRIPTION BLOB SUB_TYPE TEXT CHARACTER SET UTF8 DEFAULT NULL
);
Input parameters:
The procedure FTS$MANAGEMENT.FTS$DROP_ANALYZER
drops a custom analyzer.
PROCEDURE FTS$DROP_ANALYZER (
FTS$ANALYZER VARCHAR(63) CHARACTER SET UTF8 NOT NULL
);
Input parameters:
The procedure FTS$MANAGEMENT.FTS$ANALYZER_STOP_WORDS
returns a list of stop words for the given analyzer.
PROCEDURE FTS$ANALYZER_STOP_WORDS (
FTS$ANALYZER VARCHAR(63) CHARACTER SET UTF8 NOT NULL)
RETURNS (
FTS$WORD VARCHAR(63) CHARACTER SET UTF8
);
Input parameters:
Output parameters:
The procedure FTS$MANAGEMENT.FTS$ADD_STOP_WORD
adds a stop word to the custom analyzer.
PROCEDURE FTS$ADD_STOP_WORD (
FTS$ANALYZER_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$WORD VARCHAR(63) CHARACTER SET UTF8 NOT NULL
);
Input parameters:
The procedure FTS$MANAGEMENT.FTS$DROP_STOP_WORD
removes a stop word from the custom analyzer.
PROCEDURE FTS$DROP_STOP_WORD (
FTS$ANALYZER_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$WORD VARCHAR(63) CHARACTER SET UTF8 NOT NULL
);
Входные параметры:
The procedure FTS$MANAGEMENT.FTS$CREATE_INDEX
creates a new full-text index.
PROCEDURE FTS$CREATE_INDEX (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$RELATION_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$ANALYZER VARCHAR(63) CHARACTER SET UTF8 DEFAULT 'STANDARD',
FTS$KEY_FIELD_NAME VARCHAR(63) CHARACTER SET UTF8 DEFAULT NULL,
FTS$DESCRIPTION BLOB SUB_TYPE TEXT CHARACTER SET UTF8 DEFAULT NULL
);
Input parameters:
FTS$SEARCH
, usually this is the key field of the table;The procedure FTS$MANAGEMENT.FTS$DROP_INDEX
deletes the full-text index.
PROCEDURE FTS$DROP_INDEX (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL
);
Input parameters:
The procedure FTS$MANAGEMENT.SET_INDEX_ACTIVE
allows you to make the index active or inactive.
PROCEDURE FTS$SET_INDEX_ACTIVE (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$INDEX_ACTIVE BOOLEAN NOT NULL
);
Input parameters:
The procedure FTS$MANAGEMENT.FTS$COMMENT_ON_INDEX
adds or deletes a user comment to the index.
PROCEDURE FTS$COMMENT_ON_INDEX (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$DESCRIPTION BLOB SUB_TYPE TEXT CHARACTER SET UTF8
);
Input parameters:
The procedure FTS$MANAGEMENT.FTS$ADD_INDEX_FIELD
adds a new field to the full-text index.
PROCEDURE FTS$ADD_INDEX_FIELD (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$FIELD_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$BOOST DOUBLE PRECISION DEFAULT NULL
);
Input parameters:
The procedure FTS$MANAGEMENT.FTS$DROP_INDEX_FIELD
removes the field from the full-text index.
PROCEDURE FTS$DROP_INDEX_FIELD (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$FIELD_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL
);
Input parameters:
The procedure FTS$MANAGEMENT.FTS$SET_INDEX_FIELD_BOOST
sets the significance coefficient for the index field.
PROCEDURE FTS$SET_INDEX_FIELD_BOOST (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$FIELD_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$BOOST DOUBLE PRECISION
);
Input parameters:
If you do not specify a significance factor when adding a field to the index, then by default it is 1.0.
Using the procedure FTS$MANAGEMENT.FTS$SET_INDEX_FIELD_BOOST
it can be changed.
Note that after running this procedure, the index needs to be rebuilt.
The procedure FTS$MANAGEMENT.FTS$REBUILD_INDEX
rebuilds the full-text index.
PROCEDURE FTS$REBUILD_INDEX (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL
);
Input parameters:
The procedure FTS$MANAGEMENT.FTS$REINDEX_TABLE
rebuilds all full-text indexes for the specified table.
PROCEDURE FTS$REINDEX_TABLE (
FTS$RELATION_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL
);
Input parameters:
The procedure FTS$MANAGEMENT.FTS$FULL_REINDEX
rebuilds all full-text indexes in the database.
The procedure FTS$MANAGEMENT.FTS$OPTIMIZE_INDEX
optimizes the specified index.
PROCEDURE FTS$OPTIMIZE_INDEX (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL
);
Input parameters:
The procedure FTS$MANAGEMENT.FTS$OPTIMIZE_INDEXES
optimizes all full-text indexes in the database.
The FTS$SEARCH
procedure performs a full-text search by the specified index.
PROCEDURE FTS$SEARCH (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$QUERY VARCHAR(8191) CHARACTER SET UTF8,
FTS$LIMIT INT NOT NULL DEFAULT 1000,
FTS$EXPLAIN BOOLEAN DEFAULT FALSE
)
RETURNS (
FTS$RELATION_NAME VARCHAR(63) CHARACTER SET UTF8,
FTS$KEY_FIELD_NAME VARCHAR(63) CHARACTER SET UTF8,
FTS$DB_KEY CHAR(8) CHARACTER SET OCTETS,
FTS$ID BIGINT,
FTS$UUID CHAR(16) CHARACTER SET OCTETS,
FTS$SCORE DOUBLE PRECISION,
FTS$EXPLANATION BLOB SUB_TYPE TEXT CHARACTER SET UTF8
)
Input parameters:
Output parameters:
RDB$DB_KEY
;BIGINT
or INTEGER
;BINARY(16)
. This type is used to store the GUID;The 'FTS$ESCAPE_QUERY` function escapes special characters in the search query.
FUNCTION FTS$ESCAPE_QUERY (
FTS$QUERY VARCHAR(8191) CHARACTER SET UTF8
)
RETURNS VARCHAR(8191) CHARACTER SET UTF8;
Input parameters:
The FTS$ANALYZE
procedure analyzes the text according to the given analyzer and returns a list of terms.
PROCEDURE FTS$ANALYZE (
FTS$TEXT BLOB SUB_TYPE TEXT CHARACTER SET UTF8,
FTS$ANALYZER VARCHAR(63) CHARACTER SET UTF8 NOT NULL DEFAULT 'STANDARD'
)
RETURNS (
FTS$TERM VARCHAR(8191) CHARACTER SET UTF8
)
Input parameters:
Output parameters:
The procedure FTS$LOG_BY_ID
adds a record of a change in one of the fields included in the full-text indexes
built on the table to the change log FTS$LOG
, on the basis of which the full-text indexes will be updated.
This procedure should be used if an integer field is used as the primary key. Such keys
are often generated using generators/sequences.
PROCEDURE FTS$LOG_BY_ID (
FTS$RELATION_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$ID BIGINT NOT NULL,
FTS$CHANGE_TYPE FTS$D_CHANGE_TYPE NOT NULL
)
Input parameters:
The procedure FTS$LOG_BY_UUID
adds a record of a change in one of the fields included in the full-text indexes
built on the table to the change log FTS$LOG
, on the basis of which the full-text indexes will be updated.
This procedure should be used if a UUID (GUID) is used as the primary key. Such keys
are often generated using the GEN_UUID
function.
PROCEDURE FTS$LOG_BY_UUID (
FTS$RELATION_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$UUID CHAR(16) CHARACTER SET OCTETS NOT NULL,
FTS$CHANGE_TYPE FTS$D_CHANGE_TYPE NOT NULL
)
Input parameters:
The procedure FTS$LOG_BY_DBKEY
adds a record of a change in one of the fields included in the full-text indexes
built on the table to the change log FTS$LOG
, on the basis of which the full-text indexes will be updated.
This procedure should be used if the pseudo field RDB$DB_KEY
is used as the primary key.
PROCEDURE FTS$LOG_BY_DBKEY (
FTS$RELATION_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$DBKEY CHAR(8) CHARACTER SET OCTETS NOT NULL,
FTS$CHANGE_TYPE FTS$D_CHANGE_TYPE NOT NULL
)
Input parameters:
RDB$DB_KEY
;The procedure FTS$CLEAR_LOG
clears the change log FTS$LOG
, based on which the full-text indexes are updated.
The procedure FTS$UPDATE_INDEXES
updates full-text indexes on entries in the change log FTS$LOG
.
This procedure is usually run on a schedule (cron) in a separate session with some interval, for example 5 seconds.
The FTS$HIGHLIGHTER
package contains procedures and functions that return fragments of the text in which the original phrase was found,
and highlights the terms found.
The FTS$HIGHLIGHTER.FTS$BEST_FRAGMENT
function returns the best text fragment that matches the full-text search expression
and highlights the terms found in it.
FUNCTION FTS$BEST_FRAGMENT (
FTS$TEXT BLOB SUB_TYPE TEXT CHARACTER SET UTF8,
FTS$QUERY VARCHAR(8191) CHARACTER SET UTF8,
FTS$ANALYZER VARCHAR(63) CHARACTER SET UTF8 NOT NULL DEFAULT 'STANDARD',
FTS$FIELD_NAME VARCHAR(63) CHARACTER SET UTF8 DEFAULT NULL,
FTS$FRAGMENT_SIZE SMALLINT NOT NULL DEFAULT 512,
FTS$LEFT_TAG VARCHAR(50) CHARACTER SET UTF8 NOT NULL DEFAULT '<b>',
FTS$RIGHT_TAG VARCHAR(50) CHARACTER SET UTF8 NOT NULL DEFAULT '</b>'
)
RETURNS VARCHAR(8191) CHARACTER SET UTF8;
Input parameters:
The procedure FTS$HIGHLIGHTER.FTS$BEST_FRAGMENTS
returns the best text fragments that match the full-text search expression
and highlights the terms found in them.
PROCEDURE FTS$BEST_FRAGMENTS (
FTS$TEXT BLOB SUB_TYPE TEXT CHARACTER SET UTF8,
FTS$QUERY VARCHAR(8191) CHARACTER SET UTF8,
FTS$ANALYZER VARCHAR(63) CHARACTER SET UTF8 NOT NULL DEFAULT 'STANDARD',
FTS$FIELD_NAME VARCHAR(63) CHARACTER SET UTF8 DEFAULT NULL,
FTS$FRAGMENT_SIZE SMALLINT NOT NULL DEFAULT 512,
FTS$MAX_NUM_FRAGMENTS INTEGER NOT NULL DEFAULT 10,
FTS$LEFT_TAG VARCHAR(50) CHARACTER SET UTF8 NOT NULL DEFAULT '<b>',
FTS$RIGHT_TAG VARCHAR(50) CHARACTER SET UTF8 NOT NULL DEFAULT '</b>'
)
RETURNS (
FTS$FRAGMENT VARCHAR(8191) CHARACTER SET UTF8
);
Input parameters:
Output parameters:
The package FTS$TRIGGER_HELPER
contains procedures and functions that help to create triggers to maintain the relevance
of full-text indexes.
The procedure FTS$TRIGGER_HELPER.FTS$MAKE_TRIGGERS
generates trigger source codes for a given table
to keep full-text indexes up to date.
PROCEDURE FTS$MAKE_TRIGGERS (
FTS$RELATION_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$MULTI_ACTION BOOLEAN NOT NULL DEFAULT TRUE,
FTS$POSITION SMALLINT NOT NULL DEFAULT 100
)
RETURNS (
FTS$TRIGGER_NAME VARCHAR(63) CHARACTER SET UTF8,
FTS$TRIGGER_RELATION VARCHAR(63) CHARACTER SET UTF8,
FTS$TRIGGER_EVENTS VARCHAR(26) CHARACTER SET UTF8,
FTS$TRIGGER_POSITION SMALLINT,
FTS$TRIGGER_SOURCE BLOB SUB_TYPE TEXT CHARACTER SET UTF8,
FTS$TRIGGER_SCRIPT BLOB SUB_TYPE TEXT CHARACTER SET UTF8
);
Input parameters:
Output parameters:
The procedure FTS$TRIGGER_HELPER.FTS$MAKE_TRIGGERS_BY_INDEX
generates trigger source codes for a given index
to keep the full-text index up to date.
PROCEDURE FTS$MAKE_TRIGGERS_BY_INDEX (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$MULTI_ACTION BOOLEAN NOT NULL DEFAULT TRUE,
FTS$POSITION SMALLINT NOT NULL DEFAULT 100
)
RETURNS (
FTS$TRIGGER_NAME VARCHAR(63) CHARACTER SET UTF8,
FTS$TRIGGER_RELATION VARCHAR(63) CHARACTER SET UTF8,
FTS$TRIGGER_EVENTS VARCHAR(26) CHARACTER SET UTF8,
FTS$TRIGGER_POSITION SMALLINT,
FTS$TRIGGER_SOURCE BLOB SUB_TYPE TEXT CHARACTER SET UTF8,
FTS$TRIGGER_SCRIPT BLOB SUB_TYPE TEXT CHARACTER SET UTF8
);
Input parameters:
Output parameters:
The procedure FTS$TRIGGER_HELPER.FTS$MAKE_ALL_TRIGGERS
generates trigger source codes to keep all full-text indexes up to date.
PROCEDURE FTS$MAKE_ALL_TRIGGERS (
FTS$MULTI_ACTION BOOLEAN NOT NULL DEFAULT TRUE,
FTS$POSITION SMALLINT NOT NULL DEFAULT 100
)
RETURNS (
FTS$TRIGGER_NAME VARCHAR(63) CHARACTER SET UTF8,
FTS$TRIGGER_RELATION VARCHAR(63) CHARACTER SET UTF8,
FTS$TRIGGER_EVENTS VARCHAR(26) CHARACTER SET UTF8,
FTS$TRIGGER_POSITION SMALLINT,
FTS$TRIGGER_SOURCE BLOB SUB_TYPE TEXT CHARACTER SET UTF8,
FTS$TRIGGER_SCRIPT BLOB SUB_TYPE TEXT CHARACTER SET UTF8
);
Input parameters:
Output parameters:
The FTS$STATISTICS
package contains procedures and functions for obtaining information about full-text indexes and their statistics.
This package is intended primarily for database administrators.
The function FTS$STATISTICS.FTS$LUCENE_VERSION
returns the version of the lucene++ library based on which the full-text search is built.
FUNCTION FTS$LUCENE_VERSION ()
RETURNS VARCHAR(20) CHARACTER SET UTF8
DETERMINISTIC;
The function FTS$STATISTICS.FTS$GET_DIRECTORY
returns the directory where the files and folders of full-text indexes for
the current database are located.
FUNCTION FTS$GET_DIRECTORY ()
RETURNS VARCHAR(255) CHARACTER SET UTF8
DETERMINISTIC;
The procedure FTS$STATISTICS.FTS$INDEX_STATISTICS
returns low-level information and statistics for the specified index.
PROCEDURE FTS$INDEX_STATISTICS (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL
)
RETURNS (
FTS$ANALYZER VARCHAR(63) CHARACTER SET UTF8,
FTS$INDEX_STATUS TYPE OF FTS$D_INDEX_STATUS,
FTS$INDEX_DIRECTORY VARCHAR(255) CHARACTER SET UTF8,
FTS$INDEX_EXISTS BOOLEAN,
FTS$INDEX_OPTIMIZED BOOLEAN,
FTS$HAS_DELETIONS BOOLEAN,
FTS$NUM_DOCS INTEGER,
FTS$NUM_DELETED_DOCS INTEGER,
FTS$NUM_FIELDS SMALLINT,
FTS$INDEX_SIZE INTEGER
);
Input parameters:
Output parameters:
The procedure FTS$STATISTICS.FTS$INDICES_STATISTICS
returns low-level information and statistics for all full-text indexes.
PROCEDURE FTS$INDICES_STATISTICS
RETURNS (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8,
FTS$ANALYZER VARCHAR(63) CHARACTER SET UTF8,
FTS$INDEX_STATUS TYPE OF FTS$D_INDEX_STATUS,
FTS$INDEX_DIRECTORY VARCHAR(255) CHARACTER SET UTF8,
FTS$INDEX_EXISTS BOOLEAN,
FTS$INDEX_OPTIMIZED BOOLEAN,
FTS$HAS_DELETIONS BOOLEAN,
FTS$NUM_DOCS INTEGER,
FTS$NUM_DELETED_DOCS INTEGER,
FTS$NUM_FIELDS SMALLINT,
FTS$INDEX_SIZE INTEGER
);
Output parameters:
The procedure FTS$STATISTICS.FTS$INDEX_SEGMENT_INFOS
returns information about index segments.
Here the segment is defined from the Lucene perspective.
PROCEDURE FTS$INDEX_SEGMENT_INFOS (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL
)
RETURNS (
FTS$SEGMENT_NAME VARCHAR(63) CHARACTER SET UTF8,
FTS$DOC_COUNT INTEGER,
FTS$SEGMENT_SIZE INTEGER,
FTS$USE_COMPOUND_FILE BOOLEAN,
FTS$HAS_DELETIONS BOOLEAN,
FTS$DEL_COUNT INTEGER,
FTS$DEL_FILENAME VARCHAR(255) CHARACTER SET UTF8
);
Input parameters:
Output parameters:
The procedure FTS$STATISTICS.FTS$INDEX_FIELDS
returns the names of the internal fields of the index.
PROCEDURE FTS$INDEX_FIELDS (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL
)
RETURNS (
FTS$FIELD_NAME VARCHAR(127) CHARACTER SET UTF8
);
Input parameters:
Output parameters:
The procedure FTS$STATISTICS.FTS$INDEX_FILES
returns information about index files.
PROCEDURE FTS$INDEX_FILES (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL
)
RETURNS (
FTS$FILE_NAME VARCHAR(127) CHARACTER SET UTF8,
FTS$FILE_TYPE VARCHAR(63) CHARACTER SET UTF8,
FTS$FILE_SIZE INTEGER
);
Input parameters:
Output parameters:
The procedure FTS$STATISTICS.FTS$INDEX_FIELD_INFOS
returns information about the index fields.
PROCEDURE FTS$INDEX_FIELD_INFOS (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL,
FTS$SEGMENT_NAME VARCHAR(63) CHARACTER SET UTF8 DEFAULT NULL
)
RETURNS (
FTS$FIELD_NAME VARCHAR(127) CHARACTER SET UTF8,
FTS$FIELD_NUMBER SMALLINT,
FTS$IS_INDEXED BOOLEAN,
FTS$STORE_TERM_VECTOR BOOLEAN,
FTS$STORE_OFFSET_TERM_VECTOR BOOLEAN,
FTS$STORE_POSITION_TERM_VECTOR BOOLEAN,
FTS$OMIT_NORMS BOOLEAN,
FTS$OMIT_TERM_FREQ_AND_POS BOOLEAN,
FTS$STORE_PAYLOADS BOOLEAN
);
Input parameters:
Output parameters:
The FTS$STATISTICS.FTS$INDEX_TERMS
procedure returns the terms (words) contained in the index.
This procedure is convenient for debugging a full-text index.
PROCEDURE FTS$INDEX_TERMS (
FTS$INDEX_NAME VARCHAR(63) CHARACTER SET UTF8 NOT NULL
)
RETURNS (
FTS$FIELD_NAME VARCHAR(63) CHARACTER SET UTF8,
FTS$TERM VARCHAR(8191) CHARACTER SET UTF8,
FTS$DOC_FREQ INTEGER
);
Input parameters:
Output parameters: