Norconex / committer-sql

Implementation of Norconex Committer for SQL (JDBC) databases.
https://opensource.norconex.com/committers/sql/
Apache License 2.0

Table structure is not suitable for MySQL #1

Closed by wolverline 6 years ago

wolverline commented 6 years ago

In MySQL, the CLOB data type is not supported. Most fields are declared as VARCHAR(32672), but this by itself exceeds the limit when a field is encoded in UTF-8, etc. The combined limit is 65532 in text length, so one can't create more than 3 such fields with other encodings. To remove this limit, the fields should be defined as a TEXT/BLOB type.
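
As a rough illustration of the limits described above, here is a minimal Python sketch of the byte arithmetic. It assumes MySQL's documented 65,535-byte maximum row size and the worst-case bytes per character for three common character sets. The helper names `varchar_bytes` and `max_columns` are hypothetical, and the calculation ignores NULL flags and other per-row overhead.

```python
# Documented MySQL limit: maximum row size in bytes, shared by all columns.
# TEXT/BLOB contents are stored separately and add only a small pointer.
MYSQL_MAX_ROW_BYTES = 65535

# Worst-case bytes per character for a few common MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}


def varchar_bytes(length, charset):
    """Worst-case storage for one VARCHAR(length) column, in bytes."""
    data = length * MAX_BYTES_PER_CHAR[charset]
    prefix = 1 if data <= 255 else 2  # length prefix stored with the value
    return data + prefix


def max_columns(length, charset):
    """How many VARCHAR(length) columns fit in one row (simplified estimate)."""
    return MYSQL_MAX_ROW_BYTES // varchar_bytes(length, charset)


for charset in MAX_BYTES_PER_CHAR:
    print(charset, varchar_bytes(32672, charset), max_columns(32672, charset))
```

Under these simplifying assumptions, two VARCHAR(32672) columns fit in a row with latin1, and not even one fits with utf8mb3 or utf8mb4, which is why TEXT/BLOB types are the practical choice here.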

I hope there will be a DB/field type option or parameter, or more universal support.

essiembre commented 6 years ago

You can control how tables and fields are defined by creating the table yourself beforehand, or by providing the SQL to do so with the createTableSQL option. Something like this (not tested):

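<!-- MySQL cannot use a TEXT column as a key without a prefix length, so the key column is a VARCHAR here. -->
<createTableSQL>
    CREATE TABLE my_table (
        id VARCHAR(255) PRIMARY KEY,
        content LONGTEXT,
        numberExample INT(11)
    );
</createTableSQL>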

If you know some of the values you are sending will always be formatted properly to fit a specific type, you can define it like the numberExample above.

Let me know if that works for you. Else, we can make this a feature request to be able to control at a finer level the default field type(s).

wolverline commented 6 years ago

Thanks for the prompt response. Yes, createTableSQL may work, but I don't know how many fields have to be created; judging from the source code (I may be wrong), this seems to depend on the number of meta tags. I don't have a reference for the meta tags unless I set up a committer that preserves files in a local folder. Another kink is that MySQL doesn't allow dashes in a field name unless the name is enclosed in ` (grave accent marks), so the current ALTER TABLE statement doesn't work with MySQL.
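
To make the quoting issue concrete, here is a minimal Python sketch of how a dashed field name can be made acceptable to MySQL. It is not the committer's code: `quote_mysql_identifier` and `add_field_sql` are hypothetical helpers, and `dc-title` is just an example meta tag name. It relies only on MySQL's rule that identifiers may be wrapped in backticks, with any embedded backtick doubled.

```python
def quote_mysql_identifier(name):
    """Wrap a table or field name in backticks, doubling embedded backticks."""
    if not name:
        raise ValueError("identifier must not be empty")
    return "`" + name.replace("`", "``") + "`"


def add_field_sql(table, field, sql_type="TEXT"):
    """Build an ALTER TABLE statement that adds one quoted field (illustrative)."""
    return "ALTER TABLE {} ADD {} {}".format(
        quote_mysql_identifier(table),
        quote_mysql_identifier(field),
        sql_type,
    )


# A dashed meta tag name becomes a valid MySQL column name once quoted.
print(add_field_sql("my_table", "dc-title"))
```

Without the backticks, MySQL would parse `dc-title` as the expression `dc` minus `title` and reject the statement.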

essiembre commented 6 years ago

If you plan to keep all fields that are discovered, then you are right, my suggestion will not work. I will mark this ticket as a feature request.

One thing you can do in the meantime is to work out which fields you want to keep, then use a mix of RenameTagger, KeepOnlyTagger and createTableSQL to keep only those fields under names your database accepts.

essiembre commented 6 years ago

A snapshot release of the SQL Committer was just made that now allows you to define a default SQL for creating table fields for new document fields.

Have a look at the documentation and look for the createTableSQL and createFieldSQL configuration options for usage examples.

It would be nice if you could put that version to the test and confirm.

essiembre commented 6 years ago

SQL Committer 2.0.0 is now released and eliminates this issue.