google-code-export / h2database

Automatically exported from code.google.com/p/h2database
0 stars 1 forks source link

MySQL style escaping support for newline, double quote, etc #245

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
While trying to transition from MySQL to H2 I'm running into escaping problems.

My initial report (issue 242 - single quote escaping \' ) has a documented 
solution: '' so its unnecessary to implement \' support.

However, now I have the problem of handling \n, \", \t for which there do not 
appear to be alternatives.

Using a PreparedStatement, as suggested back in the archives of the Google 
Group, is a route I'd rather not take.  I have a process worked out for 
dumping->filtering->and RunScript-ing which works nicely for my data.

I see that the H2 backup tool inserts DECODESTRING() around exported strings 
that need to be parsed this way - unfortunately that would be too complicated 
to inject around the mysqldumped fields.

Would you consider implementing alternatives to those escape sequences?

If not, can you point me to the class/method I need to modify so I can hack 
that for myself?

Original issue reported on code.google.com by James.Mo...@gmail.com on 27 Oct 2010 at 6:59

GoogleCodeExporter commented 9 years ago
The following patch results in an automatic DECODESTRING() which satisfies my 
needs.
I am unsure about your design and security considerations which is why I added 
the Admin Role and MySQL mode checks.  This code works for me, though it may 
not be the most efficient.

Parser.java

private void read() {
2813: case CHAR_STRING: {
...
2831:   currentToken = "'";
2832:   checkLiterals(true);
2833:   boolean isMySQLMode = database.getMode().getName().equals("MySQL");
2834:   boolean isAdmin = session.getUser().isAdmin();
2833:   if (isMySQLMode && isAdmin)
2833:       result = result
2833:               .replace("\\r\\n", "\n")
2833:               .replace("\\n", "\n")
2833:               .replace("\\\"", "\"")
2833:               .replace("\\\t", "\t");
2833:   currentValue = ValueString.get(StringUtils.fromCacheOrNew(result));
2833:   parseIndex = i;
2833:   currentTokenType = VALUE;
2833:       return;
2833: }
...
}

Original comment by James.Mo...@gmail.com on 27 Oct 2010 at 8:25

GoogleCodeExporter commented 9 years ago
Instead of H2 supporting the non-standard MySQL syntax, what about configuring
MySQL to use the ANSI SQL syntax? This would work for all databases:

http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html#sqlmode_no_backslash
_escapes

Original comment by thomas.t...@gmail.com on 28 Oct 2010 at 6:42

GoogleCodeExporter commented 9 years ago
My mysqldump'd data looks the same regardless of this setting.
Before & After:
('2000', 'this is \n a test of \n multiple lines')

Maybe I'm not setting it properly.

What is the ANSI SQL standard for expressing newline within a string?

Original comment by James.Mo...@gmail.com on 28 Oct 2010 at 7:27

GoogleCodeExporter commented 9 years ago
I see... Using --sql-mode="NO_BACKSLASH_ESCAPES" only affects inserting data,
it doesn't affect mysqldump (mysqldump actually disables sql-mode temporarily).

What about using another tool to extract the data from MySQL? 
I know H2 doesn't currently support a data transfer tool, but maybe I should 
write one.
There are other tools, for example the "SQuirreL DB Copy Plugin"
or DbVisualizer: http://h2database.com/html/links.html

Original comment by thomas.t...@gmail.com on 28 Oct 2010 at 8:03

GoogleCodeExporter commented 9 years ago
I'm 95% converted using SQL as the (de)serialization format.  If I hack H2 with 
the above patch I get to 100% conversion.  I looked around for details in ANSI 
SQL newline escaping and I couldn't find anything.  Seems like a giant hole in 
the spec if its not considered.

I don't really want to use another tool to do table copies.  Personally, I'm 
not a fan of Squirrel - and at present I can't get it to work with with H2 
anyway:

org.h2.jdbc.JdbcSQLException: Schema "mydatabase" not found; SQL statement:
select tbl.* from "mydatabase"."public".test_table" tbl [90079-144]
    at org.h2.message.DbException.getJdbcSQLException(DbException.java:327)

If you don't want to allow the type of string replacement I coded above for 
RunScript, what about a tool like Script/RunScript which takes a script file 
and outputs a script file in H2-formatted SQL?  The same code change to the 
parser would be necessary, but the flag would only be set by the ConvertScript 
tool.

So for my previous example... an input of
('2000', 'this is \n a test of \n multiple lines')

Would be converted to....
('2000', DECODESTRING('this is \n a test of \n multiple lines'))

The converted script would be saved locally and could then be executed by a 
call to RunScript.

BTW, if I had a vote to cast for what aspect of H2 to build upon, I'd vote for 
enhancing the Console web app to be more like phpMyAdmin.  I remember reading 
that on your roadmap and I think its a terrific idea; phpMyAdmin is a brilliant 
admin tool.

Original comment by James.Mo...@gmail.com on 29 Oct 2010 at 12:43

GoogleCodeExporter commented 9 years ago
You are right, I will try to support the MySQL escape sequence somehow. 
However, I'm not sure yet when I will have time to implement it. There are more 
things to de-escape: http://dev.mysql.com/doc/refman/5.0/en/string-syntax.html

I think it should be a separate setting and not always used if the MySQL mode 
is used (because you can disable it in MySQL).

Original comment by thomas.t...@gmail.com on 29 Oct 2010 at 7:31

GoogleCodeExporter commented 9 years ago
Patches are welcome of course! Please include some test cases as well, and see 
http://h2database.com/html/build.html#providing_patches

Original comment by thomas.t...@gmail.com on 29 Oct 2010 at 7:32

GoogleCodeExporter commented 9 years ago

Original comment by thomas.t...@gmail.com on 21 Nov 2010 at 5:53

GoogleCodeExporter commented 9 years ago
This is a tricky thing to handle.  I still haven't committed to switching to 
H2, but I have adjusted my process a bit and I wanted to share this in case 
someone may benefit.

My solution has two steps and 1 cheat.

Part 1: I filter the MySQL dump script 

mysqldump --compatible=ansi --compact --extended-insert

public static void main(String[] args) {
        try {
            File inFile = new File(args[0]);
            File outFile = new File(args[1]);
            BufferedReader in = new BufferedReader(new FileReader(inFile));
            if (outFile.exists())
                outFile.delete();
            RandomAccessFile out = new RandomAccessFile(outFile, "rw");
            String line = null;
            while ((line = in.readLine()) != null) {
                // Drop SET commands
                if (line.startsWith("SET"))
                    continue;
                // Escape '\' to '''
                line = line.replace("'\\''", "'''");
                // Escape \' to '' UNLESS preceded by a :
                line = line.replace("\\'", "''");
                // Preserve \", drop other quotes
                line = line.replaceAll("(?<!\\\\)\"", "");
                out.write(line.getBytes(Charset.forName("UTF-8")));
                out.write('\n');
            }
            in.close();
            out.close();
        } catch (Throwable t) {
            t.printStackTrace();
        }

Part 2: Patch H2 with the following (two new cases not mentioned above):

boolean isMySQLMode = database.getMode().getName().equals("MySQL");
boolean isAdmin = session.getUser().isAdmin();
if (isMySQLMode && isAdmin)
     result = result
            .replace("\\r\\n", "\n")
            .replace("\\n", "\n")
            .replace("\\r", "\r")
            .replace("\\\"", "\"")
            .replace("\\\t", "\t")
            .replace("\\\\", "\\");

Cheat:
There were some instances of trailing slashes in my text columns like "Moved 
the snapshot to d:\"

I had a hard time writing the last RegExp in part 1 that would properly account 
for this example.  I ended up manually changing the 4 or 5 cases like this in 
the MySQL database to circumvent this issue.

Original comment by James.Mo...@gmail.com on 14 Dec 2010 at 9:11

GoogleCodeExporter commented 9 years ago
I just ran into this issue because Marauroa is escaping backslashes as \\. If 
H2 is used, saved JSON data cannot be parsed anymore.

It seems that backslash quoting is pretty common: For MySQL, Postgres, Informix 
and IIRC Access) backslashes need to be escaped.

Original comment by HendrikU...@nexgo.de on 2 Jun 2012 at 7:38

GoogleCodeExporter commented 9 years ago
> It seems that backslash quoting is pretty common: For MySQL, Postgres, 
Informix and IIRC Access) backslashes need to be escaped.

Backslash-quoting is not compatible with ANSI SQL (all versions of the 
standard). I know older versions of some databases did support it.

However, I believe more recent versions of MySQL and PostgreSQL don't use the 
backslash-quoting. And I think it doesn't make sense to provide a compatibility 
feature for _old_ versions of other databases.

Therefore, I'm closing this issue as won't fix.

Original comment by thomas.t...@gmail.com on 28 Jul 2013 at 9:42