FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/
1.24k stars 215 forks source link

String literal format parsing enhancement. [CORE3459] #3820

Open firebird-automations opened 13 years ago

firebird-automations commented 13 years ago

Submitted by: @jasonwharton

I am working with data that has control characters in it that makes it so that I cannot use the script component in my components effectively.

I wrote some routines in Delphi that produce strings I can work with in Pascal but unfortuantely the Firebird string parser does not recognize the same format. It would be very helpful if the statement parser were enhanced to allow for this.

For example:

Insert into MyTable values ( 1234, 'Row1'#⁠13#⁠10'Row2' );

This would allow me to have the CRLF inside of a string literal without messing up my script's formatting.

Here is the code in Pascal that handles this, in case it may be of any assistance to whoever would enhance the Firebird statement parser. It is really a very simple algorithm and it should not hurt performance in any significant way. Look at the call with '#⁠' as the value to the Lit: char parameter.

function isLitCriteria( const AStr: string; Lit: char ): Boolean; var tmpLen: integer; tmpStr: string; UsePound: boolean; ii: integer; jj: integer; begin UsePound := Lit = '#⁠'; if UsePound then Lit := ''''; tmpStr := Trim( AStr ); tmpLen := iboLength( tmpStr ); Result := tmpLen > 1; if Result then begin if ( lit = '''' ) or ( lit = '"' ) then Result := (( tmpStr[1] = lit ) and ( tmpStr[ tmpLen ] = lit )) else Result := (( tmpStr[1] = '''' ) and ( tmpStr[ tmpLen ] = '''' )) or (( tmpStr[1] = '"' ) and ( tmpStr[ tmpLen ] = '"' )); if not Result and UsePound then begin Result := true; if tmpStr[ tmpLen ] <> '''' then begin repeat jj := tmpLen; while ( tmpLen >= 1 ) and ( tmpStr[tmpLen] >= '0' ) and ( tmpStr[tmpLen] <= '9' ) do Dec( tmpLen ); if ( tmpLen >= 1 ) and ( tmpStr[tmpLen] = '#⁠' ) then Dec( tmpLen ); if ( tmpLen >= 1 ) and ( tmpStr[tmpLen] = '''' ) then Break; if jj = tmpLen then Result := false; until ( not Result ) or ( tmpLen < 1 ); end; ii := 1; if Result and ( tmpStr[ 1 ] <> '''' ) and ( tmpLen > 0 ) then begin repeat jj := ii; if ( ii <= tmpLen ) and ( tmpStr[ii] = '#⁠' ) then Inc( ii ); while ( ii <= tmpLen ) and ( tmpStr[ii] >= '0' ) and ( tmpStr[ii] <= '9' ) do Inc( ii ); if ( ii >= 1 ) and ( tmpStr[ii] = '''' ) then Break; if jj = ii then Result := false; until ( not Result ) or ( ii = tmpLen ); end; end; end; end;

function mkLitCriteria( const AStr: string; Lit: char ): string; var ii: integer; LitFlag: boolean; UsePound: boolean; begin UsePound := Lit = '#⁠'; if isLitCriteria( AStr, Lit ) then Result := AStr else begin if UsePound then Lit := ''''; Result := ''; LitFlag := true; for ii := 1 to iboLength( AStr ) do begin if AStr[ii] = Lit then begin Result := Result + Lit + Lit; LitFlag := true; end else if ( AStr[ii] < ' ' ) and UsePound then begin if not LitFlag then Result := Result + Lit; Result := Result + '#⁠' + IntToStr( Ord( AStr[ii] )); LitFlag := true; end else begin if LitFlag then Result := Result + Lit; Result := Result + AStr[ii]; LitFlag := false; end; end; if Result = '' then Result := Lit + Lit else if not LitFlag then Result := Result + Lit; end; end;

function stLitCriteria( const AStr: String; Lit: char ): String; var ii: integer; jj: integer; kk: integer; tmpS: string; LitFlag: boolean; UsePound: boolean; LitFound: boolean; begin LitFound := false; UsePound := Lit = '#⁠'; if ( Lit <> '''' ) and ( Lit <> '#⁠' ) and ( Lit <> '"' ) then begin if isLitCriteria( AStr, '#⁠' ) then begin Lit := '#⁠'; LitFound := true; UsePound := true; end else if isLitCriteria( AStr, '''' ) then begin Lit := ''''; LitFound := true; end else if isLitCriteria( AStr, '"' ) then begin Lit := '"'; LitFound := true; end else begin Result := AStr; Exit; end; end; if LitFound or isLitCriteria( AStr, Lit ) then begin if UsePound then Lit := ''''; Result := AStr; LitFlag := true; ii := 1; if Result = Lit + Lit then begin Result := ''; Exit; end else while ii <= iboLength( Result ) do begin if LitFlag then begin if Result[ii] = Lit then begin iboDelete( Result, ii, 1 ); LitFlag := false; end else if ( Result[ii] = '#⁠' ) and UsePound then begin if ( ii > 1 ) and ( Result[ii-1] = '''' ) then begin iboDelete( Result, ii - 1, 1 ); Dec( ii ); end; jj := ii; kk := jj; repeat if Result[ii] = '#⁠' then iboDelete( Result, ii, 1 ) else Break; jj := ii; kk := jj; tmpS := ''; while ( jj <= iboLength( Result )) and ( Result[jj] >= '0' ) and ( Result[jj] <= '9' ) do begin tmpS := tmpS + Result[jj]; Inc( jj ); end; if tmpS <> '' then begin Result[ii] := Char( StrToInt( tmpS )); iboDelete( Result, ii + 1, jj - ii -1 ); end; Inc( ii ); until ( ii = iboLength( Result )) or ( kk = jj ) or ( tmpS = '' ); LitFlag := Result[ii] = ''''; if LitFlag then Dec( ii ); end; end else LitFlag := Result[ii] = Lit; Inc( ii ); end; if LitFlag then iboDelete( Result, ii - 1, 1 ); end else Result := AStr; end;

firebird-automations commented 13 years ago

Commented by: @dyemanov

What's wrong with this: 'Row1' || x'0D0A' || 'Row2' ?

firebird-automations commented 13 years ago

Commented by: @jasonwharton

'Row1' || x'0D0A' || 'Row2' would be a workable solution for my needs.

What I prefer about the method I propose is:

* I don't like to have to concatenate separate pieces. * I like having a string literal that is the same as wha I am used to in Pascal. * I know the control characters by decimal rather than by hex. * It will be easier to work with varying byte code-points using my method because there is a clear delineation between characters with each character starting with #⁠ and then whatever is necessary to represent that character. * My method could do hex as well by using #⁠$nn so it would be more versatile.

And, the beauty of it is firebird could support both methods at the same time.

firebird-automations commented 13 years ago

Commented by: @mrotteveel

I think Firebird should follow the SQL standards and not introduce an escaping mechanism of its own. Especially as the format suggested looks very confusing and outdated to me, and would probably not scale to multibyte charactersets and is most likely hard to parse (not by itself, but in the context of the entire SQL string it probably is).

If we have need for an escape mechanism, I would vote for the Unicode escape mechanism defined in the standard:

<Unicode character string literal> ::= [ <introducer><character set specification> ] U<ampersand><quote> [ <Unicode representation>... ] <quote> [ { <separator> <quote> [ <Unicode representation>... ] <quote> }... ] <Unicode escape specifier>

<Unicode representation> ::= <character representation> | <Unicode escape value>

<Unicode escape specifier> ::= [ UESCAPE <quote><Unicode escape character><quote> ]

<Unicode identifier part> ::= <delimited identifier part> | <Unicode escape value> <Unicode escape value> ::= <Unicode 4 digit escape value> | <Unicode 6 digit escape value> | <Unicode character escape value> <Unicode 4 digit escape value> ::= <Unicode escape character><hexit><hexit><hexit><hexit> <Unicode 6 digit escape value> ::= <Unicode escape character><plus sign> <hexit><hexit><hexit><hexit><hexit><hexit> <Unicode character escape value> ::= <Unicode escape character><Unicode escape character> <Unicode escape character> ::= !! See the Syntax Rules

(copied from SQL 2003 section 5.2)

firebird-automations commented 13 years ago

Commented by: @jasonwharton

Mark's suggestion certainly deserves higher priority than mine. I can see how this would be very beneficial for cross-database syntax compatibility.

firebird-automations commented 12 years ago

Commented by: @jasonwharton

There was a bug in the code above. If you want to use this code please contact me for the most current version.