google-code-export / roundhouse

Automatically exported from code.google.com/p/roundhouse
1 stars 0 forks source link

GO-splitter regex doesn't work in all situations #25

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
When I updated to the latest version our scripts didn't run anymore. The
problem was the new regex to split the GO statements.

Here is the the problem:
-- Some comment
GO

It doesn't split on the GO. I changed the regex a bit an you can find the
new version attached. But it isn't 100% correct either, some tests fail
even the one I added to prove my case fails. Probably because I'm doing
something wrong in MbUnit. It does work in Regulator, wich I use to test
the regex.

Old:
(?<!--[\s\t\w\=\'\.\>\?\<\(\)\[\]\-\*\;]*|\w)(GO)[\t\s]*$(?![\t\s\w]*\*\/)

New:
(?<!--[\s\w\=\'\.\>\?\<\(\)\[\]\-\*\;])*^\s*GO\s*$(?![\s\w]*\*\/)

Let me explain what I changed:
[\t\s] => \s, this is the same
^ : before the GO
removed |\w because \w was already in the other alternation

The only problem I still have is that BOB GO doesn't split. In your tests
it should, but maybe it shouldn't and the guideline could be to always
place the GO on a new line.

Original issue reported on code.google.com by jochen.j...@gmail.com on 22 Feb 2010 at 2:49

Attachments:

GoogleCodeExporter commented 9 years ago
The only problem with the rule is that it's something people will have to 
consider 
when using RH and not something they have to consider when using T-SQL normally.

Let me see what I can crank out.

Original comment by trueblu...@gmail.com on 22 Feb 2010 at 7:16

GoogleCodeExporter commented 9 years ago
I hope you find a solution. I have been looking at it all day. Regex is good, 
if it
stays simple. ;-)

Btw when you remove the () around the GO the split removes the GO from the 
splitted
queries. Then you don't need the Regex.Replace in the other function.

Original comment by jochen.j...@gmail.com on 22 Feb 2010 at 7:34

GoogleCodeExporter commented 9 years ago
I'm down to just one guy here

--
GO

Everything else works.

Original comment by trueblu...@gmail.com on 22 Feb 2010 at 8:15

GoogleCodeExporter commented 9 years ago
Got it....

(?<!--[\s\w\=\'\.\>\?\<\(\)\[\]\-\*\;]*[^\r\n\f]|\w|--)[\s]*GO[\s]*$(?!
[\s\w\=\'\.\>\?\<\(\)\[\]\-\*\;\r\n\f]*\*\/)

Original comment by trueblu...@gmail.com on 22 Feb 2010 at 8:28

GoogleCodeExporter commented 9 years ago
Fixed in revision 127.

Original comment by trueblu...@gmail.com on 22 Feb 2010 at 8:43

GoogleCodeExporter commented 9 years ago
Revision 131 has this: 
(?<!--[\s\w\`\~\!\@\#\$\%\^\&\*\(\)\-_\+\=\,\.\;\:\'\""\[\]
\\\/\?\<\>]*[^\r\n\f]|\w|--)[\s]*GO[\s]*$(?![\s\w\`\~\!\@\#\$\%\^\&\*\(\)\-
_\+\=\,\.\;\:\'\""\[\]\\\/\?\<\>\r\n\f]*\*\/)

Tests were updated as well.

Original comment by trueblu...@gmail.com on 23 Feb 2010 at 5:12

GoogleCodeExporter commented 9 years ago
The new regex broke didn't work on our scripts. There is no split at all.
For the moment I reverted back to the previous one to continue the work on 
Oracle.

Original comment by jochen.j...@gmail.com on 24 Feb 2010 at 10:59

GoogleCodeExporter commented 9 years ago
Can you put some tests in for specifically what is breaking it? That way we can 
address those areas.

Original comment by trueblu...@gmail.com on 24 Feb 2010 at 4:18

GoogleCodeExporter commented 9 years ago
I've just committed a test to the oracle-support branch. The problem is the 
split
when multiple GO's are in one file. But it the problem is bigger then the one 
test I
added. We have tried to split the regex in multiple parts, one for the -- 
comments
and one for the /* */ comments but in some cases it still doesn't work. Just 
below
the test you can find what we had so far.

Here is an example I was testing with in Regulator:

BOB1
GO

/* COMMENT */
BOB2
GO

-- GO

BOB3 GO

--`~!@#$%^&*()-_+=,.;:'"[]\/?<> GO

BOB5
   GO

BOB6
GO

/* GO */

BOB7

/* 

GO

*/

BOB8

--
GO

BOB9

-- `~!@#$%^&*()-_+=,.;:'"[]\/?<>
GO

BOB10GO

BOB11

-- dfgjhdfgdjkgk dfgdfg GO
BOB12

Original comment by jochen.j...@gmail.com on 24 Feb 2010 at 4:35

GoogleCodeExporter commented 9 years ago
In regulator, make sure you are running multiline.

Original comment by trueblu...@gmail.com on 24 Feb 2010 at 4:44

GoogleCodeExporter commented 9 years ago
This is doing a little better: (?<!--[\s\`\~\!\@\#\$\%\^\&\*\(\)\-
_\+\=\,\.\;\:\'\""\[\]
\\\/\?\<\>]*[^\r\n\f]|\w|--)[\s]*GO[\s]*$(?![\s.\r\n\f]*\*\/)

Original comment by trueblu...@gmail.com on 24 Feb 2010 at 4:53

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
This is the only one I am seeing issues with in Expresso:

-- dfgjhdfgdjkgk dfgdfg GO

Original comment by trueblu...@gmail.com on 24 Feb 2010 at 4:54

GoogleCodeExporter commented 9 years ago
(?<!^-{2}[\s\w\d\`\~\!\@\#\$\%\^\&\*\(\)\-_\+\=\,\.\;\:\'\""\[\]\\\/\?\<\>]*[^\r
\n\f]
|\w|^-{2})[\s]*GO[\s]*$(?![\s.\r\n\f]*\*\/)

Original comment by trueblu...@gmail.com on 24 Feb 2010 at 5:07

GoogleCodeExporter commented 9 years ago
(?<!-{2}[\s\w\`\~\!\@\#\$\%\^\&\*\(\)\-_\+\=\,\.\;\:\'\""\[\]\\\/\?\<\>]*[^\r\n\
f]
|\w|-{2})[\s]*GO[\s]*$(?![\s.\r\n\f]*\*\/)

Still an issue not finding the second go below:

-- GO
BOB3 GO

Original comment by trueblu...@gmail.com on 24 Feb 2010 at 5:13

GoogleCodeExporter commented 9 years ago
Maybe the regex is to complex to do in one split. On Stackoverflow I found 
something
that could help.

http://stackoverflow.com/questions/778969/regex-for-parsing-sql-statements

Here the regex tries to match comments, quoted string and GO. Because the
regex-engine is eager it will not find a GO inside comments because they are 
before
the GO in the regex.

This does mean the a simple split isn't possible, but the function will need to 
find
all matches from the GO group.

I'm almost certain this will work, but for the moment we will revert back to a 
simple
regex to get our scripts working again because Oracle is giving us serious 
headaches
too. We think this is more important at the moment than a split that covers all
possible scenarios.

Original comment by jochen.j...@gmail.com on 25 Feb 2010 at 12:19

GoogleCodeExporter commented 9 years ago
For now we've gone back to a simple regex in the Oracle_Support branch: 
^[\s]*GO[\s\n\r\f]*$ for Ado.Net with SQL Server
^[\s]*;[\s\n\r\f]*$ for Ado.Net with Oracle

This introduces a convention that splitter statements should be on their own 
line, 
without comments. So we put some UnitTests on Ignore.

Shouldn' it be better for maintenance reasons to have a simple regex and a 
convention, than to have a very complex regex? 

After all if the convention is not followed, the build will fail, and the team 
will 
know what they've done wrong.

There will be scenario's we have not covered, and the complex Regex will need 
to 
change again...

Original comment by Pascal.M...@gmail.com on 25 Feb 2010 at 12:55

GoogleCodeExporter commented 9 years ago
That seems reasonable.

The only thing with ; is that I hardly ever put it on it's own line.

Original comment by trueblu...@gmail.com on 25 Feb 2010 at 2:40

GoogleCodeExporter commented 9 years ago
I'll concentrate on getting the GO regex (or multiple regex splits) figured 
out. 

Original comment by trueblu...@gmail.com on 25 Feb 2010 at 2:48

GoogleCodeExporter commented 9 years ago
I think I have it... it's going to be a bit weird. It's going to be a replace 
and a 
split. The first part is to replace GO with a crazy string. The other part is 
to 
wipe those crazy strings out.

Here's one part of the string now: (?<KEEP1>(?:--).*$|/\*[\S\s]*?\*/)|(?
<KEEP1>'(?:\\.|[^'\\])*')|(?<KEEP1>"(?:\\.|[^"\\])*")|(?<KEEP1>^[^\S\n]*.*)(?<GO
>GO)
(?<KEEP2>[^\S\n]*\d*[^\S\n]*(?:(?:--).*)?$)

with this in for the replace:
${KEEP1}${KEEP2}

I just have to figure out how to find the named match GO and replace it.

Original comment by trueblu...@gmail.com on 25 Feb 2010 at 5:06

GoogleCodeExporter commented 9 years ago
I think the splitter is good now. Pull revision 143 and give it a shot. Let me 
know 
what you find.

Original comment by trueblu...@gmail.com on 2 Mar 2010 at 5:47

GoogleCodeExporter commented 9 years ago
We've pulled revision 143 and tested it against our scripts.

We have scripts like this, and it's split incorrect on the 'go' from 'good' 
which is 
on a comment line:

  CREATE TABLE MyTable (
    ID  int IDENTITY(1,1) NOT NULL,
    -- TODO: To be good, there should be type column
    Description             varchar(200) NOT NULL
  )

Personally I find the long regex complex, and we lost a lost of time trying to 
understand & fix it. 

However, I see another (other than comment 17 above) alternative solution that 
might 
help fixing the splitter. 

- First replace all comments (regex (--(.)*?$)|(/\*(.)*?\*/)) by String.Empty)
- Then split on the seperator 

This would result in 2 small regexes, but the consequence is that comments are 
completely ignored.

Original comment by Pascal.M...@gmail.com on 3 Mar 2010 at 1:46

GoogleCodeExporter commented 9 years ago
To capture all kinds of input, it calls for a complex regex because you are 
adapting 
to most every type of input. 

I'm not really looking for you guys to spend time understanding it, more just 
tell 
me if it fails and what specifically it is failing on. That's where I add a 
test and 
fix it and all of the other alternatives.

Original comment by trueblu...@gmail.com on 3 Mar 2010 at 8:28

GoogleCodeExporter commented 9 years ago
check revision 145.

Original comment by trueblu...@gmail.com on 3 Mar 2010 at 8:30

GoogleCodeExporter commented 9 years ago
The GO on a comment line is fixed now. :-)

The next issue is data in insert statements, f.e.:

INSERT INTO MyTable(ID, Description) VALUES (1, 'Genital Tract: Man and Woman; 
Gonorrhea')

It is splitting on the 'Go' from 'Gonorrhea', which is incorrect. Sorry for the 
medical terminology.

Original comment by Pascal.M...@gmail.com on 4 Mar 2010 at 8:27

GoogleCodeExporter commented 9 years ago
Same will count for WHERE conditions in UPDATE and DELETE statements.

Original comment by Pascal.M...@gmail.com on 4 Mar 2010 at 8:46

GoogleCodeExporter commented 9 years ago
next revision. :D

Original comment by trueblu...@gmail.com on 4 Mar 2010 at 4:36

GoogleCodeExporter commented 9 years ago
Failing now on following script with following message: Incorrect syntax near 
the
 keyword 'PRINT'. It's the 2nd Print in the script that's causing it.

PRINT N' -> Tabel: MyTable'
CREATE TABLE MyTable ( 
    ID  int NOT NULL,
    Remarks ntext NULL 
    )
GO
PRINT N' -> Constraints: MyTable'
ALTER TABLE MyTable
    ADD CONSTRAINT PK_MyTable
    PRIMARY KEY (ID)

Original comment by Pascal.M...@gmail.com on 4 Mar 2010 at 4:54

GoogleCodeExporter commented 9 years ago
Okay guys... revision 147. Let me know what you find. I was able to simplify 
the 
regex and I'm hoping this time its for keeps! :D

Original comment by trueblu...@gmail.com on 6 Mar 2010 at 6:18

GoogleCodeExporter commented 9 years ago
It's doing a lot better! 

But we've seen 2 issues.  

1. Wrong split when a sql script ends on a GO statement, which is quite common. 
(see 
attached sql file)

2. Performance of really really large scripts is not good.

In our spike, where we test new versions of RoundhousE, we have a very large 
(25Mb) 
script containing a lot of initial data. The script was made by a tool SQL 
Compare, 
and performed fine in earlier versions of RoundhousE. It ran in <1 minute. Now 
it's 
taking >15 minutes. We also have a script of 3Mb and that one runs fine. This 
is 
probably NOT a show stopper for RoundhousE, because people shouldn't make 
scripts 
THAT big, right? :-)

Original comment by Pascal.M...@gmail.com on 8 Mar 2010 at 10:05

Attachments:

GoogleCodeExporter commented 9 years ago
RIIIIIIIIGGGHHHHTTTTttt! :D lol

1. I was thinking of this. The last statement with a go can be overcome with 
either 
a space or we can just search for that specifically.

2. When you guys were doing these tests, was this prior to splitting GOs? 

Original comment by trueblu...@gmail.com on 8 Mar 2010 at 4:33

GoogleCodeExporter commented 9 years ago
1. We prefer a specific search, but the space rule is also fine if it's to hard 
to 
accomplish.

2. Well there were almost no GO statements in this script, just a lots of 
inserts into 
lots of tables. We didn't search on why it was slow. We've split the script 
into 
smaller scripts and it's running fine now.

Original comment by Pascal.M...@gmail.com on 8 Mar 2010 at 4:48

GoogleCodeExporter commented 9 years ago
I think I have #1 fixed

Original comment by trueblu...@gmail.com on 8 Mar 2010 at 4:57

GoogleCodeExporter commented 9 years ago
Revision 150. 

As far as performance. I had a huge file (15MB) that I was running when I first 
started splitting on a batch terminator for Access. It was a whole bunch of 
update 
statements (180,000+). We let it run for about two hours before we finally 
killed 
it. At the time we assumed it was something with Access. 

I'm thinking it's possibly a combination of RH and the database. We may want to 
explore string builder here instead of immutable strings. Let's log a different 
issue on that.

Original comment by trueblu...@gmail.com on 8 Mar 2010 at 6:23

GoogleCodeExporter commented 9 years ago
I'll create it.

Original comment by trueblu...@gmail.com on 8 Mar 2010 at 6:25

GoogleCodeExporter commented 9 years ago
#1 is indeed fixed

Original comment by Pascal.M...@gmail.com on 8 Mar 2010 at 7:04

GoogleCodeExporter commented 9 years ago
Does this mean we can finally put this issue to bed?!!!

Original comment by trueblu...@gmail.com on 8 Mar 2010 at 10:06

GoogleCodeExporter commented 9 years ago
Yesss!

Original comment by Pascal.M...@gmail.com on 11 Mar 2010 at 8:26

GoogleCodeExporter commented 9 years ago
Closed, fixed and verified at revision 150.

Original comment by trueblu...@gmail.com on 11 Mar 2010 at 3:24