ankane / dexter

The automatic indexer for Postgres
MIT License

Dexter skips some statements and also stops silently and doesn't finish processing file #22

Closed jfinzel closed 6 years ago

jfinzel commented 6 years ago

First of all, I notice the following statement is just ignored - note that parameter 7 is a massively long string with thousands of characters. I would expect fake field f15 to be indexed for this statement:

2017-12-29 02:41:54.091 CST,"foo","db_name",97760,"1.1.1.1:40000",5a45ffbf.17de0,1000,"UPDATE",2017-12-29 02:41:35 CST,35/92068,1268659575,LOG,00000,"duration: 3.781 ms  execute S_71: UPDATE foo
SET f1 = $1
,   f2 = $2
,   f3 = $3
,   f4 = $4
,   f5 = $5
,   f6 = $6
,   f7 = $7
,   f8 = $8
,   f9 = $9
,   f10 = $10
,   f11 = $11
,   f12 = $12
,   f13 = $13
,   f14 = $14
WHERE  ( ( f15 = $15  ) ) ","parameters: $1 = NULL, $2 = NULL, $3 = '2016-11-29 08:00:01.386245+00', $4 = 'Y', $5 = 'f', $6 = NULL, $7 = '\x31303120303432....', $8 = '2016-11-30', $9 = '2016-11-30_FOO.txt', $10 = '2017-12-29', $11 = '2017-12-29 08:41:35.848', $12 = '4', $13 = '100', $14 = '5', $15 = '9302'",,,,,,,,""

Also, Dexter appears to stop without going through my whole log file, and there is no error or any other indication of why. This is a big log file with 105000 lines. After it created an index for me, it just exited, without coming close to finishing the log file.

I also just generally noticed a number of missing queries. It said it found 28 query fingerprints, but there were many I found that never appear in the dexter log output.

ankane commented 6 years ago

Hey @jfinzel, the regex only matched execute <unnamed>, so named prepared statements such as execute S_71 were skipped. That's been fixed now, and it is likely the reason some queries were missing. A few questions about the exiting early:

  1. Are you running in streaming mode (piping to it)? What happens if you run dexter <connection-options> <file1>?
  2. What is the exit code the command returns when it exits early?

Thanks again for helping debug.
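
For readers following along, the kind of fix described above can be sketched roughly as follows. This is an illustration in Python, not Dexter's actual source: the pattern, the `DURATION_RE` name, and the `extract_query` helper are all hypothetical. The idea is to accept any statement name after `execute` rather than only the literal `<unnamed>`.

```python
import re

# Hypothetical pattern (an assumption, not copied from Dexter): accept plain
# "statement" entries as well as execute/parse/bind entries with ANY
# statement name, e.g. "<unnamed>" or "S_71".
DURATION_RE = re.compile(
    r"duration: (?P<ms>\d+\.\d+) ms\s+"
    r"(?:statement|(?:execute|parse|bind) (?P<name>[^:]+)): "
    r"(?P<query>.+)",
    re.DOTALL,  # queries in the log can span several lines
)


def extract_query(message):
    """Return (duration_ms, query), or None if this is not a timed statement."""
    m = DURATION_RE.match(message)
    if m is None:
        return None
    return float(m.group("ms")), m.group("query").strip()


# Abbreviated form of the message from the report above.
named = (
    "duration: 3.781 ms  execute S_71: UPDATE foo\n"
    "SET f1 = $1\n"
    "WHERE  ( ( f15 = $15  ) ) "
)
# Made-up example of the form the old regex already handled.
unnamed = "duration: 0.512 ms  execute <unnamed>: SELECT 1"

print(extract_query(named))
print(extract_query(unnamed))
```

A pattern that hard-codes `<unnamed>` would return no match for the first message, which is consistent with the statement being silently ignored.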

jfinzel commented 6 years ago

@ankane I am already doing dexter <connection-options> <file1> to run this.

It appears to return exit code 0, so perhaps it is not recognizing or reporting an error here? After the script exits:

foo@foo:~$ echo $?
0

ankane commented 6 years ago

If that's the case, I don't think the script is exiting early. In non-streaming mode, index suggestions and index creation happen only after the entire log file (or files) has been processed. It may just be an issue with the query collection process, which the latest package hopefully fixes.
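
A minimal sketch of that collect-then-analyze behavior, assuming the log is in Postgres csvlog format like the entry in the original report. This is not Dexter's code: `process_file`, `MESSAGE_COL`, and the abbreviated sample record are hypothetical, and the column index assumes the 23-column csvlog layout. It also shows that one logical record can span several physical lines, so a file's line count overstates the number of statements.

```python
import csv
import io

# Index of the "message" column, assuming the 23-column csvlog layout.
MESSAGE_COL = 13


def process_file(stream):
    """Read every record first, then analyze (non-streaming behavior)."""
    # Phase 1: consume the whole input. csv.reader keeps quoted fields that
    # contain newlines together as a single record.
    messages = [
        row[MESSAGE_COL]
        for row in csv.reader(stream)
        if len(row) > MESSAGE_COL
    ]
    # Phase 2: only now look at what was collected.
    timed = [m for m in messages if m.startswith("duration: ")]
    return len(messages), len(timed)


# Abbreviated version of the record from the report (parameters shortened).
sample = (
    '2017-12-29 02:41:54.091 CST,"foo","db_name",97760,"1.1.1.1:40000",'
    '5a45ffbf.17de0,1000,"UPDATE",2017-12-29 02:41:35 CST,35/92068,'
    '1268659575,LOG,00000,'
    '"duration: 3.781 ms  execute S_71: UPDATE foo\n'
    'SET f1 = $1\n'
    'WHERE  ( ( f15 = $15  ) ) ",'
    '"parameters: $1 = NULL, $15 = \'9302\'",,,,,,,,""\n'
)

records, timed_statements = process_file(io.StringIO(sample))
print(records, timed_statements)
```

Because the analysis step runs only after the reader is exhausted, output that appears "after it created an index" followed by a clean exit is what a complete run would look like, rather than an early stop.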

jfinzel commented 6 years ago

I think you are right. It was skipping most of the statements due to the <unnamed> issue.

I am now getting lots of indexes created - thanks for getting this to the point of more usability. I will let you know if I see any more issues!

ankane commented 6 years ago

Great, glad we were able to make significant improvements. Curious to hear how the index suggestions turn out.