andialbrecht / sqlparse

A non-validating SQL parser module for Python
BSD 3-Clause "New" or "Revised" License

Using parsestream blocks until fully read #162

Open resamsel opened 9 years ago

resamsel commented 9 years ago

I was trying to combine sqlparse.parsestream with reading from stdin, but could not get it to work properly. Here's my approach:

stream.py:

import sys
import os
import sqlparse

# Attempted workaround: reopen stdin with a buffer size of 1 (line buffered).
#newin = os.fdopen(sys.stdin.fileno(), 'r', 1)

s = sqlparse.parsestream(sys.stdin)

# Expectation: each statement is yielded as soon as it has been read.
for i in s:
    print i

Calling it with:

for i in `seq 1 5`; do echo select $i\;; sleep 1; done | python stream.py

The output only becomes available after 5 seconds, i.e. once stdin is closed. I was expecting parsestream to yield each statement as soon as it is read from stdin. I also tried creating a separate newin with a buffer size of 1 (the commented-out line above), but that didn't change the behavior.

Is there anything I'm doing wrong here?

I'm on OS X Yosemite with Python 2.7.9.

andialbrecht commented 9 years ago

That's an interesting use case! ATM parsestream relies on EOF and is intended to work with patterns like "echo 'select * from foo' | stream.py". Are you trying to format log files in a tail-like manner?

For now parsestream doesn't try to determine the end of a statement at all (although that might be possible). That would be required for this use case.
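
A minimal sketch of what such incremental statement detection could look like, built on sqlparse.split (which is real API; the split_stream helper and the line-by-line buffering strategy are assumptions, not anything the library currently provides):

import sys
import sqlparse

def split_stream(fileobj):
    # Hypothetical incremental splitter: yield each statement as soon
    # as its terminating semicolon has been read, instead of waiting
    # for EOF.
    buf = ''
    for line in iter(fileobj.readline, ''):
        buf += line
        statements = sqlparse.split(buf)
        if not statements:
            continue
        # Heuristic: if the last chunk ends with ';' it is complete too.
        # An unterminated string literal that happens to end in ';'
        # would fool this check.
        if statements[-1].rstrip().endswith(';'):
            complete, buf = statements, ''
        else:
            complete, buf = statements[:-1], statements[-1]
        for stmt in complete:
            yield stmt
    if buf.strip():
        yield buf

for stmt in split_stream(sys.stdin):
    print(stmt)

With something like this, the `seq 1 5` example from the original post should print one statement per second instead of everything after five seconds.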

resamsel commented 9 years ago

Thanks for your reply! I have a large file with lots of SQL statements (70K+ lines) that I want to start executing as soon as the first statement has been read successfully. At the same time I want to be able to generate SQL statements with a script and pipe them to my stream.py.

I already use sqlparse to read SQL statements from a file/stdin and execute them (see https://github.com/resamsel/dbnavigator/blob/master/src/dbnav/executer/__init__.py if you're interested in what I'm currently working on). I'm trying to replace the read_statements function with a stream, to work around the long time it takes to read and parse all of those 70K+ lines...
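
For illustration, the executor loop I have in mind would be roughly this (split_stream is the hypothetical splitter sketched above; the cursor argument just stands in for whatever dbnavigator uses internally):

def execute_stream(fileobj, cursor):
    # Execute each statement as soon as it is available, instead of
    # parsing the whole 70K+ line file up front.
    for stmt in split_stream(fileobj):
        cursor.execute(stmt)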

mehaase commented 9 years ago

+1 I'm interested in reading a database dump (.sql) and being able to extract information from Python without needing to actually stage it in a real database. I'm working with files > 100k lines. The performance is okay so far, but as I scale up, it would be nice if the generator could start returning results without needing to read the entire file into memory.
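
For a concrete example of the kind of extraction I mean, counting statement types in a dump (sqlparse.parse and Statement.get_type are real API; reading the whole file up front is exactly the part a streaming generator would remove):

from collections import Counter

import sqlparse

def summarize_dump(path):
    # Count statement types (INSERT, CREATE, ...) in a .sql dump.
    # Currently this has to read the entire file into memory first.
    with open(path) as f:
        sql = f.read()
    counts = Counter()
    for stmt in sqlparse.parse(sql):
        counts[stmt.get_type()] += 1  # e.g. 'INSERT', 'CREATE', 'UNKNOWN'
    return counts

print(summarize_dump('dump.sql'))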

c3c commented 9 years ago

+1, I would like to use this library for parsing large SQL dump files.

vmuriart commented 8 years ago

Would anyone be able to provide an example and a test for this? I think I see what's going on, but I'd like to validate it.

resamsel commented 8 years ago

@vmuriart, what's wrong with the example and test in the original post? I'd be happy to provide more information if you need any.

vmuriart commented 8 years ago

I'm on a Windows computer :sob:. Specifically though, I was looking for something that could be added to the library's tests.

mehaase commented 8 years ago

Here are some free SQL dumps found via Google. Some are quite large.

http://dev.mysql.com/doc/index-other.html
http://sportsdb.org/sd/samples
More: https://www.google.com/search?safe=off&q=filetype%3Asql

vmuriart commented 8 years ago

Thanks @mehaase. I meant an example that can be included in the library's tests module. I'm having a hard time coming up with a way to automate this test.
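
One way to automate it might be to wrap the input in a file-like object that fails if parsestream reads past the first statement, so the test proves the first statement is yielded before EOF. A sketch, assuming the streaming behavior existed and that parsestream pulls data via read() (the TruncatingReader helper is hypothetical):

import io

import sqlparse

class TruncatingReader(io.StringIO):
    # Test helper: raises if anything beyond the first `limit`
    # characters is read, simulating input that is not yet available
    # (e.g. a slow pipe).
    def __init__(self, text, limit):
        io.StringIO.__init__(self, text)
        self.limit = limit

    def read(self, size=-1):
        if self.tell() >= self.limit:
            raise AssertionError('read past the available input')
        if size < 0 or self.tell() + size > self.limit:
            size = self.limit - self.tell()
        return io.StringIO.read(self, size)

def test_parsestream_yields_before_eof():
    sql = u'select 1;\nselect 2;\n'
    # Only the first statement is "available"; the rest must not be read.
    stream = TruncatingReader(sql, limit=len('select 1;\n'))
    first = next(sqlparse.parsestream(stream))
    assert first.get_type() == 'SELECT'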

abudis commented 2 years ago

Sorry for reviving an old issue, but it would be great if something similar to (exactly the same as?) what @Toilal has added in the fork were added upstream!