ChaosEternal opened 11 years ago
I think I don't understand... the column number can be fetched from the position of the identifier inside the IdentifiersList, and this number would change if you add or remove items (though it's true that a property could be added to get it more easily). But row? What do you mean by "row"? The order of a statement inside a StatementsList?
Sorry, let me explain.
The column number is the position of the token from the beginning of its line; the row number is the line number of the token from the beginning of the file.
Think about thousands of SQL queries, each with thousands of lines, and suppose you want a tool that rewrites some parts of the queries and leaves the other parts intact.
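To make the request concrete: given an absolute character offset (which the lexer could track), row and column can be derived from the raw SQL text with plain string operations. This is just an illustrative sketch, not sqlparse API:

```python
def row_col(text, offset):
    """Derive the 1-based (row, col) of a character offset in `text`."""
    row = text.count("\n", 0, offset) + 1
    last_nl = text.rfind("\n", 0, offset)
    # For offsets on the first line, rfind returns -1, so col = offset + 1.
    col = offset - last_nl
    return row, col

sql = "SELECT a,\n       b\nFROM t"
print(row_col(sql, sql.index("FROM")))  # -> (3, 1)
```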
I still don't get the point of the rows and columns. As I told you, these numbers can change, and sqlparse already allows changing specific fragments of SQL queries (and in fact it's better to do it in a relative form). Maybe the functionality you are requesting belongs in a higher-level library?
@andialbrecht, could you shed some light on this topic? :-)
Sorry for the disturbance. I wrote some dirty code to explain this; see the attachment.
The 2.sql is as follows:

```sql
WITH ATBL AS (SELECT
```

Running

```shell
sed -e "$(python convwith2temptable2.py 2.sql | sed 's/$/;/')" 2.sql
```

will give us this:

```sql
WITH DROP TABLE IF EXISTS ATBL; CREATE TABLE ATBL AS (SELECT
```
Thanks.
As you mentioned, the row and column numbers will change, so code relying on them is unsafe.
I could think of another use case: In CrunchyFrog, the db frontend/SQL editor where sqlparse originated from, I need to know where a statement starts and ends to highlight it in the GUI. So col/row attributes on tokens would be helpful here too. Also when jumping to specific parts within a statement ("go to WHERE clause/JOIN definitions" would be a nice feature in the editor).
But as @piranna already noted, the values in the hypothetical cols/rows attributes would only be valid after initial parsing. During initial parsing all whitespace is preserved, so the lexer should be able to calculate the initial positions (I'd throw character offset into the ring then, too).
But as soon as filters are applied, those properties could be invalid, and we have no easy way to tell whether they're still valid. OTOH, maybe that's just a documentation/naming issue...
See also #102
Throwing another comment here to say I'd like to see this implemented, for a tool I'm working on. It seems pretty basic to have position information in a parser. :)
Pull requests are welcome :-)
@idank do you mean absolute position? The nth token from the beginning of the statement?
I meant absolute position. I ended up with this hack, for those interested:
```python
def calcpositions(root, index=0):
    positions = {}
    if root.is_group():
        startindex = index
        for token in root.tokens:
            nestedpos, index = calcpositions(token, index)
            positions.update(nestedpos)
        positions[root] = (startindex, index)
    else:
        positions[root] = (index, index + len(root.value))
        index += len(root.value)
    return positions, index
```

It returns a `{token: (start, end)}` dictionary and should be called with `sqlparse.parse(text)[0]`.
It's along the lines of what I was going to suggest, though I had misunderstood and thought you meant index as the position of the whole token rather than start and end characters.
I modified my would-be suggestion, but it's essentially the same, with the exception that mine skips the groups and just records the individual tokens.
```python
from __future__ import print_function

import sqlparse


def calcpositions(root, index=0):
    positions = {}
    if root.is_group():
        startindex = index
        for token in root.tokens:
            nestedpos, index = calcpositions(token, index)
            positions.update(nestedpos)
        positions[root] = (startindex, index)
    else:
        positions[root] = (index, index + len(root.value))
        index += len(root.value)
    return positions, index


def calcpositions2(stmt):
    index = 0
    positions = {}
    for token in stmt.flatten():
        positions[token] = (index, index + len(str(token)))
        index += len(str(token))
    return positions


if __name__ == '__main__':
    sql = "CREATE TABLE test();"
    stmt = sqlparse.parse(sql)[0]
    print(calcpositions(stmt))
    print("##############")
    print(calcpositions2(stmt))
```

Output:

```
({<Identifier 'test' at 0x1E9BC10>: (13, 17), <DDL 'CREATE' at 0x1EA0458>: (0, 6), <Whitespace ' ' at 0x1EA04A8>: (6, 7), <Keyword 'TABLE' at 0x1EA04F8>: (7, 12), <Whitespace ' ' at 0x1EA0548>: (12, 13), <Name 'test' at 0x1EA0598>: (13, 17), <Punctuation '(' at 0x1EA05E8>: (17, 18), <Punctuation ')' at 0x1EA0638>: (18, 19), <Punctuation ';' at 0x1EA0688>: (19, 20), <Parenthesis '()' at 0x1E9BB40>: (17, 19), <Statement 'CREATE...' at 0x1E9BBA8>: (0, 20)}, 20)
##############
{<Punctuation ')' at 0x1EA0638>: (18, 19), <DDL 'CREATE' at 0x1EA0458>: (0, 6), <Punctuation ';' at 0x1EA0688>: (19, 20), <Whitespace ' ' at 0x1EA04A8>: (6, 7), <Keyword 'TABLE' at 0x1EA04F8>: (7, 12), <Whitespace ' ' at 0x1EA0548>: (12, 13), <Name 'test' at 0x1EA0598>: (13, 17), <Punctuation '(' at 0x1EA05E8>: (17, 18)}
```
Though it nicely complements the `token_at_offset` function.
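For illustration, a lookup in the opposite direction (offset → token) over such a positions mapping could look like the sketch below. Plain strings stand in for the real token objects, and the helper name is only borrowed from the `token_at_offset` idea mentioned above:

```python
def find_token(positions, offset):
    """Return the token whose [start, end) span covers `offset`.

    `positions` maps tokens to (start, end) spans, like the dicts
    produced by calcpositions2 above (real tokens are distinct
    objects; plain strings are used here only for illustration).
    """
    for token, (start, end) in positions.items():
        if start <= offset < end:
            return token
    return None

positions = {"CREATE": (0, 6), " ": (6, 7), "TABLE": (7, 12)}
print(find_token(positions, 8))   # -> TABLE
print(find_token(positions, 99))  # -> None
```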
Yeah, I happen to need the position of groups too.
This whole thing falls on its face if the parser decides to suppress parts of the input one day, say whitespace (like most parsers do). It works for now though.
The parser should never suppress whitespace. IOW, during parsing it should be possible to restore the original input. This is required in some use cases.
Funny thing is that the position is already known in Lexer.py; this information just gets thrown away again. Why not attach it to the token instead?
Of course, calculating row & column might make it a bit harder, but it should be reasonably trivial to add in the lexer.
The problem was that once filters are applied to the lexer, the row/column values are modified.
At the moment `parse` doesn't take any filters that would make this a problem; but if it were allowed to take filters (I was planning on adding that capability later), the positions would shift.
Thinking about it though, it shouldn't be too complicated to add a component to the filters that updates this information before processing finishes.
Depends on the information you want. If what you want is (most likely) the position where the token originally occurred in the stream/file/string, then doing it in the lexer makes sense.
In reality, sqlparse can also be used for reformatting and other position-changing purposes. In that case, yes, the whole concept of position breaks down in general. Still, I think adding the functionality with that disclaimer makes sense.
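A tiny demonstration of that disclaimer, using a whitespace-collapsing rewrite as a stand-in for any reformatting filter: offsets computed before the rewrite no longer point at the same text afterwards.

```python
import re

sql = "SELECT  *   FROM t"
pos = sql.index("FROM")                  # offset computed on the original

# Collapse runs of whitespace, as a reformatter might.
reformatted = re.sub(r"\s+", " ", sql)   # "SELECT * FROM t"
print(reformatted[pos:pos + 4])          # no longer "FROM"
```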
I second the need to have (back) a way to get the position of each token. In sqlparse > 0.2 the Lexer class has been simplified (it is currently a mere namespace for a single classmethod) and the `tokenize()` function that uses it does not return the position anymore.
One of my tools implements poor man's fake domains on databases missing the domain concept, recognizing `CREATE DOMAIN foo bar` and mechanically replacing `foo` with `bar` in the statements when that is the case. I had to reimplement the old behaviour to be able to do that.
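For context, the mechanics of that tool can be sketched roughly like this. It is a simplified, regex-based stand-in (the real implementation works on sqlparse tokens and their positions), and all names here are illustrative:

```python
import re

def apply_fake_domains(statements):
    """Replace fake-domain names with their base types.

    `statements` is a list of SQL strings; CREATE DOMAIN statements
    define name -> base-type mappings applied to later statements.
    """
    domains = {}
    out = []
    for stmt in statements:
        m = re.match(r"\s*CREATE\s+DOMAIN\s+(\w+)\s+(\w+)", stmt, re.I)
        if m:
            domains[m.group(1)] = m.group(2)
            continue
        for name, base in domains.items():
            stmt = re.sub(r"\b%s\b" % re.escape(name), base, stmt)
        out.append(stmt)
    return out

print(apply_fake_domains([
    "CREATE DOMAIN money numeric",
    "CREATE TABLE t (price money)",
]))  # -> ['CREATE TABLE t (price numeric)']
```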
I also need this for something I'm working on!
IMO, it makes complete sense to provide this for `parse`, and then simply invalidate the information as and when it becomes invalid.
The suggestion is to add attributes like `col` and `row` on the `Token` class, which would make tasks like SQL conversion easier.
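Since flattened tokens concatenate back to the original input, such `row`/`col` values could be computed in a single pass. A sketch over a plain list of token strings (standing in for `stmt.flatten()`), using 1-based rows and columns:

```python
def annotate(tokens):
    """Yield (token, row, col) with 1-based positions, assuming the
    token strings concatenate back to the original text."""
    row, col = 1, 1
    for tok in tokens:
        yield tok, row, col
        newlines = tok.count("\n")
        if newlines:
            row += newlines
            # Column restarts after the last newline in the token.
            col = len(tok) - tok.rfind("\n")
        else:
            col += len(tok)

tokens = ["SELECT", " ", "a", "\n", "FROM", " ", "t"]
for tok, row, col in annotate(tokens):
    print(repr(tok), row, col)
# 'FROM' lands at row 2, col 1
```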