andialbrecht / sqlparse

A non-validating SQL parser module for Python
BSD 3-Clause "New" or "Revised" License

Add Chinese, Japanese, Korean Name support #734

Open LightWind1 opened 10 months ago

LightWind1 commented 10 months ago

376

I added three regular expressions to match Chinese, Japanese, and Korean words. With them, sqlparse can correctly tokenize SQL such as 'select T2.名称 , T2.南北区域 from 民风彪悍十大城市 as T1 join 省份 as T2 on 民风彪悍十大城市.所属省份id == 省份.词条id group by T1.所属省份id order by count ( * ) asc limit 3'
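For readers who want a concrete picture of what matching CJK names with regular expressions can look like, here is a minimal sketch. It is not the code from this pull request: the character ranges, the `CJK_NAME` pattern, and the `find_names` helper are all hypothetical and only cover the main Han, kana, and Hangul syllable blocks.

```python
import re

# Hypothetical character ranges, chosen for illustration only.
# They are NOT the expressions added by the pull request.
HAN = r"\u4e00-\u9fff"     # CJK Unified Ideographs (Chinese, kanji, hanja)
KANA = r"\u3040-\u30ff"    # Hiragana and Katakana
HANGUL = r"\uac00-\ud7a3"  # Hangul Syllables

# A name starts with a letter, underscore, or CJK character and may
# continue with digits as well.
CJK_NAME = re.compile(
    rf"[A-Za-z_{HAN}{KANA}{HANGUL}][A-Za-z0-9_{HAN}{KANA}{HANGUL}]*"
)


def find_names(sql):
    """Return every name-like run in sql, in order of appearance."""
    return CJK_NAME.findall(sql)


if __name__ == "__main__":
    print(find_names("select T2.名称 from 省份 as T2"))
```

Note that this sketch does not distinguish keywords from identifiers; it only shows that mixed names such as `所属省份id` can be matched as a single unit.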

andialbrecht commented 3 months ago

Hi @LightWind1, can you clarify what problem your change solves? I've had a look at how the parser sees your statement, and to me everything looks as expected:

import sqlparse
sql = 'select T2.名称 , T2.南北区域 from 民风彪悍十大城市 as T1 join 省份 as T2 on 民风彪悍十大城市.所属省份id == 省份.词条id group by T1.所属省份id order by count ( * ) asc limit 3'
p = sqlparse.parse(sql)[0]
p._pprint_tree()
|- 0 DML 'select'
|- 1 Whitespace ' '
|- 2 IdentifierList 'T2.名称 ...'
|  |- 0 Identifier 'T2.名称'
|  |  |- 0 Name 'T2'
|  |  |- 1 Punctuation '.'
|  |  `- 2 Name '名称'
|  |- 1 Whitespace ' '
|  |- 2 Punctuation ','
.....and so on.....
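One possible explanation for the output above, offered as an assumption rather than a statement about sqlparse's lexer rules: in Python 3, `\w` in a `str` pattern is Unicode-aware by default, so it already matches Chinese, Japanese, and Korean characters. The sketch below uses only the standard `re` module; the `WORD` pattern and `words` helper are hypothetical names for illustration.

```python
import re

# In Python 3, \w on str patterns matches Unicode word characters,
# which includes CJK ideographs, kana, and Hangul syllables.
WORD = re.compile(r"\w+")


def words(text):
    """Return all runs of Unicode word characters in text."""
    return WORD.findall(text)


if __name__ == "__main__":
    print(words("T2.名称 , T2.南北区域"))
```

If a lexer rule relies on `\w` in this way, names such as `名称` would be picked up without dedicated CJK ranges, which would be consistent with the tree shown above.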