Open MarechJ opened 5 years ago
I think we should keep this open to track doing single-pass parameter substitution. We still shouldn't have double substitution if people use mixed parameter styles.
I ran into this bug yesterday and spent a non-trivial amount of time diagnosing it. I got a bit irritated too. I found this issue only after fully analysing the underlying problem. I am considering developing a fix for this. However, there seems to be no clear, direct approach to fixing it. I'd like to discuss the preferred solution.
I believe it's correct to always use a single paramstyle at most. Sadly, the current implementation does not require the user to select one, and changing that would break users' code. Also, we obviously cannot just pick one, as existing code may expect a different one.
Another option is to try to guess the expected paramstyle if it is not explicitly set, for example by using the first one whose number of markers matches the number of params. This seems to be the most promising solution, as it requires no change in users' code. But it doesn't support combining markers of different paramstyles (which is currently theoretically possible). Combining different paramstyles is quirky at best, as the current implementation matches params from the start for each (non-numeric) paramstyle anyway. So I cannot really consider this use case supported.
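To make the guessing idea concrete, here is a rough sketch. It is not Impyla code: `MARKERS` and `guess_paramstyle` are hypothetical names, only three positional styles are covered, and I assume that for the numeric style the count that matters is the number of distinct markers.

```python
import re

# Hypothetical marker patterns for three positional paramstyles.
MARKERS = {
    "qmark": re.compile(r"\?"),
    "format": re.compile(r"%s"),
    "numeric": re.compile(r":\d+"),
}

def guess_paramstyle(sql, params):
    """Return the first style whose marker count matches len(params), else None."""
    for style, pattern in MARKERS.items():
        found = pattern.findall(sql)
        # Assumption: a numeric marker may repeat, so count distinct markers.
        count = len(set(found)) if style == "numeric" else len(found)
        if found and count == len(params):
            return style
    return None

print(guess_paramstyle("SELECT ?, ?", ["a:1", "b"]))
print(guess_paramstyle("SELECT :1, :1", ["x"]))
```

Note that the guess looks only at the query text, so markers inside the parameter values never influence it. It can still be fooled by marker-like text inside SQL string literals.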
The final option is to replace all markers of the different paramstyles in a single pass. This may seem like the best approach, but I think it is flawed. The main problem is checking the number of parameters against the number of markers, because when the numeric paramstyle is combined with any other, a query can be correct even though the two counts differ. That makes this check very complicated and counter-intuitive.
The bad thing is that implementing any of these options can theoretically break user code, if any depends on such quirks. But I consider the current state worse.
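A minimal sketch of what single-pass replacement could look like, under my own assumptions. `quote`, `MARKER` and `single_pass_substitute` are hypothetical names, the escaping is deliberately simplified, and only the qmark, format and numeric styles are handled.

```python
import re

def quote(value):
    # Simplified, hypothetical escaping: quote strings, str() everything else.
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)

# One pattern for all supported markers; group 1 is set only for numeric ones.
MARKER = re.compile(r"\?|%s|:(\d+)")

def single_pass_substitute(sql, params):
    positional = iter(params)

    def replace(match):
        if match.group(1) is not None:
            # Numeric marker: 1-based index into params, may repeat.
            return quote(params[int(match.group(1)) - 1])
        # qmark / format marker: consume the next positional param.
        return quote(next(positional))

    # re.sub scans the original string once; inserted values are never rescanned.
    return MARKER.sub(replace, sql)

print(single_pass_substitute("SELECT ?, ?", ["a:1", "b"]))
print(single_pass_substitute("SELECT :1, :1", ["x"]))
```

The second call shows why the count check is awkward: two markers and one parameter is a perfectly valid numeric query. The sketch also does not skip markers that appear inside SQL string literals.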
@gjask Is the problem relevant if a paramstyle was selected?
I think that generally the solution is to always set paramstyle. My understanding of the DB API wiki is that the default paramstyle is driver dependent, and some drivers may allow setting it to another value. The Impyla behavior of doing all possible substitutions by default seems like an Impala specific convenience feature that can lead to weird bugs.
My preferred long term solution would be to throw an error if no paramstyle is set. While this could break some existing programs, fixing it is very easy and I don't think that mixed substitution is something that anyone should rely on.
@csringhofer No, the problem occurs only when it is not selected.
I am not sure if it is possible to require the user to always pass a paramstyle. I guess that would violate the DB API spec. I think the most correct solution is to set a default paramstyle and then allow the user to specify a different one if needed.
I am not sure if you noticed, but I already sent PR https://github.com/cloudera/impyla/pull/508 implementing single-pass param substitution. It aims to provide the same functionality as before, just without the bugs caused by multiple passes during substitution. That shouldn't break any existing code (unless it deliberately depends on the broken behaviour). I thought you would want to go with the least amount of breaking changes. In the end I even somewhat improved the matching of the number of markers against the number of parameters. It should be good enough now.
"I am not sure if it is possible to require user to always pass a paramstyle. I guess that would violate DB API spec." You are right, the way Impyla expects paramstyle would violate the DB API 2.0 spec. In DB API 2.0 the assumption is that a driver supports one paramstyle, set in the global variable paramstyle. I saw some drivers where this can be overridden when establishing the connection.
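As an illustration of the module-level global, here is how it looks with Python's built-in sqlite3 driver (used only because it ships with the standard library; this is not Impyla):

```python
import sqlite3

# DB API 2.0 drivers declare their marker style in a module-level global.
# For sqlite3 this is "qmark".
print(sqlite3.paramstyle)

conn = sqlite3.connect(":memory:")
# The driver binds the values itself, so marker-like text inside a value
# ("a:1") is passed through untouched.
row = conn.execute("SELECT ?, ?", ("a:1", "b")).fetchone()
print(row)
conn.close()
```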
I found a wiki about a proposal for DB API 3.0 where paramstyle could be modified after connection/cursor construction: https://wiki.python.org/moin/DbApi3#Preliminary_Consensus
I think that the solution above would be the ideal way (the user can use any paramstyle, but has to set it explicitly if it is not the default), but in DB API 2.0 Impyla should stick to the current way.
Thanks for the fix https://github.com/cloudera/impyla/pull/508 !
Thank you for merging my PR. How long does it usually take to get that released? I would like to remove the workaround on our side.
Is it enough if it is released as an alpha release, e.g. 0.19a1? That would mean that you can get it with "pip install impyla==0.19a1" but "pip install impyla" will still install 0.18.0
Ok, that probably would be good enough.
When using a list for passing parameters and one of the values contains a :\d+ the value itself gets substituted. The bug applies to all paramstyles. See ipython logs below. I fixed it in: #348. This is the same as #317.
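For illustration, a minimal sketch of how substitution in several passes can produce this. It is not Impyla's actual code: `quote` and `multi_pass_substitute` are hypothetical helpers with deliberately simplified escaping, and only the qmark and numeric styles are modelled.

```python
import re

def quote(value):
    # Simplified, hypothetical escaping: quote strings, str() everything else.
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)

def multi_pass_substitute(sql, params):
    # Pass 1: replace qmark markers with the params in order.
    positional = iter(params)
    sql = re.sub(r"\?", lambda m: quote(next(positional)), sql)
    # Pass 2: replace numeric markers. This pass runs over the OUTPUT of
    # pass 1, so a ":1" inside an already inserted value is matched too.
    sql = re.sub(r":(\d+)", lambda m: quote(params[int(m.group(1)) - 1]), sql)
    return sql

result = multi_pass_substitute("SELECT ?, ?", ["a:1", "b"])
print(result)  # SELECT 'a'a:1'', 'b'  (expected: SELECT 'a:1', 'b')
```

The `:1` inside the first value is treated as a numeric marker in the second pass and replaced by the first parameter again, which corrupts the query.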