Segfault-Inc / Multicorn

Data Access Library
https://multicorn.org/
PostgreSQL License
700 stars 145 forks source link

Possible/worth restructuring quals? #103

Closed oscardssmith closed 3 years ago

oscardssmith commented 9 years ago

I am currently considering rewriting the way quals are passed down in multicorn so it is a tree instead of a list. If this approach worked, would it allow or statements to be processed instead of the current behavior which causes no quals to be passed. Would this be a good idea?

rdunklau commented 9 years ago

This would be a good idea, which was already considered here https://github.com/Kozea/Multicorn/issues/76

As (briefly) mentioned in this issue, there are two problems with this approach, aside from the fact that it will probably need considerable work:

So, I am all for it, and will be very glad if someone helps with the design / implementation, but this is something rather complex that I deliberately put on the backburner...

oscardssmith commented 9 years ago

Since it doesn't seem worth breaking backward compatibility for, I think the best approach would probably to create a type called fullquals which will either be just the text sql passes down for parsing by the fdw or (probably better but more work) a tree that represents a quall by a tree operation arg1, arg2... where the first arg is a field (if the op has a field) and each arg can be another qual. so " Filter: (((pcc + 4) > 10) OR (first ~~ 'robert'::text))" would become ["OR" , [">" , ["+", "pcc", 4], 10], ["~~", "first", "robert"]]

rdunklau commented 9 years ago

-1 for passing the raw SQL String: it will be inexploitable.

I agree it should be a tree, perhaps even based on SQLAlchemy constructs: since they already have python classes for representing anything in a SQL statement, it might be a good start. The downside being that it will bring a huge dependency to Multicorn.

rdunklau commented 9 years ago

Another solution that I (briefly) considered, was to automatically wraps every C-Struct as a python object. I don't really know how that can be done, but if it could save us the overhead of building a full python object from every node in the planner that would be great. Plus, we could probably save a lot of code overhead for converting from / to both worlds.

Do you have any experience wrapping C-structs as CPython objects ?

oscardssmith commented 9 years ago

no, but it seems like this should do it http://grokbase.com/t/python/python-list/034aebzfcp/pyrex-wrapping-c-struct-as-python-class

oscardssmith commented 9 years ago

rough pseudo-code (with results in between) "(((partner_patients.pcc > 100) AND (partner_patients.pcc < 1000)) OR ((partner_patients.acc + 200) > 3000))" remove from (+1 to . remove first, last char "((pcc > 100) AND (pcc < 1000)) OR ((acc + 200) > 3000)" if there are chars outside parens

rdunklau commented 9 years ago

I'm not sure I follow you. In Multicorn, we are working directly with the plan nodes, not the raw query.

Specifically, we are looking at the simple, implicitly ANDed list of predicate filtering the relation.

It would be nice to also allow the python foreign data wrapper to add or remove predicates that will be evaluated on the foreign side, so as not to evaluate them again locally by Postgres.

massimiliano-della-rovere commented 9 years ago

I'd like to help with the python part, buy I have no knowledge of how the WHERE expressions are transformed in a C tree of struct and pointers. Can anybody tell me which source file of postgres implements this or provide any example?

Also I'd add that in the previous discussions, unary operators (NOT, EXISTS...) are not considered.

massimiliano-della-rovere commented 9 years ago

With "C tree of struct and pointers" I mean plan nodes.

rdunklau commented 9 years ago

@massimiliano-della-rovere : that would be great !

In multicorn, the transformation of predicates (named "quals" in the PostgreSQL source code), is done at planning time. An intermediate representation is stored in the MulticornPlanState structure. This is done by the extractRestrictions function in src/query.c

Then, at execution time, this intermediate representation is converted to a python object: see qualdefToPython in python.c.

As for the nodes representing a query, most of them are defined in PostgreSQL in src/include/nodes/primnodes.h. Plan and Executor nodes are defined in plannodes.h and execnodes.h respectively

If you need more info about that, feel free to ask, either here or through whatever medium you prefer :)

massimiliano-della-rovere commented 9 years ago

I'm still figuring out the various parts and searching for the points where a liimitation in added:

So basically we would need a specific extractClause... for each one of the following list, am I right? / * TAGS FOR PLAN NODES (plannodes.h) / T_Plan = 100, T_Result, T_ModifyTable, T_Append, T_MergeAppend, T_RecursiveUnion, T_BitmapAnd, T_BitmapOr, T_Scan, T_SeqScan, T_IndexScan, T_IndexOnlyScan, T_BitmapIndexScan, T_BitmapHeapScan, T_TidScan, T_SubqueryScan, T_FunctionScan, T_ValuesScan, T_CteScan, T_WorkTableScan, T_ForeignScan, T_Join, T_NestLoop, T_MergeJoin, T_HashJoin, T_Material, T_Sort, T_Group, T_Agg, T_WindowAgg, T_Unique, T_Hash, T_SetOp, T_LockRows, T_Limit,

rdunklau commented 9 years ago

The plan nodes themselves won't be pushable with the current PostgreSQL API: there is no way to delegate an aggregate (count(*) .. GROUP BY) to the remote side.

The nodes you are interested in are primarily in primnodes.h. This is primarily all nodes in the 300 range, in the NodeTag enum you mentioned earlier (OpExpr, BoolExpr etc...).

What we need is the most straightforward way to expose a whole Expr tree to Python, ideally without too much hassle to maintain it between PostgreSQL releases.

massimiliano-della-rovere commented 9 years ago

Sorry for the long delay, but free time is a chimera :)

So the primnodes are:

        /*
         * TAGS FOR PRIMITIVE NODES (primnodes.h)
         */
        T_Alias = 300,
        T_RangeVar,
        T_Expr,
        T_Var,
        T_Const,
        T_Param,
        T_Aggref,
        T_WindowFunc,
        T_ArrayRef,
        T_FuncExpr,
        T_NamedArgExpr,
        T_OpExpr,     <----- Implemented
        T_DistinctExpr,
        T_NullIfExpr,
        T_ScalarArrayOpExpr,     <----- Implemented
        T_BoolExpr,
        T_SubLink,
        T_SubPlan,
        T_AlternativeSubPlan,
        T_FieldSelect,
        T_FieldStore,
        T_RelabelType,
        T_CoerceViaIO,
        T_ArrayCoerceExpr,
        T_ConvertRowtypeExpr,
        T_CollateExpr,
        T_CaseExpr,
        T_CaseWhen,
        T_CaseTestExpr,
        T_ArrayExpr,
        T_RowExpr,
        T_RowCompareExpr,
        T_CoalesceExpr,
        T_MinMaxExpr,
        T_XmlExpr,
        T_NullTest,     <----- Implemented
        T_BooleanTest,
        T_CoerceToDomain,
        T_CoerceToDomainValue,
        T_SetToDefault,
        T_CurrentOfExpr,

Each primnode has a very different structure, so for each (XYZ in the list below) of them we would need in query.c:

If the intention is not writing those function, an attempt could be made of automatically generate them. A parser could read the C code of both nodes.h and primnodes.h and then generate the C code for the canonical* and extractClauseFrom* functions.

Is this path fitting your plan for extending Multicorn?

rdunklau commented 9 years ago

Sorry for the long delay too :(

Yes, that would be a good idea. Someone suggested to me that we could use cffi for that, instead of writing a parser ourselves, although I don't know anything about it.

jmealo commented 8 years ago

@rdunklau The cffi sounds like a solid way to go. Any updates on this?

rdunklau commented 8 years ago

Hello.

Actually, yes ! I've started working on this but using SWIG instead to autogenerate the bindings. It will take quite a long time, but will allow to call any postgres C function from Multicorn. I'll push something on a new branch as soon as it seems usable.

jmealo commented 8 years ago

@rdunklau: Thank you for the quick reply. That's awesome and I look forward to it. Feel free to ping me when that's done if you think of it :-)

oscardssmith commented 4 years ago

has there been any status on this? I've long since moved on from the project, but am still mildly interested.