Open izveigor opened 1 year ago
> I agree that the signature system needs some work and I guess that you came up w/ a quite solid approach. However I am having some trouble following it :sweat_smile: . Some questions / comments:

Hello, @crepererum! I'll try to explain as best I can.
There is a finite set of data types (which we denote by `DataType`).
Any of the possible types (options) of `Signature` can be described using 4 categories, which form two pairs:

* definite (`def`) and undefined (`undef`);

  Definition: a definite data type means that only a specific data type from the `DataType` set is suitable.
  (Example: `Exact(DataType::Int8)` accepts `(DataType::Int8)`, but not `(DataType::UInt8)`.)

  Definition: an undefined data type means that any data type from the `DataType` set is suitable.
  (Example: `Any()` accepts `(DataType::Int8)`, and also `(DataType::UInt8)`.)

* equal (`eq`) and unequal (`uneq`);

  Definition of equality: the category of equality means that all elements are equal to each other.
  (Example: `VariadicEqual()` can accept `(DataType::Int8, DataType::Int8)` or `(DataType::UInt8, DataType::UInt8, DataType::UInt8)`, but not `(DataType::Int8, DataType::UInt8)`.)

  Definition of inequality: the category of inequality means that the elements can be arbitrary.
  (Example: `VariadicAny()` can accept `(DataType::Int8, DataType::Int8)`, and also `(DataType::Int8, DataType::UInt8)`.)
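To make the four categories concrete, here is a minimal sketch in Rust. It is not DataFusion's actual `TypeSignature`: the `DataType` enum is a two-variant stand-in, `Sig` and `accepts` are hypothetical names, and the variadic kinds follow the Kleene-star reading (so an empty argument list is accepted), which is an assumption of this sketch.

```rust
/// Toy stand-in for Arrow's `DataType`; only two variants are needed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Int8,
    UInt8,
}

/// Hypothetical, simplified signature kinds illustrating the categories.
#[derive(Debug, Clone)]
pub enum Sig {
    /// definite: the arguments must match this exact list of types.
    Exact(Vec<DataType>),
    /// undefined: exactly `n` arguments, each of any type.
    Any(usize),
    /// equal + undefined: any number of arguments, all of the same type.
    VariadicEqual,
    /// unequal + undefined: any number of arguments of arbitrary types.
    VariadicAny,
}

/// Returns true if `args` is accepted by `sig`.
pub fn accepts(sig: &Sig, args: &[DataType]) -> bool {
    match sig {
        Sig::Exact(expected) => expected.as_slice() == args,
        Sig::Any(n) => args.len() == *n,
        // Every adjacent pair must be equal, so all elements are equal.
        Sig::VariadicEqual => args.windows(2).all(|w| w[0] == w[1]),
        Sig::VariadicAny => true,
    }
}
```

With these definitions, `accepts(&Sig::VariadicEqual, &[DataType::Int8, DataType::UInt8])` is rejected, while the same arguments pass `Sig::VariadicAny`, mirroring the examples above.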
Now, combining the categories:

I chose Kleene algebra (which is used for regular expressions). A `Signature` is essentially a boolean function that either accepts an input data set or rejects it.
For our case it is sufficient to use only regular languages (the languages that a deterministic finite automaton (DFA) can accept).
So each type of signature represents a separate DFA.

There are situations in which a signature cannot be checked by one DFA and needs several. For these we build a meta language that is regular in the same way.
For example, if we use two DFAs and at least one of them returns a positive answer, then the input data set suits us (the `OneOf` case).
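The `OneOf` idea can be sketched as a recursive check: the meta level simply asks each alternative in turn. The types below (`DataType`, `Sig`, `accepts`) are hypothetical and reduced to the minimum needed for the illustration; they are not the DataFusion API.

```rust
/// Toy stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Int8,
    UInt8,
}

/// Hypothetical signature kinds: two plain ones plus the meta-level `OneOf`.
pub enum Sig {
    /// The arguments must match this exact list of types.
    Exact(Vec<DataType>),
    /// Any number of arguments, all of the same type.
    VariadicEqual,
    /// Meta level (`expr1 + expr2 + ...`): accept if any alternative accepts.
    OneOf(Vec<Sig>),
}

/// Returns true if `args` is accepted by `sig`.
pub fn accepts(sig: &Sig, args: &[DataType]) -> bool {
    match sig {
        Sig::Exact(expected) => expected.as_slice() == args,
        Sig::VariadicEqual => args.windows(2).all(|w| w[0] == w[1]),
        // One positive answer from any alternative is enough.
        Sig::OneOf(alternatives) => alternatives.iter().any(|s| accepts(s, args)),
    }
}
```

Because `OneOf` holds a list of `Sig` values, alternatives can themselves be `OneOf`, so the meta language nests without extra machinery.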
For full compatibility, it is worth adding two new signature types (`Equal` and `Concat`).
Separately, it is worth mentioning the function `Concat`. `Concat` can accept only signature types without a Kleene star (that is, only `Equal`, `Any`, `Uniform` and `Exact`).
`Concat` takes the input data set, splits it according to the size of each signature type, and applies the corresponding DFA to each part.
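A minimal sketch of how `Concat` could split its input, assuming every part has a fixed size. The names `Fixed`, `arity` and `concat_accepts` are hypothetical, and `Uniform` is read here as "each argument independently comes from the allowed set", following the `(arg1 + arg2 + ...)^n` expression; the real semantics may differ.

```rust
/// Toy stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Int8,
    UInt8,
    Utf8,
}

/// Fixed-size signature kinds (no Kleene star), the only ones `Concat` may hold.
pub enum Fixed {
    Exact(Vec<DataType>),
    Any(usize),
    Equal(usize),
    Uniform(usize, Vec<DataType>),
}

impl Fixed {
    /// Number of arguments this part consumes.
    pub fn arity(&self) -> usize {
        match self {
            Fixed::Exact(types) => types.len(),
            Fixed::Any(n) | Fixed::Equal(n) | Fixed::Uniform(n, _) => *n,
        }
    }

    /// Checks one already-split chunk of arguments.
    pub fn accepts(&self, args: &[DataType]) -> bool {
        if args.len() != self.arity() {
            return false;
        }
        match self {
            Fixed::Exact(types) => types.as_slice() == args,
            Fixed::Any(_) => true,
            Fixed::Equal(_) => args.windows(2).all(|w| w[0] == w[1]),
            Fixed::Uniform(_, valid) => args.iter().all(|t| valid.contains(t)),
        }
    }
}

/// Splits `args` by the arity of each part and checks every chunk in order.
pub fn concat_accepts(parts: &[Fixed], args: &[DataType]) -> bool {
    let total: usize = parts.iter().map(Fixed::arity).sum();
    if total != args.len() {
        return false;
    }
    let mut rest = args;
    for part in parts {
        // Safe: the total-length check guarantees enough arguments remain.
        let (head, tail) = rest.split_at(part.arity());
        if !part.accepts(head) {
            return false;
        }
        rest = tail;
    }
    true
}
```

For example, `[Fixed::Exact(vec![DataType::Utf8]), Fixed::Equal(2)]` describes one string argument followed by two arguments of the same type. The fixed-size restriction is what makes the split unambiguous: with a starred part in the middle there would be several possible split points.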
English is not the author’s native language, so there may be some difficulties in understanding. I hope you understood me and my idea seemed reasonable to you :)
Is your feature request related to a problem or challenge?
Follow-up to https://github.com/apache/arrow-datafusion/issues/6559.
Argument quality:
* definite (`def`) and undefined (`undef`);
* equal (`eq`) and unequal (`uneq`);

Combine qualities:
* `eq-def` (equal definite)
* `eq-undef` (equal undefined)
* `uneq-def` (unequal definite)
* `uneq-undef` (unequal undefined)

Algebra
Kleene algebra {+, ·, *}
* (`uneq-def1` + `uneq-def2` + ...)*
* (`eq-undef`)*
* (`uneq-undef`)*
* (`uneq-def1` + `uneq-def2` + ...)^n
* (`uneq-def1` · `uneq-def2` · ...)
* (`uneq-undef`)^n

* `(+)`: present in the current version;
* `(-)`: not present in the current version;

| Undefined | `eq-undef` | `uneq-undef` |
|---|---|---|
| (`arg`)* | `VariadicEqual` (+) | `VariadicAny` (+) |
| (`arg`)^n | `Equal` (-) | `Any` (+) |

| Definite | `eq-def` | `uneq-def` |
|---|---|---|
| (`arg1` + `arg2` + ...)* | `Variadic` with single argument (+) | `Variadic` with multiple arguments (+) |
| (`arg1` + `arg2` + ...)^n | `Uniform` with single argument (+) | `Uniform` with multiple arguments (+) |
| (`arg1` · `arg2` · ...) | `Exact` with the same data type (+) | `Exact` with different data types (+) |

Meta Algebra
Kleene algebra {+, ·, *}
* `expr1` + `expr2` + ...: `TypeSignature` if it makes sense
* `expr1` · `expr2` · ...: `Exact`, `Uniform`, `Equal`, `Any` (without Kleene closure)

Argument expansion
Variadic
Input:
Output:
Uniform
Input:
Output:
Exact
Input:
Output:
Proposed code for future features:
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response