datastax / pulsar-transformations

Apache License 2.0

Compute Step: add toInt, toDouble and filter(list, expression) functions #113

Closed eolivelli closed 1 year ago

eolivelli commented 1 year ago

Summary:

The main usage is:

"fields":[
     {
       "name": "value.filteredResults", 
       "expression" : "fn:filter(value.results,'fn:toDouble(record.similarity) > 0.7')"
     }
]

Please note that it is important to use "fn:toDouble" because the "query" step always returns string values, so you have to convert the value to "double" for the comparison to be evaluated correctly.
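To make the intent of the expression concrete, here is a minimal Python sketch of the same idea: filter a list of records whose values are all strings, converting the field to a number before comparing. The helper names `to_double` and `filter_records` and the sample data are hypothetical stand-ins for illustration. This is not the implementation in this PR.

```python
def to_double(value):
    """Convert a string (or number) to a float, playing the role of fn:toDouble."""
    return float(value)


def filter_records(records, predicate):
    """Keep the records for which predicate(record) is true, playing the role of fn:filter."""
    return [record for record in records if predicate(record)]


# Hypothetical output of a "query" step: every value is a string.
results = [
    {"id": "a", "similarity": "0.92"},
    {"id": "b", "similarity": "0.35"},
    {"id": "c", "similarity": "0.71"},
]

# Equivalent in spirit to:
#   fn:filter(value.results, 'fn:toDouble(record.similarity) > 0.7')
filtered = filter_records(results, lambda record: to_double(record["similarity"]) > 0.7)

print([record["id"] for record in filtered])  # prints ['a', 'c']
```

Without the conversion, the comparison would operate on strings rather than numbers. In Python, for example, `"10.0" > "9.5"` is `False` because strings compare lexicographically.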

eolivelli commented 1 year ago

@cbornet @aymkhalil @nicoloboschi the solution here is a bit hacky because returning an ARRAY type wasn't supported.

It works well for the schemaless JSON case (I added unit tests), and it also works for AVRO when the list is a list<map<string,string>> (the results of the query step). But it won't work with ARRAYs of other types.

cbornet commented 1 year ago

Do we really need the toInt and toDouble functions? For the other operations we rely on type coercion. Maybe that would also work for fn:filter and make it more user-friendly?

@eolivelli @aymkhalil

aymkhalil commented 1 year ago

Yes. Coercion should do. Could be as generic as coerceToNumber.
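As a rough illustration of what a generic coercion could look like, here is a Python sketch. Only the name `coerceToNumber` comes from the comment above. The behavior shown (numbers pass through, strings are parsed as an integer first and then as a float) is an assumption, not something specified in this thread.

```python
def coerce_to_number(value):
    """Hypothetical coercion: return value as an int or float, parsing strings if needed."""
    if isinstance(value, bool):
        # Assumption: booleans are not treated as numbers here.
        raise TypeError("cannot coerce bool to a number")
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, str):
        text = value.strip()
        try:
            return int(text)
        except ValueError:
            # Not an integer literal; float() raises ValueError for non-numeric text.
            return float(text)
    raise TypeError(f"cannot coerce {type(value).__name__} to a number")


print(coerce_to_number("42"))   # prints 42
print(coerce_to_number("0.7"))  # prints 0.7
print(coerce_to_number(3))      # prints 3
```

With coercion like this applied inside the comparison, the filter expression would not need an explicit conversion call.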