apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Return data types along with column names in Pinot JSON Response #2302

Open kishoreg opened 6 years ago

kishoreg commented 6 years ago

Today we return the column names but not the data types. Pinot knows the data type of each column from either the segment metadata or the schema. Knowing the data types will make it easier to write connectors to Pinot.

{
  "selectionResults": {
    "columns": ["Cancelled", "Carrier", "DaysSinceEpoch", "Delayed", "Dest", "DivAirports", "Diverted", "Month", "Origin", "Year"],
    "results": [
      ["0", "AA", "16130", "0", "SFO", [], "0", "3", "LAX", "2014"],
      ["0", "AA", "16130", "0", "LAX", [], "0", "3", "SFO", "2014"],
      ["0", "AA", "16130", "0", "SFO", [], "0", "3", "LAX", "2014"]
    ]
  },
  "traceInfo": {},
  "numDocsScanned": 3,
  "aggregationResults": [],
  "timeUsedMs": 10,
  "segmentStatistics": [],
  "exceptions": [],
  "totalDocs": 102
}
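
To illustrate why the types would help a connector, here is a minimal Python sketch. It assumes a hypothetical `columnDataTypes` list sitting next to `columns`; that field is not part of the response above, and both its name and the type labels are made up for illustration. Given such a list, a client could convert the string values in `results` into native values without guessing.

```python
import json

# Hypothetical mapping from type labels to Python converters.
# The labels are assumptions, not a confirmed list of Pinot type names.
CONVERTERS = {
    "INT": int,
    "LONG": int,
    "FLOAT": float,
    "DOUBLE": float,
    "STRING": str,
}


def convert_value(value, type_label):
    """Convert one cell; multi-value columns arrive as lists."""
    convert = CONVERTERS[type_label]
    if isinstance(value, list):
        return [convert(v) for v in value]
    return convert(value)


def typed_rows(selection_results):
    """Return rows as dicts with values converted per column type."""
    columns = selection_results["columns"]
    # "columnDataTypes" is a hypothetical field, not in the current response.
    types = selection_results["columnDataTypes"]
    return [
        {name: convert_value(cell, t) for name, cell, t in zip(columns, row, types)}
        for row in selection_results["results"]
    ]


# A trimmed-down response with the hypothetical type list added.
response = json.loads("""
{"selectionResults": {
   "columns": ["Carrier", "DaysSinceEpoch", "DivAirports"],
   "columnDataTypes": ["STRING", "INT", "STRING"],
   "results": [["AA", "16130", []]]}}
""")
rows = typed_rows(response["selectionResults"])
print(rows)
```

Without the type list, the client has no reliable way to tell that `"16130"` is a number while `"AA"` is a string.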

agrawaldevesh commented 5 years ago

Is there any update on this? It seems easy enough to do by exposing the DataSchema (cachedDataScheme) from com.linkedin.pinot.core.query.reduce.BrokerReduceService#reduceOnDataTable in BrokerResponseNative. It looks like a straightforward, backward-compatible change, and I would be willing to work on it.

It would save me writing some janky client side hacks if this existed.
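
For context, the kind of client-side workaround meant here might look like the Python sketch below, which guesses each column's type from its string values. The function names and type labels are illustrative only. The guess is unreliable by design: a string column that happens to hold only digits would be misread as numeric, and an empty multi-value column gives no hint at all.

```python
def guess_type(values):
    """Guess a column type from its string values (a heuristic only)."""
    flat = []
    for v in values:
        # Multi-value columns arrive as lists; flatten them.
        flat.extend(v if isinstance(v, list) else [v])
    if not flat:
        return "UNKNOWN"  # nothing to inspect, e.g. only empty lists
    for label, parse in (("INT", int), ("DOUBLE", float)):
        try:
            for v in flat:
                parse(v)
            return label
        except ValueError:
            continue
    return "STRING"


def guess_column_types(columns, results):
    """Map each column name to a guessed type label."""
    return {
        name: guess_type([row[i] for row in results])
        for i, name in enumerate(columns)
    }


columns = ["Carrier", "DaysSinceEpoch", "DivAirports"]
results = [["AA", "16130", []], ["AA", "16130", []]]
guessed = guess_column_types(columns, results)
print(guessed)
```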

kishoreg commented 5 years ago

Yes. It’s definitely easy to add. You got the logic right. Feel free to submit a PR. It would be great to enable this via a config flag for now.

snleee commented 5 years ago

@agrawaldevesh We do have some support for preserving types. https://github.com/apache/incubator-pinot/pull/2830

If you add OPTION(preserveType='true'), Pinot will preserve the types in the result. One limitation is that this won't preserve types for grouping keys (keeping types for grouping keys would require considerable refactoring).

Can you play with the preserveType option to see if it makes your task easier? That said, it would be ideal if we could return the types along with the column names.
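
As a rough sketch of how a client might opt in, the Python helper below appends the option to a query string. The helper name, the table name `airlineStats`, and placing the option at the end of the query are assumptions for illustration; check the query syntax for your Pinot version.

```python
def with_preserve_type(query):
    """Append OPTION(preserveType='true') unless the query already sets it."""
    option = "OPTION(preserveType='true')"
    stripped = query.strip()
    if "preservetype" in stripped.lower():
        return stripped  # already present, leave the query unchanged
    return f"{stripped} {option}"


# "airlineStats" is a hypothetical table name.
query = with_preserve_type("SELECT Carrier, DaysSinceEpoch FROM airlineStats LIMIT 3")
print(query)
```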