apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Druid Lookups introspect keys and values endpoints do not return valid JSON #17361

Open teyeheimans opened 3 days ago

teyeheimans commented 3 days ago

Description

While analyzing the lookup features of Druid, I noticed that the `keys` and `values` introspection endpoints for lookups do not return valid JSON.

https://druid.apache.org/docs/latest/querying/lookups#introspect-a-lookup

Example response:

"[20416, 20404, 20415, 02F440, 02F461, 20420, 02F402, 02F480, 20408, 20409, 20410, 20412, 20402, 02F421, 02F420, 20601, 02F601, 02F620, VODAFONE, CLARO]

It seems that all keys or values are simply joined with `, ` and wrapped in square brackets, without any quoting.
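For illustration, the reported format matches what Java's `Collection.toString()` produces. The sketch below assumes that the endpoint returns a collection's string form, which is not yet established at this point in the report. `ToStringFormatDemo` is a hypothetical class, and the sample values are taken from the response above.

```java
import java.util.List;

class ToStringFormatDemo {
    // Renders a list the way Collection.toString() does:
    // elements joined with ", " inside square brackets, with no quoting or escaping.
    static String render(List<String> values) {
        return values.toString();
    }

    public static void main(String[] args) {
        // Sample values copied from the response quoted in this report.
        List<String> values = List.of("02F440", "VODAFONE", "CLARO");
        // Prints [02F440, VODAFONE, CLARO]: bare, unquoted tokens, which is not valid JSON.
        System.out.println(render(values));
    }
}
```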

Finally, the documentation on this page seems to be incorrect: https://druid.apache.org/docs/latest/querying/lookups-cached-global/#introspection

It states:

Introspection to / returns the entire map. Introspection to /version returns the version indicator for the lookup.

However, `/version` does not seem to work and returns a 404.

Motivation

As far as I know, all API endpoints return valid JSON. The introspection `keys` and `values` endpoints do not, which I believe is incorrect.

ashwintumma23 commented 1 day ago

Hi @teyeheimans, what type of lookup are you creating?

### Map Lookup

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/keys
[1, 2, 3]

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/values
[One, Two, Three]

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/version
(returns nothing)

  - The `/version` endpoint is not implemented in `MapLookupIntrospectionHandler`, so no response is returned.

### cachedNamespace Lookup
* With the following configuration:

{ "type": "cachedNamespace", "extractionNamespace": { "type": "uri", "uri": "file:/tmp/sampleCSV.csv", "namespaceParseSpec": { "format": "csv", "columns": [ "key", "value" ], "skipHeaderRows": 1 }, "pollPeriod": "PT30S" }, "firstCacheTimeout": 0 }

I see all the endpoints returning responses:

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/
{"20":"Twenty","10":"Ten","30":"Thirty"}

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/keys
["20","10","30"]

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/values
["Twenty","Ten","Thirty"]

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/csvLookup/version
{"version":"1729184323236"}


* One caveat: the `/version` endpoint does not return the version that was set manually when the lookup was created, but an epoch timestamp. The Console shows the version as `v1`, whereas the introspection API returns `1729184323236`.
![image](https://github.com/user-attachments/assets/ade98762-f5ae-4284-9874-43438f113f4d)
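Assuming the `/version` value is epoch milliseconds, as the comment above suggests, it can be decoded with `java.time` to check that it looks like a load or creation timestamp rather than the user-supplied `v1`. `VersionTimestampDemo` is a hypothetical helper for illustration only.

```java
import java.time.Instant;

class VersionTimestampDemo {
    // Interprets a lookup version string as epoch milliseconds.
    // Assumption: the value returned by /version is a millisecond timestamp.
    static Instant decode(String version) {
        return Instant.ofEpochMilli(Long.parseLong(version));
    }

    public static void main(String[] args) {
        // Value copied from the /version response above.
        System.out.println(decode("1729184323236")); // 2024-10-17T16:58:43.236Z
    }
}
```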

Thanks! 
teyeheimans commented 1 day ago

I am using a map lookup, just like you. Your example shows the problem already:

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/
{"1":"One","2":"Two","3":"Three"}

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/keys
[1, 2, 3]

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/values
[One, Two, Three]

The values returned in your example are NOT valid JSON because they are not quoted. The correct response would be:

["One", "Two", "Three"]

To check whether a response is valid JSON, you can use jq:

$ curl -X GET http://localhost:8888/druid/v1/lookups/introspect/mapLookup/values | jq '.'

This also happens when the keys are strings, so the `keys` and `values` endpoints of the introspection API do NOT return valid JSON.

Finally, the `/version` endpoint indeed does not seem to work. However, the documentation says it should exist, so the documentation appears to be incorrect. See the bottom of this page: https://druid.apache.org/docs/latest/querying/lookups-cached-global/#introspection

abhishekrb19 commented 1 day ago

@teyeheimans, that does look like a bug. This is the relevant introspection code for map lookups: https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/query/lookup/MapLookupExtractorFactory.java#L156.

I think the getValues() response should just be map.values() instead of map.values().toString(); the latter produces the Java String representation of the underlying collection. The same applies to getKeys(). If that sounds right, please feel free to raise a PR.
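A minimal sketch of the difference, assuming the lookup is backed by a `Map<String, String>`. `HypotheticalMapIntrospection` is an illustrative stand-in, not Druid's actual class, and the small `toJsonArray` helper only stands in for whatever JSON serializer (for example Jackson) would encode a returned collection, so the example runs without dependencies.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for a map lookup's introspection handler; not Druid's code.
class HypotheticalMapIntrospection {
    private final Map<String, String> map;

    HypotheticalMapIntrospection(Map<String, String> map) {
        this.map = map;
    }

    // Current behavior as described above: the Java string form of the collection.
    Object getValuesAsString() {
        return map.values().toString();
    }

    // Suggested behavior: return the collection itself and let the JSON layer encode it.
    Collection<String> getValues() {
        return map.values();
    }

    // Minimal JSON string-array encoder standing in for a real serializer.
    // Escapes only backslash and double quote; a real serializer also escapes control characters.
    static String toJsonArray(Collection<String> values) {
        return values.stream()
                .map(v -> "\"" + v.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>(); // keeps insertion order for stable output
        map.put("1", "One");
        map.put("2", "Two");
        map.put("3", "Three");
        HypotheticalMapIntrospection h = new HypotheticalMapIntrospection(map);
        System.out.println(h.getValuesAsString());      // [One, Two, Three]
        System.out.println(toJsonArray(h.getValues())); // ["One","Two","Three"]
    }
}
```

Returning the collection rather than its `toString()` leaves quoting and escaping to the serializer, which is what makes each element a proper JSON string.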

abhishekrb19 commented 1 day ago

Btw, you can query a map lookup directly in SQL: SELECT "k", "v" FROM "lookup"."mapLookup". This should return the keys and values in the correct string form. The Druid web console uses SQL rather than the API to introspect values when you open the lookup modal.
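The same SQL query can also be sent over HTTP. A minimal sketch, assuming Druid's SQL API at `/druid/v2/sql` on the router address used elsewhere in this thread (`localhost:8888`). `LookupSqlRequestSketch` is a hypothetical helper that only builds the request with `java.net.http` and does not send it; its hand-rolled escaping covers backslashes and double quotes only.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class LookupSqlRequestSketch {
    // Escapes backslashes and double quotes so the SQL text can sit inside a JSON string.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the JSON body for the SQL API: {"query":"..."}.
    static String body(String sql) {
        return "{\"query\":\"" + jsonEscape(sql) + "\"}";
    }

    // Builds (but does not send) a POST to the SQL endpoint on the assumed router address.
    static HttpRequest request(String sql) {
        return HttpRequest.newBuilder(URI.create("http://localhost:8888/druid/v2/sql"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(sql)))
                .build();
    }

    public static void main(String[] args) {
        String sql = "SELECT \"k\", \"v\" FROM \"lookup\".\"mapLookup\"";
        System.out.println(body(sql));
        // To actually send it against a running cluster:
        // HttpClient.newHttpClient().send(request(sql), HttpResponse.BodyHandlers.ofString())
    }
}
```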

ashwintumma23 commented 12 hours ago

Hi @abhishekrb19,

For the /version endpoint:

Documentation-wise

Functionality-wise

teyeheimans commented 1 hour ago

I agree with what you describe. However, I am not familiar with the Java side of Druid. We have created a PHP client for Druid, see https://github.com/level23/druid-client.

I recently added support for lookup management. While doing so, I found that the `keys` and `values` endpoints do not return valid JSON (at least for the map lookup), whereas the base introspect endpoint does. This inconsistency is why I opened this issue.

I also find it strange that not every lookup type lets you specify whether the data is injective, and that the same injective setting is called `oneToOne` in the Kafka lookup.