apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Make the RabbitMQ extension play nice with the sampler (allowing it to be part of the data loader flow) #16497

Status: Open. vogievetsky opened this issue 5 months ago.

vogievetsky commented 5 months ago

Following the work in https://github.com/apache/druid/pull/14137, it would be great if the RabbitMQ extension worked with the sampler. This would allow it to be used in the data loader flow.

Right now, doing a POST to

/druid/indexer/v1/sampler

with the payload

{
   "type":"rabbit",
   "spec":{
      "type":"rabbit",
      "ioConfig":{
         "type":"rabbit",
         "uri":"localhost:5672",
         "stream":"my_queue",
         "inputFormat":{
            "type":"regex",
            "pattern":"([\\s\\S]*)",
            "listDelimiter":"56616469-6de2-9da4-efb8-8f416e6e6965",
            "columns":[
               "raw"
            ]
         }
      },
      "dataSchema":{
         "dataSource":"sample",
         "timestampSpec":{
            "column":"!!!_no_such_column_!!!",
            "missingValue":"1970-01-01T00:00:00Z"
         },
         "dimensionsSpec":{

         },
         "granularitySpec":{
            "rollup":false
         }
      },
      "tuningConfig":{
         "type":"rabbit"
      }
   },
   "samplerConfig":{
      "numRows":500,
      "timeoutMs":15000
   }
}

results in

{
    "error": "Failed on call to `getDeclaredMethods()` on class `org.apache.druid.indexing.rabbitstream.RabbitStreamSamplerSpec`, problem: (java.lang.NoClassDefFoundError) com/rabbitmq/stream/MessageHandler\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 9]"
}

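probably because RabbitStreamSamplerSpec is missing or cannot load its dependencies (the error reports com/rabbitmq/stream/MessageHandler as not found).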
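A `NoClassDefFoundError` raised from `getDeclaredMethods()` usually means that the class being inspected was found, but a class referenced in its method signatures (here `com.rabbitmq.stream.MessageHandler`) could not be loaded. The Java sketch below does two things. First, it checks whether a given class can be loaded. Second, it rebuilds the same sampler POST so the request can be replayed outside the console. The class name `SamplerCheck`, the helper methods, and the base URL `http://localhost:8888` (assumed to be a local Router that proxies to the Overlord) are all hypothetical. The class-loading check only reflects the JVM it runs in. Druid loads extensions through their own class loaders, so a real diagnosis would need to run with the same jars the Overlord sees for the rabbit-stream extension.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class SamplerCheck {

    // Returns true if the named class can be located and initialized by this JVM's class loader.
    static boolean isClassLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class itself is absent from the classpath.
            // LinkageError (e.g. NoClassDefFoundError): the class exists but a dependency is missing.
            return false;
        }
    }

    // Builds the same POST described in the issue. baseUrl is an assumption, e.g. a local Router.
    static HttpRequest buildSamplerRequest(String baseUrl, String payloadJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/druid/indexer/v1/sampler"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payloadJson))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // The class named in the NoClassDefFoundError from the sampler response.
        String clientClass = "com.rabbitmq.stream.MessageHandler";
        System.out.println(clientClass + " loadable: " + isClassLoadable(clientClass));

        // Optional replay: args[0] = base URL, args[1] = path to a file holding the sampler payload JSON.
        if (args.length == 2) {
            String payload = Files.readString(Path.of(args[1]));
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(buildSamplerRequest(args[0], payload), HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```

If the payload above is saved to a file, running `java SamplerCheck http://localhost:8888 payload.json` should print the class-loading result and then the server's response. When the extension is broken in the way described, that response would be expected to contain the same `NoClassDefFoundError`.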

I started a bare-bones implementation in the console here: https://github.com/vogievetsky/druid/tree/console_rabbitmq, but I need the sampler to work before I can move forward.

@jamiechapmanbrn would you be interested in filling this gap?

jamiechapmanbrn commented 5 months ago

I'm interested, but I can't promise any kind of timeline at the moment. I can definitely help review a PR in this area, though.