MeltwaterArchive / dropwizard-extra

A set of miscellaneous and common Dropwizard utilities
109 stars 45 forks source link

KafkaProducerFactory breaks due to InetSocketAddress deserialization bug #27

Closed mjwillson closed 10 years ago

mjwillson commented 10 years ago

So this is due to an off-by-one error in the relevant deserializer code in jackson-databind, which I commented on here:

https://github.com/FasterXML/jackson-databind/commit/a3faa88c1008a76a4045577116d2c14a758ab078#diff-ddc036c16ed606b8daccec61105a7e23R316

It means that it can't parse an IPv4 address or hostname with a port number. This case isn't tested in jackson-databind. Reporting this to the jackson project too, but it happens to be your pull request there as well @nicktelford :)

What this means for KafkaProducerFactory is that, if you don't specify a port number on the broker it configures the thing with a port number of 0, and then fails to connect on port 0. If you do specify the port number then you get a rather opaque failure with an error like so:

* Incorrect type of value at: brokers; is of type: String, expected: InetSocketAddress

Presumably because Jackson is catching the NumberFormatException thrown inside the deserialization code for InetSocketAddress.

mjwillson commented 10 years ago

Reported upstream here https://github.com/FasterXML/jackson-databind/issues/531

mjwillson commented 10 years ago

I tried to work around this using the IPv6 syntax, but this doesn't work either, since the factory isn't writing out IPv6 addresses into a format which kafka can read from its config property:

Property metadata.broker.list is overridden to /0:0:0:0:0:0:0:1:9092

which Kafka parses as a hostname of "/0" and a port of 0.

mjwillson commented 10 years ago

Perhaps unless a fix for this in jackson is imminent, we could just use Strings for these brokers, like the ZookeeperFactory does (protected String[] hosts = new String[]{ "localhost" };) ?

Especially given it's just going to get written out again as a string afterwards for Kafka...

mjwillson commented 10 years ago

It turns out this is fixed in Jackson 2.3.4 or on their 2.4 branch. But dropwizard-jackson depends on 2.3.3.

mjwillson commented 10 years ago

And to add another episode to this sad tale:

Even after working around this by forcing a patch-release bump of the transitive jackson-databind dependency...

It still doesn't work, because InetSocketAddress#toString just doesn't do the right thing here when you write out the addresses into kafka's config property.

user=> (.toString (java.net.InetSocketAddress. "localhost" 1234))
"localhost/127.0.0.1:1234"
user=> (.toString (java.net.InetSocketAddress. "127.0.0.1" 1234))
"/127.0.0.1:1234"

Strings seem the path of least resistance here!

nicktelford commented 10 years ago

Thanks for digging in to this.

I've opted to keep using InetSocketAddress, but manually encode them for the Kafka config instead of relying on InetSocketAddress#toString(). The reason for this is that I like having Jackson automatically validate the hosts using the InetSocketAddress constructor, which should yield better results whenever an invalid broker host is provided.

I've also added a dependencyManagement clause to require the use of jackson-databind 2.3.4 to address the original issue.

I've added some basic tests to demonstrate that both of these issues has been fixed. Please let me know if you have any other issues. Sorry about the delay on this one, I've been rather busy!

mjwillson commented 10 years ago

No prob, cheers for fixing this!

I guess with the Jackson config stuff, the strongly-typed config classes and validations seem very nice but on some level it feels like it's replicating the config parsing and validation logic that's already baked into Kafka. I guess it makes the config handling more uniform, but it is one more thing to maintain (and where things can go wrong) vs just passing through a map of string config properties and letting Kafka deal with them.

Provided it works anyway I'm not too fussed :) and I'm still getting a feel for the design conventions around dropwizard stuff.

Cheers -Matt

On Wed, Sep 24, 2014 at 3:28 PM, Nick Telford notifications@github.com wrote:

Thanks for digging in to this.

I've opted to keep using InetSocketAddress, but manually encode them for the Kafka config instead of relying on InetSocketAddress#toString(). The reason for this is that I like having Jackson automatically validate the hosts using the InetSocketAddress constructor, which should yield better results whenever an invalid broker host is provided.

I've also added a dependencyManagement clause to require the use of jackson-databind 2.3.4 to address the original issue.

I've added some basic tests to demonstrate that both of these issues has been fixed. Please let me know if you have any other issues. Sorry about the delay on this one, I've been rather busy!

— Reply to this email directly or view it on GitHub https://github.com/datasift/dropwizard-extra/issues/27#issuecomment-56677832 .

nicktelford commented 10 years ago

The advantage of having Dropwizard parse and validate the configuration is that it won't allow you to run your application with an invalid config, which is a massive time-saver vs. trying to figure out why your consumer keeps failing only to find that you've typo'd your "fetch.size" parameter in the config file.