cfpb / grasshopper

CFPB's streaming batch geocoder
Creative Commons Zero v1.0 Universal
37 stars 13 forks source link

Upgrade to Elasticsearch 2.2.0 and support Shield #206

Closed chosak closed 8 years ago

chosak commented 8 years ago

This PR is a first cut at updating the version of ES used by the geocoder from 1.7.1 to 2.2.0. It also (optionally) supports use of the Shield plugin for working with secured clusters.

These changes seem to work to connect to a 2.2 cluster running Shield, however I have not verified that all integration tests work properly. Please see some additional comments below @hkeeler @jmarin

hkeeler commented 8 years ago

Wow! You've done a lot of work here.

Overall, looks good. We'll need to some local testing on this change, obviously. The only thing I see that we'll probably want to change (not necessarily in this PR) is to move the clientBuilder boilerplate into its own reusable function.

The custom Joda Time class is annoying. Hope they have a good plan to keep it in-sync with the real Joda Time over time. The last comment on that post does state:

Joda date/time parsing is only relevant to field mapping processing at server side. Clients do not use Joda at all (unless you use custom Joda-dependent code for the client, of course).

...makes it all the more annoying.

Thanks @chosak!

chosak commented 8 years ago

+1 on pulling out the client builder boilerplate.

hkeeler commented 8 years ago

Looks like they do provide the alternative of shading the whole elasticsearch.jar

We could do that with sbt-assembly's Shading. It would probably be the "safer" option, but would add more ugly to our sbt setup.

Also, here's the problem child BaseDateTime.java. At least it's an abstract class, and probably pretty stable... Still don't like it.

hkeeler commented 8 years ago

Actually, the shading doesn't seem to clutter up sbt much. I guess the real issue there is you end up with a non-standard package name in all your code.

hkeeler commented 8 years ago

I just merged this onto another branch I'm working on, and I'm get the following sbt error multiple times whenever I do sbt assembly

[error] Unable to find credentials for [Artifactory Realm @ maven.elasticsearch.org]

It does get the necessary ES jars, and the build does succeed, but I'd rather not have error messages in the build.

@chosak, did you get this error when building?

chosak commented 8 years ago

@hkeeler yes, I did see this error.

One thought is that the way that I added this to the Dependencies.scala file doesn't quite line up with the (Java) documentation provided here. What is the sbt-equivalent way of doing this?

<snapshots>
    <enabled>false</enabled>
</snapshots>
hkeeler commented 8 years ago

@chosak, I don't think that's it. We're not using any snapshot builds, as far as I can tell. This seems related to needing to add sbt Credentials. This doesn't make a whole lot of sense, though. The resource is clearly available at http://maven.elasticsearch.org/releases/org/elasticsearch/plugin/shield/2.2.0/ for download without credentials, and we're not trying to push anything up to that repo, so...???

hkeeler commented 8 years ago

I'm going to close this one out. I've merged this branch onto #211, and added some additional fixes and stopgaps to get this working in our Docker Compose setup with ES 2.2.

@chosak, if you could take a peek at #211, it'd be much appreciated.