cfpb / grasshopper

CFPB's streaming batch geocoder
Creative Commons Zero v1.0 Universal
37 stars 13 forks source link

Use local ES for test-harness & upgrade to ES 2.2 w/ Shield #211

Closed hkeeler closed 8 years ago

hkeeler commented 8 years ago

Closes #209

This PR is primarily intended to refactor the Docker Compose configuration for the test-harness, using a locally running Elasticsearch instance, rather than one in Docker due to VirtualBox's slow I/O. However, I did not want to come back to this as soon as we upgraded to ES 2.2, so I've merged in the work @chosak did on #206, and then added a few additional fixups to get it all running.

A few of these "fixes" need further research, like:

I also discovered bugs in the Census geocoder unrelated to the ES upgrade. There will be follow-up issues for both grasshopper and grasshopper-loader coming soon with more details on these fixes. In the meantime, I have commented out the houseQuery portion of the filter. This means the Census geocoder now returns results, but they are pretty inaccurate.

I have not changed the standard Docker Compose setup, keeping ES as a Docker container. I don't think we'll be using a full dataset when running this setup, so it seemed better to keep it simple. If we decide we need this external ES setup across-the-board, it'll be a pretty simple fix.

I think this PR now replaces #206 as it has all of its changes, plus more. However, there's a lot of good conversation over on that issue, so please check it out as well while reviewing this one.

chosak commented 8 years ago

The changes pulled in from #206 look good to me. As you mentioned over there it'd be nice to pull out the client builder stuff into a utility function, as that code's now repeated in several places.

Regarding fully-qualified field names, see this comment in the 2.0 breaking changes notes.

I dug a bit into matchPhraseQuery but couldn't find any obvious 2.0+ change that would have modified this behavior.

hkeeler commented 8 years ago

@chosak, thanks for checking this out, and for the link about how references to nested fields changed.

hkeeler commented 8 years ago

I've created issue #212 and #213 for tracking outstanding issues from this PR.

hkeeler commented 8 years ago

I think the test_harness docs are in pretty good shape. I'm taking down the "do not merge" sign. The PR is now ready for full review.

If you'd like to check out the docs rendered form, head over to: