aws / aws-sdk-java

The official AWS SDK for Java 1.x (In Maintenance Mode, End-of-Life on 12/31/2025). The AWS SDK for Java 2.x is available here: https://github.com/aws/aws-sdk-java-v2/
https://aws.amazon.com/sdkforjava
Apache License 2.0

Code samples for aws-java-sdk-elasticsearch #861

Closed amihaiemil closed 8 years ago

amihaiemil commented 8 years ago

I am trying to use the AWS SDK to communicate (index, search, etc.) with the AWS ElasticSearch service. Where can I find some proper code samples/tutorials?

I could only find this blog post and it's a bit confusing. I don't understand how those response/error handlers work and how exactly they relate to the final Response returned by AmazonHttpClient.execute(...).

Any help would be much appreciated.

kiiadi commented 8 years ago

What are you trying to do with Elastic Search? The Java SDK allows you to communicate with AWS services by making simple calls through a Java API. Whilst we don't have all the use-cases for all services documented (yet - it's something we're working on actively!), we do have some examples of how to use the SDK with other services (see here), and the API documentation for Elastic Search is here.

Typically customers shouldn't need to worry about the internals of the http-layer (ie AmazonHttpClient etc) - our goal is to abstract that so that you can concentrate on your business logic.

Apologies for the fairly general answer - I can provide more specific help if you let me know what you're trying to do.

amihaiemil commented 8 years ago

@kiiadi I am trying to index and search documents in the ElasticSearch service - I looked around more after I posted this and realised the elasticsearch SDK is more for admin functionality, like creating/updating a domain, and not for the actual interaction with the elasticsearch instance. At least that's what I think.

I do appreciate your efforts on these libraries, but to be honest it all seems so complicated! Builders everywhere, public classes that warn in the javadoc that they should not be used by clients (then why not make them package-private?), etc.

I wanted to use the SDK solely for the request signing process but I think it would be quicker to try and sign the requests myself and simply use apache http client.

To be concrete, I have an ES instance in the AWS cloud, using the Amazon ElasticSearch Service (so it's not a simple ES on an EC2 instance). Say the endpoint is http://myaws.elastic.com. How can I index the following JSON document

{
    "id":"some_id_here",
    "content":"some content right here that I want to make searchable"
}

using the java sdk? Also how can I perform a search using the sdk?

I know how to work with elasticsearch and what HTTP calls I need to make, to what endpoints - I just don't know how the aws sdk wraps all that.

kiiadi commented 8 years ago

@amihaiemil you're right in saying the SDK for Elastic Search is mainly for administrative functions. Unfortunately AWS ElasticSearch does not currently support the Elastic Search native transport protocol so you won't be able to use many of the Elastic clients that are out there. If you take a look at the limits documented here you can see that only port 80 is open for use via HTTP.

I will raise this as a feature request to the Elastic Search service team.

If you're using open authentication you should just be able to post to the endpoint as in the CURL examples here.

If you need IAM signing you can roll your own relatively easily, as you alluded to in your original question, using some of the internals of the SDK. Take a look at the following snippet:

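import com.amazonaws.AmazonServiceException;
import com.amazonaws.AmazonWebServiceResponse;
import com.amazonaws.ClientConfiguration;
import com.amazonaws.DefaultRequest;
import com.amazonaws.Request;
import com.amazonaws.auth.AWS4Signer;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.http.AmazonHttpClient;
import com.amazonaws.http.ExecutionContext;
import com.amazonaws.http.HttpMethodName;
import com.amazonaws.http.HttpResponse;
import com.amazonaws.http.HttpResponseHandler;
import com.amazonaws.util.IOUtils;
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Region that your ES cluster is deployed to (example value).
private static final String REGION = "us-west-2";
// Endpoint of your ES cluster (example value).
private static final String ES_ENDPOINT = "http://blah-asdflasjfd.us-west-2.es.amazonaws.com";
private static final AWSCredentials CREDENTIALS = new DefaultAWSCredentialsProviderChain().getCredentials();

public static void putDocument(String index, String id, String jsonPayload) {
    Request<?> request = new DefaultRequest<Void>("es");
    // Encode explicitly as UTF-8 instead of relying on the platform default charset.
    request.setContent(new ByteArrayInputStream(jsonPayload.getBytes(StandardCharsets.UTF_8)));
    request.setEndpoint(URI.create(ES_ENDPOINT + "/" + index + "/" + id));
    request.setHttpMethod(HttpMethodName.POST);

    // Sign the request with SigV4 for the "es" service in the configured region.
    AWS4Signer signer = new AWS4Signer();
    signer.setRegionName(REGION);
    signer.setServiceName("es");
    signer.sign(request, CREDENTIALS);

    AmazonHttpClient client = new AmazonHttpClient(new ClientConfiguration());

    client.execute(request, new DummyHandler<>(new AmazonWebServiceResponse<Void>()), new DummyHandler<>(new AmazonServiceException("oops")), new ExecutionContext(true));
}

// Prints the raw response body and returns a fixed, pre-built value.
public static class DummyHandler<T> implements HttpResponseHandler<T> {
    private final T preCannedResponse;
    public DummyHandler(T preCannedResponse) { this.preCannedResponse = preCannedResponse; }

    @Override
    public T handle(HttpResponse response) throws Exception {
        System.out.println(IOUtils.toString(response.getContent()));
        return preCannedResponse;
    }

    @Override
    public boolean needsConnectionLeftOpen() { return false; }
}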

In terms of the HttpResponseHandler, it is responsible for taking an HTTP response payload and unmarshalling it into the given object type <T> - in the case of the above snippet we're just printing the content to the console and returning a dummy value.

Regarding your comment on the complexity of the Java SDK - we're always looking for feedback, so any suggestions you could give us will help us improve in future iterations. With respect to builders specifically - this is a strategy that we've employed to ensure that we can continue to evolve our API (i.e. add properties to objects etc.) whilst remaining backwards compatible.

Thank you for the feedback regarding the visibility of some of the inner workings of the SDK. We're looking at ways we can make it more obvious to consumers what's "extendable" and what's internal.

Hope that helps.

kiiadi commented 8 years ago

You might also want to check out Jest, which is a client for ElasticSearch that works over the HTTP REST API.

amihaiemil commented 8 years ago

@kiiadi Sure, thank you very much for the support :)

amihaiemil commented 8 years ago

@kiiadi one more question (this is what is in fact confusing for me): what happens to the object returned by the response handler?

How can I access it from outside that class? AmazonHttpClient.execute(...) returns a Response, and I can access an HttpResponse from that by calling getHttpResponse(), but its content InputStream is closed and I cannot read it!

And let's say that's a bug, and that InputStream is not supposed to be closed, but still, what happens to what the response handler returns? Right now it seems to me that it's lost? Surely it cannot be, I'm missing something.

kiiadi commented 8 years ago

The handler is responsible for "unmarshalling" the wire-response into some object of type T. The call to AmazonHttpClient.execute(...) returns a Response<T> - you should be able to get at T via response.getAwsResponse().
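
To make that flow concrete, here is a minimal sketch of the pattern using only the JDK. MiniHandler, MiniResponse and MiniClient are hypothetical stand-ins invented for illustration; they are not the SDK's classes. The sketch assumes the client closes the body stream once the handler returns, which would explain why the stream is already closed when read later from outside the handler.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a response handler: turns the wire body into a T.
interface MiniHandler<T> {
    T handle(InputStream body) throws Exception;
}

// Hypothetical stand-in for a typed response wrapper.
final class MiniResponse<T> {
    private final T value;
    MiniResponse(T value) { this.value = value; }
    T getValue() { return value; }
}

final class MiniClient {
    // Runs the handler while the stream is still open, wraps whatever it
    // returns, then closes the stream (try-with-resources).
    static <T> MiniResponse<T> execute(InputStream wireBody, MiniHandler<T> handler) throws Exception {
        try (InputStream in = wireBody) {
            return new MiniResponse<>(handler.handle(in));
        }
    }

    // A handler that reads the whole body into a UTF-8 String.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With this shape, a handler that returns the body text (instead of a pre-canned dummy) makes the text available through the returned wrapper, e.g. MiniClient.execute(body, MiniClient::readAll).getValue().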

amihaiemil commented 8 years ago

@kiiadi I see now :) thank you again. I'll close this ticket then.

ABastionOfSanity commented 7 years ago

I have been following this route for a bit and wanted to add a note for clarity, as I just got tripped up here. When attempting the put, if the URI passed to setEndpoint contains query parameters (e.g. ?version=N), the request fails, because a single trailing / is appended by SdkHttpUtils.appendUri.
I'm not sure if this is a bug, but the workaround is to decompose the URI into a proper endpoint, resource path, and Map<String, List<String>>, and set them on the DefaultRequest with setEndpoint, setResourcePath, and setParameters.
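
A minimal sketch of that decomposition, using only the JDK. EsUriParts is a hypothetical helper name, not part of the SDK, and the sketch assumes an absolute URL with UTF-8 percent-encoded query values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: splits a full URL into the three pieces named above.
final class EsUriParts {
    final URI endpoint;                          // scheme and authority only
    final String resourcePath;                   // e.g. "/books/doc/1"
    final Map<String, List<String>> parameters;  // decoded query parameters

    private EsUriParts(URI endpoint, String resourcePath, Map<String, List<String>> parameters) {
        this.endpoint = endpoint;
        this.resourcePath = resourcePath;
        this.parameters = parameters;
    }

    static EsUriParts parse(String url) {
        URI uri = URI.create(url);
        URI endpoint = URI.create(uri.getScheme() + "://" + uri.getRawAuthority());
        String path = uri.getPath() == null ? "" : uri.getPath();
        Map<String, List<String>> params = new LinkedHashMap<>();
        String rawQuery = uri.getRawQuery();
        if (rawQuery != null && !rawQuery.isEmpty()) {
            // Split on the raw query first, then decode each name and value,
            // so an encoded '&' or '=' inside a value is not mistaken for a separator.
            for (String pair : rawQuery.split("&")) {
                int eq = pair.indexOf('=');
                String name = decode(eq < 0 ? pair : pair.substring(0, eq));
                String value = decode(eq < 0 ? "" : pair.substring(eq + 1));
                params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            }
        }
        return new EsUriParts(endpoint, path, params);
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

The three fields would then be passed to setEndpoint, setResourcePath, and setParameters respectively before signing.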

jamesxabregas commented 7 years ago

@ABastionOfSanity Thank you for sharing that additional info. I was having the same problem where appending a query string caused signature verification to fail. Your solution solved my issue.

I can't understand why AWS has not provided a simple method for signing ElasticSearch requests. It seems crazy that every client out there has to roll their own.

amihaiemil commented 7 years ago

@jamesxabregas @ABastionOfSanity you can sign HTTP requests to the AWS Elasticsearch service using their core Java SDK, but it is quite a mess. Here's how I did it (I wrapped their API inside some elegant decorators):

http://www.amihaiemil.com/2017/02/18/decorators-with-tunnels.html

amihaiemil commented 7 years ago

@jamesxabregas @ABastionOfSanity In fact, with the core SDK you can make signed HTTP requests to any of their services, not just ES...

nlperez11 commented 7 years ago

@kiiadi thank you man, this method works perfectly.

fjanuszewski commented 7 years ago

Hello, when I enter the following URL my console returns the following error.

URL: https:/endpointexample.us-east-1.es.amazonaws.com/_search?source={"query":{"term":{"InitiationTimestamp":"2017-11-03T18:38"}}}

Console: java.net.URISyntaxException: Illegal character in query at index 100: https:/endpointexample.us-east-1.es.amazonaws.com/_search?source={"query":{"term":{"InitiationTimestamp":"2017-11-03T18:38"}}}

I understand that there is a Syntax problem in my query. How can I add the query to my URL? Thanks!
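
For what it's worth, the exception comes from the raw braces and quotes in the query string, which are not legal URI characters. A minimal sketch of one way around it is to percent-encode the JSON before building the URI. SearchUrl and withSource are hypothetical names used only for illustration, and depending on the Elasticsearch version the source parameter may need a companion source_content_type parameter (check the docs for your version).

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

// Hypothetical helper: builds a _search URI with the JSON query percent-encoded.
final class SearchUrl {
    static URI withSource(String endpoint, String jsonQuery) {
        try {
            // URLEncoder escapes '{', '}', '"' and ':' so URI.create accepts the result.
            String encoded = URLEncoder.encode(jsonQuery, "UTF-8");
            return URI.create(endpoint + "/_search?source=" + encoded);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

An alternative that avoids the encoding question altogether is to send the query JSON in the request body rather than in the URL.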

IanLKaplan commented 6 years ago

The AWS Elasticsearch documentation is somewhere between horrible and non-existent. I'm trying to figure out how to talk to the AWS Elasticsearch Service with Java. I've pored over the JavaDoc documentation, but I have not been able to figure out how to define the Elasticsearch schema and download documents.

@kiiadi writes:

The Java SDK allows you to communicate with AWS services by making simple calls through a Java API. Whilst we don't have all the use-cases for all services documented (yet - it's something we're working on actively!), we do have some examples of how to use the SDK with other services (see here), and the API documentation for Elastic Search is here.

It would be great to be able to use the Java API, but there are no examples that I've been able to find. While there is example code for other Amazon services like DynamoDB, I've found nothing for Elasticsearch.

millems commented 6 years ago

@IanLKaplan I have forwarded this feedback to the AWS Elasticsearch team.

amihaiemil commented 6 years ago

@IanLKaplan @millems

When this ticket was opened, the ElasticSearch SDK offered only administrative functions of the ElasticSearch instance (see our discussion above). So, ElasticSearch's API itself was not covered. As I wrote above, you can make your own HTTP calls, but you have to sign them... if you are using v1 of the SDK (the older one), here is how to do it:

https://www.amihaiemil.com/2017/02/18/decorators-with-tunnels.html

If you want to try v2 of the SDK, which looks better but seems to be in developer preview, study this Issue, it may help: https://github.com/aws/aws-sdk-java-v2/issues/339

IanLKaplan commented 6 years ago

Thanks for the replies @millems and @amihaiemil. Given the (to be favorable) poor state of documentation for the AWS Elasticsearch Service, I have to wonder how serious Amazon is about supporting this service. I was going to give up on the AWS Elasticsearch Service and use the hosted Elasticsearch from Elastic Co (elastic.co). Unfortunately, the cost of this service is way outside my budget (over $300/month). So I'm back to AWS, where I'll hope for the best. Oh, wait, I guess that I should add "I, for one, welcome our new Amazon overlords".

amihaiemil commented 6 years ago

@IanLKaplan By the way, the code from the article linked in my previous comment is put to use here. There are operations for index document, bulk index (if I remember well), delete index, search and ping. So, all the basics. But again, this is v1 of the SDK, and they are releasing v2 already.

aetter commented 6 years ago

Hi @IanLKaplan, like other replies have said, the AWS Java SDK only covers configuration operations like creating and updating domains. For interacting with Elasticsearch itself, yep, you'll need to sign and send your own HTTP requests to the service endpoint. I added Java, Python, Ruby, and Node code samples for signed HTTP requests to the Amazon Elasticsearch Service guide early this year.

If you're using Java, the preferred method is to use one of the Elasticsearch Java REST clients (I believe high and low level should both work) and the AWS Request Signing Interceptor. You can find two Java code samples, signed and unsigned, here. You might also find the code sample in the examples directory helpful.

If you want to ingest from other AWS services, the Integration page offers a number of Lambda functions that you might find useful.

Definitely let me know if you have any other feedback on the documentation. Thanks a bunch.

IanLKaplan commented 6 years ago

Thank you for the informative post @aetter I really appreciate it. I will take a look at the resources you linked to. I may have seen some of them, which I used as a reference for writing signed HTTP operations.

The AWS Elasticsearch Service is an important AWS resource, so I hope Amazon will continue to add developer resources.

Srikanth1589 commented 6 years ago

@aetter great post man... kudos!!

woolfel commented 5 years ago

I'm reading this old ticket and amazon still doesn't have a basic java client for Elastic Search. I'm getting the impression AWS doesn't really care about proper documentation. The example in this thread is quite different than what's on the official docs page https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-request-signing.html

Even then, the examples above use a deprecated API and are no longer valid. I'm starting to wonder if using Elastic Search is really worth the headache if AWS isn't going to support it properly with a Java client and good documentation. I've used Lucene/Solr quite a bit in the past and it's much easier and less painful than trying to figure out what AWS expects.

amihaiemil commented 5 years ago

@woolfel See my comments above, about how you can make your own HTTP requests to ElasticSearch (it's easy, the main problem is just signing them for AWS). I don't think they built an actual ElasticSearch client yet.

As discussed in the first comments, what the AWS sdk offers now, related to ElasticSearch, are just some admin functionalities (actually related to the ES cluster on AWS), not the API of ES itself.
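
For anyone signing by hand, here is a minimal sketch of one step of AWS Signature Version 4: deriving the signing key with chained HMAC-SHA256, using only the JDK. SigV4KeyDerivation is a hypothetical class name. The canonical request, the string to sign and the Authorization header are not shown, so this is not a complete signer.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper covering only the SigV4 signing-key derivation step.
final class SigV4KeyDerivation {
    static byte[] hmacSha256(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    // dateStamp is the request date as yyyyMMdd (UTC); service would be "es" here.
    static byte[] signingKey(String secretKey, String dateStamp, String region, String service) {
        byte[] kSecret = ("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8);
        byte[] kDate = hmacSha256(kSecret, dateStamp);
        byte[] kRegion = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, service);
        return hmacSha256(kService, "aws4_request");
    }
}
```

The resulting key is then used to HMAC the string to sign; the hex encoding of that result is the request signature.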

woolfel commented 5 years ago

@amihaiemil yup, I already read through the latest Elastic Search API on GitHub before I found this ticket. Using Solr is even easier than navigating the crappy documentation from a company that makes billions each quarter selling cloud services. I've been contributing to open source since 2002 and even by open source standards, their docs are crappy. I had to write my own SigV4 signer so I can make API Gateway calls from iOS with Swift. The whole SigV4 signing documentation is in a sorry state.

amihaiemil commented 5 years ago

@woolfel I also worked a little with SOLR but found it a bit more complex than ES. And working with Lucene is like building your own car (ES is the car, Lucene is the engine).

Your problem is, in fact, AWS' infrastructure, which is in front of ES. Maybe try a different ES provider? Or deploy your own version on your own servers? (It's quite easy.)

woolfel commented 5 years ago

@amihaiemil - it usually is AWS infrastructure making things painful. I get they want things to be secure, but that's why documentation is critical. The fact that their docs are often 2 years out-of-date and the screen captures of the console are wrong is just sad. I agree in theory ES makes it easier to scale than building your own with Solr/Lucene. I have old projects on AWS EC2 that do exactly that. I'm just venting my frustration because there should be an easy client that handles the request signatures for ES. Sometimes I feel AWS purposely lets the docs get stale so people pay for premium service.

IanLKaplan commented 5 years ago

I have an Elasticsearch demo application up on GitHub. This application includes Java code that supports signed HTTP. It also has an extended GET HTTP class that allows a GET operation to have an argument. This code is published under the Apache license, so you are free to use it. See https://github.com/IanLKaplan/BookSearchES/

The demo application is written as a set of services. I tried to design the code so that it can be easily reused. The HTTP code is in a separate service, as is the Elasticsearch code.

woolfel commented 5 years ago

@IanLKaplan thanks, I'll take a look

aetter commented 5 years ago

Hi @woolfel, thanks for replying to this thread. I maintain the Amazon Elasticsearch Service guide, so I'm always looking to improve it. I just updated both Java code samples to use the updated versions of the performRequest and index methods. It looks like they changed between 6.3 and 6.4. My tests indicate that both samples still work with the 6.4.3 JAR files, but let me know here if you spot any other issues.

I totally understand the frustration around the lack of an AWS SDK-like client that handles request signing to the service and lets you index documents, run searches, etc. I asked for the same thing when I first joined the team. Over time (and possibly because no such client was available), I came to prefer the flexibility of the high-level Java REST client, though; if I want to test my code on a local Elasticsearch install or a self-managed cluster, I can use the same client, sans the request interceptor.

Emphasizing already-popular clients feels like a better way to be a partner in the Elasticsearch ecosystem and help customers move workloads to and from the service, but I understand that a lot of people want to go all-in on AWS, and the lack of a dead simple, drop-in client can be a hurdle to overcome. I'll bring the issue up with the team again, but I hope this post helps explain where we're coming from. Thanks again!

Andrew

woolfel commented 5 years ago

@aetter thanks for taking time to respond. I will spend some time to organize my thoughts and post the suggestions to the link you provided. I was able to get my Lambda function posting bulk data to my ES domain. I used my Cloud9 environment to debug the issue and track down exactly what was going on, then I changed my java code to do the right thing.

woolfel commented 5 years ago

@aetter I submitted a ticket with a detailed list of suggestions for you. Sorry if I was too aggro and pissed at the docs.

bulingfeng commented 3 years ago

@kiiadi Regarding your earlier reply with the signing snippet: how do I set a username and password, for security?
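
If the domain is set up to accept HTTP basic authentication (an assumption; nothing in this thread confirms how the domain is configured), the username and password travel in an Authorization header instead of a SigV4 signature. A minimal sketch of building that header with the JDK, where BasicAuth is a hypothetical helper name:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the value of an HTTP "Authorization" header
// for basic authentication (username:password, Base64-encoded).
final class BasicAuth {
    static String header(String username, String password) {
        String pair = username + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }
}
```

The value would be attached to each request as the Authorization header, and only over HTTPS, since Base64 is an encoding and not encryption.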