apache / jena

Apache Jena
https://jena.apache.org/
Apache License 2.0

Is it possible to utilize the RegistryRequestModifier to send AWS headers to Neptune? #1993

Closed magutierrm closed 1 year ago

magutierrm commented 1 year ago

Version

4.9.0

Question

Hello. Starting from version 4.3.0, java.net.http.HttpClient is used as the HTTP client instead of org.apache.http.client. Previously, if you wanted to sign your requests, you could intercept them and add the signed HTTP headers using methods provided by the Apache HTTP client. However, with java.net.http.HttpClient, there is no straightforward way to achieve this. Is it possible to accomplish this using org.apache.jena.http.sys.RegistryRequestModifier? I am doing something like this:

        Map<String, String> signedHeaders = getAwsHeaders();
        HttpRequestModifier modifier = (params, headers) -> headers.putAll(signedHeaders);
        RegistryRequestModifier.get().addPrefix("https://my-neptune-cluster.neptune.amazonaws.com:8182", modifier);

However, when I check the logs of my requests, I don't see any headers added.

Thank you in advance.

afs commented 1 year ago

Try "add" not "addPrefix"

Your example throws "IllegalArgumentException" (because prefix does not end in "/").

magutierrm commented 1 year ago

@afs Thanks for the answer. Unfortunately, I also tried with add but it did not work. Is there any place where RDFConnectionRemoteBuilder should call RegistryRequestModifier? Maybe I am calling it in the wrong place.

afs commented 1 year ago

Were you getting IllegalArgumentException? If not, then maybe the code isn't called.

You can use addPrefix("https://my-neptune-cluster.neptune.amazonaws.com:8182/") -- ends in "/" and applies to any URL at that host over https.

The registry is system-wide.

To test, use:

                HttpRequestModifier modifier = (params, headers) -> {
                    System.err.println("CALLED");
                    };
                RegistryRequestModifier.get().add(....., modifier);

and see what client-side output you get.

After that, please provide a complete, minimal example. It does not have to be "amazonaws.com".

magutierrm commented 1 year ago

I did not receive the exception, and the modifier is not being called. I am not sure if I am doing something wrong.

@Slf4j
@Component
public class Test {

    @PostConstruct
    public void init() throws InterruptedException, IOException {

        HttpRequestModifier modifier = (params, headers) -> {
            System.err.println("CALLED");
            headers.put("x-test", "test");
        };
        RegistryRequestModifier.get().addPrefix("https://github.com/status/", modifier);

        HttpClient httpClient = HttpClient.newBuilder().build();
        HttpResponse<String> response = httpClient.send(
            HttpRequest.newBuilder()
                       .GET()
                       .uri(URI.create("https://github.com/status/"))
                       .build(),
            HttpResponse.BodyHandlers.ofString()
        );

        log.info("Request {}", response.request());
        log.info("Response {}", response);
        log.info("Headers {}", response.request().headers().map());
    }

}

Logs:

2023-08-17 12:07:52.810 - INFO 25359 --- [           main] com.xxx.xxx.neptune.config.Test      : Request https://github.com/status/ GET
2023-08-17 12:07:52.811 - INFO 25359 --- [           main] com.xxx.xxx.neptune.config.Test      : Response (GET https://github.com/status/) 200
2023-08-17 12:07:52.811 - INFO 25359 --- [           main] com.xxx.xxx.neptune.config.Test      : Headers {}

afs commented 1 year ago

        HttpClient httpClient = HttpClient.newBuilder().build();
        HttpResponse<String> response = httpClient.send(

There isn't any Jena code there: RegistryRequestModifier is a Jena feature, so a request sent directly through java.net.http.HttpClient never consults it.
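For contrast, when the JDK client is used on its own, headers have to be set on the request itself. The following is only a minimal sketch using java.net.http and no Jena code; the class name PlainJdkHeaders and the helper buildRequest are made up for illustration, and the request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;

// Hypothetical illustration class; not part of Jena or the JDK.
class PlainJdkHeaders {

    /** Build a GET request that carries the given headers. No Jena registry is involved. */
    static HttpRequest buildRequest(String url, Map<String, String> headers) {
        HttpRequest.Builder builder = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .GET();
        // With the plain JDK client, every header must be added to the builder explicitly.
        headers.forEach(builder::header);
        return builder.build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("https://github.com/status/", Map.of("x-test", "test"));
        // The header is now part of the request object that would be sent.
        System.out.println(request.headers().map());
    }
}
```

A request built this way can be passed to httpClient.send(...) as in the code above; the point is only that the header map on the request stays empty unless the caller fills it.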

Example:

        HttpRequestModifier modifier = (params, headers) -> {
            System.err.println("CALLED");
            //headers.put("x-test", "test");
        };

        String URL = "https://github.com/status/";
        RegistryRequestModifier.get().addPrefix(URL, modifier);

        try {
            RDFConnection conn = RDFConnectionRemote.service("https://github.com/status")
                    .queryEndpoint("query")
                    .build();
            conn.query("ASK{}").execAsk();
        } catch (QueryExceptionHTTP ex) {
            if ( 404 != ex.getStatusCode() )
                System.err.print("Expected status code 404. Got "+ex.getStatusCode());
        }

        System.out.println("DONE");
        System.exit(0);

It only seems to work for query and update, not GSP/DSP, which is probably a bug.

What operations are you trying to use?

magutierrm commented 1 year ago

@afs Thanks for this code snippet. I didn't know that we can directly add our service without having to pass an HTTP client. This works properly 😄 May I ask why you decided to use java.net.http.HttpClient? In my opinion, it is a very restricted client. In any case, you can close the issue. Thank you again.

afs commented 1 year ago

I'm curious as to what features you think are missing. Code is never finished.

The requirement for Jena is to provide APIs that, in most cases, hide the details of HTTP. Semantic Web usage of HTTP does not need the full generality of HTTP.

Jena used to use AHC (Apache HttpClient) version 4. There was some ability to optionally configure AHC for connection pools, but the APIs provided the functionality for SPARQL (query/update and graph store protocol) without requiring the user to understand AHC. There was some URLConnection usage as well. Support for authentication was a bit basic, and there was no HTTP/2 support.

AHC5 is quite different, at least when it comes to getting new features such as HTTP/2 support. So migrating from AHC4 was a significant change either way.

The new-as-of-11 JDK HTTP code (JHC) is good. My view is that it has become an important part of the JDK and will receive maintenance and refresh because of its use in modern distributed systems, including authentication integration. That can't be said for URLConnection, but in Java 1, HTTP wasn't as significant as it is today.

One reason to choose JHC is less dependency complexity. One role for Jena is as a library. Any package that might be used by a system using Jena as a library runs into dependency complexity around versions. Jena had already been having to control which version of AHC was used, excluding it as a dependency of some Jena dependencies because their dependencies were not being updated.

So the decision was more like "is JHC good enough?".

JHC is lower level than AHC and has a simpler API surface, though I would not be surprised to learn that AHC inspired some of the design of JHC.

Abstractions can be built on top of JHC. HttpRequestModifier is part of that because, despite W3C standards, different remote endpoints of (commercial) triple stores have different requirements. Short of hard-coding the variations into Jena itself, the practical option is to let applications tweak the query string and the headers.
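To make the idea of layering on JHC concrete, here is a rough sketch of a prefix-keyed registry of header modifiers built only on java.net.http. It is not Jena's implementation: the class ModifierRegistrySketch, its methods, and the "longest matching prefix wins" rule are all assumptions made for illustration. The only detail taken from this thread is that a prefix not ending in "/" is rejected with IllegalArgumentException.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch; names and matching rule are assumptions, not Jena's code.
class ModifierRegistrySketch {

    // URL prefix -> callback that may add or change entries in the header map.
    private final Map<String, Consumer<Map<String, String>>> byPrefix = new LinkedHashMap<>();

    void addPrefix(String prefix, Consumer<Map<String, String>> modifier) {
        // As reported in the thread, a prefix must end in "/".
        if (!prefix.endsWith("/"))
            throw new IllegalArgumentException("Prefix must end in '/': " + prefix);
        byPrefix.put(prefix, modifier);
    }

    /** Build a GET request for the URL, applying the modifier of the longest matching prefix. */
    HttpRequest build(String url) {
        String best = null;
        for (String prefix : byPrefix.keySet()) {
            if (url.startsWith(prefix) && (best == null || prefix.length() > best.length()))
                best = prefix;
        }
        Map<String, String> headers = new LinkedHashMap<>();
        if (best != null)
            byPrefix.get(best).accept(headers);

        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url)).GET();
        headers.forEach(builder::header);
        return builder.build();
    }

    public static void main(String[] args) {
        ModifierRegistrySketch registry = new ModifierRegistrySketch();
        registry.addPrefix("https://example.org/sparql/", h -> h.put("x-test", "test"));
        // The modifier runs only because the request is built through the registry.
        System.out.println(registry.build("https://example.org/sparql/query").headers().map());
    }
}
```

The design point is the same one made earlier in the thread: a modifier only takes effect when requests are created by the layer that owns the registry, which is why a raw HttpClient call bypasses it.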

RDFConnection (and its companion RDFLink for Graph and Node) pull the main mechanisms together (QueryExecHTTP, GSP etc.), although those mechanisms do have more direct ways of controlling headers because that control is per operation.

So my take is that JHC is a good engine for modern HTTP with good prospects for long term stability and support.