apache / accumulo-proxy

Apache Accumulo Proxy
https://accumulo.apache.org
Apache License 2.0
9 stars 19 forks source link

Update to accumulo 2.1.1 #81

Open magrable opened 1 year ago

ctubbsii commented 1 year ago

Thanks for the updates, @magrable , but the ITs seem to be consistently failing with this. I'm not sure what the state is for the tests without this, though. Ideally, the changes to the proxy repo should be done to work with any of the 2.1.x versions. I'm not sure if the code changes here will work with both 2.1.0 and 2.1.1. We might need to bring in more of this boilerplate Thrift setup code into the proxy, so it doesn't depend on non-public API in Accumulo to start the Thrift service, and will be more independent from the changes in Accumulo itself.

drewfarris commented 1 year ago

I'm looking into the failing IT tests, it turns out that I can reliably replicate the failures from the command-line. Running individual tests fail with TTransport Socket is closed by peer errors.

drewfarris commented 1 year ago

Accumulo 2.1.x only supports the thrift TCompactProtocol and no longer supports other thrift protocols. The failing integration tests attempt to validate the operation of thrift protocols that are no longer supported.

So to trace this in depth, you can see that looking at the Accumulo 2.1 branch, ThriftUtil is hardcoded to create a thrift server uses the TraceProtocolFactory.

TraceProtocolFactory is a subclass of TCompactProtocol.Factory and ultimately uses the TCompactProtocol under the hood.

This means that clients will only be able to communicate with the Proxy using the TCompactProtocol. As of Accumulo 2.1.x TServerUtils.startTServer no longer accepts a protocolFactory parameter, so the code in o.a.accumulo.proxy.Proxy has no way of asking that a different protocolFactory be used. Commit d3a3277 cleaned up TServerUtils to obtain the protocol factory from ThriftUtil.

However, the integration tests for the the Accumulo Proxy test a variety of protocols, from TCompactProtocol to TJsonPrococol and TBinaryProtocol. I observed that the TCompactProtocol test completes just fine, whereas others fail because the server side of the AccumuloProxy always uses TCompactProtcol regardless of the test configuration.

So, we could deleting the IT's for the other protocols, ProxyDurabilityTest also will need to be updated to use TCompactProtocol.

[edited: cleaned up formatting]

ctubbsii commented 1 year ago

Thanks for tracking down the failure, @drewfarris . To add some more context to this, Accumulo didn't use any other protocols for itself, for its own (non-public) RPC, so those extra methods to support using other protocols inside Accumulo's utility code were not being used, and were simplified, because they weren't needed internally, and these were not public APIs.

It does appear that the proxy itself allows the protocol to be specified in its own configuration. That's reasonable, since its RPC is its API, and consumers of the proxy may have need of using one protocol over another for their application. Since these tests verify that feature of the proxy work, I think the fact they are failing legitimately reveals the fact that this PR is not good enough to merge on its own. I would argue for keeping those ITs.

Instead of removing the ITs that test a desirable proxy feature, and conforming the proxy code to what is still accessible inside Accumulo's internal utility code, the proxy should avoid using Accumulo's internal utility code entirely, and start Thrift services using Thrift server boilerplate that exists inside the proxy itself. That will keep a strong separation between Accumulo's internals and the proxy's.

drewfarris commented 1 year ago

Thanks for the feedback @ctubbsii. I understand your point about not reusing the internal Accumulo Thrift utility code for the proxy server, I will work with @magrable to address this. I suspect that we'll be duplicating a not-insignificant amount of code here, but it will further the goal of divorcing the proxy codebase from non-public Accumulo code.

Would you explain why having multiple thrift protocols for communication between a proxy-client and the proxy is a useful or desirable feature? Is there any relevant background for how Accumulo itself chooses thrift protocols to use that may serve as a precedent for making the decision to choose to support (or not support) multiple thrift protocols in the proxy?

ctubbsii commented 1 year ago

Thanks for the feedback @ctubbsii. I understand your point about not reusing the internal Accumulo Thrift utility code for the proxy server, I will work with @magrable to address this. I suspect that we'll be duplicating a not-insignificant amount of code here, but it will further the goal of divorcing the proxy codebase from non-public Accumulo code.

I don't think it needs to duplicate all of Accumulo's boilerplate. The proxy might not need all the things that the Accumulo code is doing. If it turns out there's a substantial amount of it, we could consider trimming down features, or other considerations.

Would you explain why having multiple thrift protocols for communication between a proxy-client and the proxy is a useful or desirable feature? Is there any relevant background for how Accumulo itself chooses thrift protocols to use that may serve as a precedent for making the decision to choose to support (or not support) multiple thrift protocols in the proxy?

The proxy is just supporting the variety of different use cases that Thrift itself supports. Whatever Thrift's reasoning is for keeping them all would presumably extend to us. So, my argument is more about continuing support for an existing feature, unless there's good reason to strip it out, rather than a positive argument justifying the the need to support all those protocols. I can imagine, though, that some protocols work better with some languages and that not all languages support all protocols. While most of the people I know who use the proxy use it with Python, the proxy itself isn't Python specific, and there might be good reasons some users have for using particular languages with particular protocols.

If a good argument can be made that the Accumulo proxy should only support a subset of those Thrift features, then I'm open to that, but I would default to keeping support for what we have now.