apache / accumulo

Apache Accumulo
https://accumulo.apache.org
Apache License 2.0
1.05k stars 444 forks source link

Refactor FATE #2473

Open milleruntime opened 2 years ago

milleruntime commented 2 years ago

I was looking at the warnings we suppress in ZooStore and they all seem to stem from the serialization of the generics. I was wondering if it would be better to refactor out the generic types so we don't have to arbitrarily serialize objects and blindly cast them. This would be a bit of work but it might not be too bad. I think if we create an interface like FateEnvironment to pass to the operations we wouldn't have to pass the Manager around.

EdColeman commented 2 years ago

Maybe this should consider https://github.com/apache/accumulo/issues/1249

milleruntime commented 2 years ago

If this was for 3.0, we could probably drop the old Bulk Import. That would help clean up a lot of code.

milleruntime commented 2 years ago

One idea I had was to create an Environment interface to pass between FATE and the Manager. The tricky part is that some of the FATE code is in core (the interface, framework and some implementation) while most of it is in the same package as the Manger. I thought we could do this:

public interface RepoEnv {
  public ServerContext getContext();
}

But that won't work because ServerContext is in server and Fate is in core.

ctubbsii commented 2 years ago

On serialization, Java 17 has the new record keyword. It should be safer to serialize records than general Java objects, as we've done in the past. I have not yet used records, and we aren't using Java 17 yet... but could for 3.0 development.

keith-turner commented 2 years ago

It should be safer to serialize records than general Java objects

Looking around for what we could do now to make things safer, found the following. Maybe we could use the new ObjectInputFilter added in Java 9 to explicitly limit what classes FATE can deserialize.

https://www.oracle.com/java/technologies/javase/seccodeguide.html#8 https://docs.oracle.com/javase/9/docs/api/java/io/ObjectInputFilter.html

milleruntime commented 2 years ago

Another improvement that could be included in this change: Convert FateTxId to a stronger type to prevent coding errors like forgetting to call the method to display the value in HEX. The code would probably be 10x better if we had a FateTxId type that we used instead of long. Its toString method could consistently format for logging maybe. That could possible encapsulate or extend UUID, but I think we would benefit form a more specific type than UUID or long.

milleruntime commented 2 years ago

For serialization of FATE repos, we could use JSON. There is already this in the FateCommand: https://github.com/apache/accumulo/blob/0fbf9cda0d1e16d67bf714970cb22dfa2907e50f/shell/src/main/java/org/apache/accumulo/shell/commands/fate/FateCommand.java#L85-L90

milleruntime commented 2 years ago

@ctubbsii @keith-turner I have been experimenting with JSON serialization in FATE. There are already some classes to serialize the data for displaying with the Fate command. What do you guys think of trying to change the serialization to JSON in 2.1?

ctubbsii commented 2 years ago

@ctubbsii @keith-turner I have been experimenting with JSON serialization in FATE. There are already some classes to serialize the data for displaying with the Fate command. What do you guys think of trying to change the serialization to JSON in 2.1?

I think that would probably be too big and too disruptive for 2.1 as we're trying to wrap up major features for 2.1. But, I'm not opposed to the idea, generally. However, I still think it's worth looking into newer Java serialization for record objects in 3.0.

milleruntime commented 2 years ago

@ctubbsii @keith-turner I have been experimenting with JSON serialization in FATE. There are already some classes to serialize the data for displaying with the Fate command. What do you guys think of trying to change the serialization to JSON in 2.1?

I think that would probably be too big and too disruptive for 2.1

After some experimentation, that was my feeling as well. The generics combined with the complexity of the different interfaces really make it difficult to make any small changes.