Disable dynamic class loading and Java serialization by default

atomix / catalyst

Networking and buffer APIs and implementations for use in Copycat and Atomix

http://atomix.io/catalyst

Apache License 2.0

52 stars 23 forks source link

Disable dynamic class loading and Java serialization by default #14

Open kuujo opened 8 years ago

kuujo commented 8 years ago

Deserialization of data from untrusted users can represent a security flaw. While Copycat and Atomix clusters should never be exposed to untrusted users, there's nevertheless no good reason to have dynamic class loading and Java serialization enabled. Performance is slow for Java serialization, and dynamic class loading requires serializing class names. These features should be disabled by default in order to encourage more efficient and secure white listing of classes with space-compact serialization IDs.

kuujo commented 8 years ago

@bgloeckle I'd like to get a second opinion on this before I hack it. Thoughts? The thought is, absent serialization and loading of class names, the Catalyst serialized effectively whitelists classes by requiring registration of serializable types either through the ServiceLoader pattern or in API calls. A theoretical attacker could only construct an instance of one of the registered classes which are presumably safe. I'm fine with this since Java serialization and serializing class names is slow anyways. But for clusters that are not exposed to untrusted user data (probably most of them) lazy developers should be able to re-enable Java serialization.

bgloeckle commented 8 years ago

I think whitelisting the classes is the usual fix for this problem, so that should be good to go. Though I don't fully understand what you mean with "Serializing class names" - do you mean the #readByClass and #writeByClass methods in Serializer? I think they're not the problem, but the problem is the ObjectInputStream used in #readBySerializable. The data that is written and read by the "byClass" methods could be crafted by an attacker (to match a whitelisted class name), but the actual Serializable that is deserialized could be harmful and the attack would have succeeded even before ObjectInputStream#readObject returns. I think you should subclass ObjectInputStream here and overwrite at least the #resolveClass method. That should then be enough to filter the valid class names: Just throw a ClassNotFoundException if the class is not on the whitelist. You might also want to overwrite the #resolveProxyClass method and just throw ClassNotFoundException always - as Proxies should never be deserialized in copycat anyway AFAIK. If the ObjectInputSTream cannot "load" the correct class then, it cannot be able to instantiate it and copycat/catalyst should be safe. That approach is also suggested in the Apache Blog.

kuujo commented 8 years ago

Actually, you're right. I was thinking readByClass did not care if the class name was whitelisted, but that's not the case. A TypeSerializer still has to be registered for it to execute any code. This actually does not need to be the case, which is why I didn't think it was the case. Classes that implement CatalystSerializable could be deserialized without being whitelisted, so I'll add that in a separate PR and add the option to disable the whitelisting requirement for serialized class names in another.

Overriding ObjectInoutStream seems fine to still allow Serializable to be used with whitelisting. Thanks!

On Dec 1, 2015, at 1:06 AM, Bastian Glöckle notifications@github.com wrote:

I think whitelisting the classes is the usual fix for this problem, so that should be good to go. Though I don't fully understand what you mean with "Serializing class names" - do you mean the #readByClass and #writeByClass methods in Serializer? I think they're not the problem, but the problem is the ObjectInputStream used in #readBySerializable. The data that is written and read by the "byClass" methods could be crafted by an attacker (to match a whitelisted class name), but the actual Serializable that is deserialized could be harmful and the attack would have succeeded even before ObjectInputStream#readObject returns. I think you should subclass ObjectInputStream here and overwrite at least the #resolveClass method. That should then be enough to filter the valid class names: Just throw a ClassNotFoundException if the class is not on the whitelist. You might also want to overwrite the #resolveProxyClass method and just throw ClassNotFoundException always - as Proxies should never be deserialized in copycat anyway AFAIK. If the ObjectInputSTream cannot "load" the correct class then, it cannot be able to instantiate it and copycat/catalyst should be safe. That approach is also suggested in the Apache Blog.

— Reply to this email directly or view it on GitHub.

bgloeckle commented 8 years ago

Ok, I'm not 100% sure if I was clear enough about this, so just to be sure: The serialized stream of an AppendRequest for example would look like the following:

[207][term/leader/logIndex/logTerm/commitIndex][TYPE_SERIALIZABLE][Java-serialized Entry 1][TYPE_SERIALIZABLE][Java-serialized Entry 2][...]

(207 is the SerializeWith ID of AppendRequest). This will be totally fine and be accepted when Serializer#readObject is called. The problem in that case is not the AppendRequest which will be deserialized with CatalystSerializableSerializer, but the serialized entries. (In other cases different to AppendRequest, also the first/top-most serialized object might be exploitable). An attacker could craft the bytes (s)he sends and knows that will be put into readObject. The stream could then contain the [207][term...][TYPE_SERIALIZABLE] just like normal, but the java-serialized entries are not actual "Entry" objects, but are serialized objects similar to those shown in the blog article I linked in my first post. There does not need to be a TypeSerializer registered for the actual "entry" classes (that are written and read), because catalyst does the (in my point of view correct) fallback to write and read Serializable using an ObjectOutputStream/ObjectInputStream. But exactly those bytes could be crafted and therefore you should ensure that only "whitelisted" objects are deserialized in the ObjectInputStream.

I think it's fine that users can enable "Serializable for all" if they are sure that noone ever will get a stream of bytes from an unknown source to be passed to Serializer#readObject. In all other cases, users need to be able to whitelist their own classes and/or the package name prefixes of classes they would like to whitelist.

bgloeckle commented 8 years ago

Ah, wrong example. The AppendRequest will of course contain other CatalystSerializable objects as entries. But the idea of the example still works: Assume the Entry is a CommandEntry and that itself contains a Command that is only java-serializable.

The serialized stream would then look like:

[207][term/leader/logIndex/logTerm/commitIndex][220][TYPE_SERIALIZABLE][Java-serialized Command]

where 220 is the SerializeWih ID of CommandEntry. Sorry for the confusion.

kuujo commented 8 years ago

Yeah, I think I understand what the problem with Java serialization is. I'm talking about CatalystSerializable and TypeSerializer as an additional, separate but related issue. The problems with Java serialization are:

Java serialization allows arbitrary classes to be loaded, meaning an attacker can load any class by manipulating the byte stream
Java serialization can execute arbitrary code in those arbitrary classes

Correct?

What I'm saying, is the same will be true of CatalystSerializable if it does not require TypeSerializer to be registered and does allow class names to be arbitrarily loaded (which should be the case for usability reasons), correct? Currently, TypeSerializers have to be registered for CatalystSerializable objects. That needs to be changed to allow CatalystSerializable class names to be loaded (not for security but for usability reasons) in addition to whitelisting Serializable classes. So, an attacker could simply write any class name in the byte stream, and if that class implements CatalystSerializable, it will create an instance and call readObject.

Sure, this implies that a CatalystSerializable class with exploitable readObject code must be available to be loaded and deserialized, but the same is true of Serializable, and this is still effectively the same flaw and solution.

So, I'm talking about making two different changes:

Require whitelisting Serializable class names by default for the Java serialization flaw
Essentially enable the same potential flaw in CatalystSerializable and require whitelisting by default

On Dec 2, 2015, at 3:04 AM, Bastian Glöckle notifications@github.com wrote:

Ah, wrong example. The AppendRequest will of course contain other CatalystSerializable objects as entries. But the idea of the example still works: Assume the Entry is a CommandEntry and that itself contains a Command that is only java-serializable.

The serialized stream would then look like:

[207][term/leader/logIndex/logTerm/commitIndex][220][TYPE_SERIALIZABLE][Java-serialized Command]

where 220 is the SerializeWih ID of CommandEntry. Sorry for the confusion.

— Reply to this email directly or view it on GitHub.

bgloeckle commented 8 years ago

Yes, Serializable allows arbitrary classes to be loaded (if ObjectInputStream is not subclassed etc) and those arbitrary classes could execute code on instantiation that was received in the serialized stream (=remote code execution).

I think that CatalystSerializable or generally classes for which a TypeSerializer is available are somewhat different, though. AFAIK even when #readByClass or #writeByClass is called, the object will always be de-/serialized by either a TypeSerializer or by java serialization, so we have only those two cases when thinking about de-/serialization (= from a security point of view we do not care what readByClass actually read, as that might not match the serialized data anyway, it's just used for choosing the Serializer). The set of classes for which a TypeSerializer is available is pretty small, no class of catalyst/copycat (at least AFAIK currently ;) ) does expose a remote code execution (as long as the ObjectInputStream is subclassed correctly) and I think you generally can "trust" the TypeSerializers - there are no libraries which start to register TypeSerializers and which might be used by an application without knowing that these libraries register those. With Serializable, this is different, as all classes that implement Externalizable/Serializable whcih are on the classpath are automatically up for being used with java serialization, and a lot libraries out there use that - which might then lead to deserializing crafted serialized streams that then expose a remote code execution.

A user of catalyst/copycat might of course implement additional classes that implement CatalystSerializable or add new TypeSerializers and of course these may then again expose a remote code execution vulnerability - but if that engineer implements a class that way, he might also whitelist a Serializable class that would expose that vulnerability. This is then in his own responsibility. You can also think of it in a way, that, if there is a TypeSerializer, then either catalyst/copycat or the engineer using it really wants to de-/serialize the corresponding objects and would need to whitelist them anyway (if he'd have to whitelist them).

So, from my point of view, it should be enough to fix the Serializable (and Externalizable, too) vulnerability by subclassing ObjectInputStream as described above. All other code does not need any adjustment.

kuujo commented 8 years ago

You're right that a user can implement CatalystSerializable and then the security of that class is their responsibility. But that's assuming the user of Copycat has control over all classes that implement CatalystSerializable. Indeed, that may be the case now, but the potential security issues come into play with the possibility of some other random library that implements CatalystSerializable and does something exploitable in the readObject method. Catalyst is perfectly usable outside of the context of Copycat, and while it's exceedingly unlikely to be used anywhere else now, there's no great reason not to make the simple change to require whitelisting CatalystSerializable class names initially. That's already a requirement at this point anyways. I'll basically just enable the option to load CatalystSerializable classes by name and disable it by default.

Thanks for all the input. I'm working on a solution for this now and will follow your ObjectInputStream suggestion, which seems to work fine.