JarrettBillingsley / Croc

Croc is an extensible extension language in the vein of Lua, which also wishes it were a standalone language. Also it's fun.
http://www.croc-lang.org

Define different "levels" of serialization #128

Open JarrettBillingsley opened 10 years ago

JarrettBillingsley commented 10 years ago

The serialization lib is really cool but also ridiculously unsafe. You can very easily create invalid thread data which, upon deserialization, would cause the host to crash, or worse. You could (possibly?) spoof native EH frames and cause the host to execute arbitrary code.

This is bad.

This would place the serialization lib firmly in the realm of "unsafe", but it's so useful that it would be a shame not to be able to use it generally. So how about we have different levels of serialization which limit the types that can be written/read?

There could be "simple", which would only allow you to serialize JSON-like object graphs, although allowing circularly-referenced objects. null, bool, int, float, string, table, namespace, array, and memblock would be perfectly safe.

Then there would be "intermediate" which would allow all the simple types, plus weakref, class, and instance.

Finally there would be "full" which would be all those plus function, funcdef, and thread (and upval).
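A rough sketch of how the three levels could be modeled as cumulative allowlists. This is Python rather than Croc, and `Level`, `allowed_types`, and `min_level` are made-up names for illustration, not part of the serialization lib; only the type names come from the lists above.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical serialization levels, ordered from most to least restrictive."""
    SIMPLE = 1
    INTERMEDIATE = 2
    FULL = 3


# Types that each level adds on top of the previous one.
_ADDED_AT = {
    Level.SIMPLE: {"null", "bool", "int", "float", "string",
                   "table", "namespace", "array", "memblock"},
    Level.INTERMEDIATE: {"weakref", "class", "instance"},
    Level.FULL: {"function", "funcdef", "thread", "upval"},
}


def allowed_types(level):
    """Return every type name permitted at `level` (levels are cumulative)."""
    allowed = set()
    for lvl in Level:
        if lvl <= level:
            allowed |= _ADDED_AT[lvl]
    return allowed


def min_level(type_name):
    """Return the most restrictive level that still permits `type_name`."""
    for lvl in Level:
        if type_name in _ADDED_AT[lvl]:
            return lvl
    raise ValueError(f"unknown type: {type_name!r}")
```

With this shape, a reader or writer only needs a single set-membership test per value, and `min_level` tells you the weakest mode that could handle a given type.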

I suppose serialization itself is safe, since the danger is all on the deserialization side, so any type can be serialized (though it'd be good to have these levels on the writing side too, so you can be sure the output can be deserialized by a more limited mode).
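One way the writing side could guarantee its output is readable by a more limited mode is to walk the whole object graph and check every value's type against the target level before writing anything. A minimal sketch in Python (not Croc); `Node` and `check_writable` are hypothetical stand-ins for real Croc values and the real serializer, and the walk tracks visited nodes so circular references are fine.

```python
SIMPLE = {"null", "bool", "int", "float", "string",
          "table", "namespace", "array", "memblock"}
INTERMEDIATE = SIMPLE | {"weakref", "class", "instance"}
FULL = INTERMEDIATE | {"function", "funcdef", "thread", "upval"}
LEVELS = {"simple": SIMPLE, "intermediate": INTERMEDIATE, "full": FULL}


class Node:
    """Hypothetical stand-in for a value: a type name plus referenced values."""

    def __init__(self, type_name, children=None):
        self.type_name = type_name
        self.children = children if children is not None else []


def check_writable(root, level):
    """Raise TypeError if any value reachable from `root` is not allowed at `level`."""
    allowed = LEVELS[level]
    seen = set()          # ids of nodes already visited, so cycles terminate
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if node.type_name not in allowed:
            raise TypeError(
                f"{node.type_name!r} is not allowed at level {level!r}")
        stack.extend(node.children)
```

For example, a table that refers to itself passes at "simple", but the same table holding a function is rejected there and only accepted at "full".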

Could this be done by having the first two levels built-in, and then the full methods only added to the serialization/deserialization lib if the host asks for them?
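Something like the following, maybe. This is only a Python sketch of the opt-in idea, not the lib's actual API: `Deserializer`, `enable_full`, `can_read`, and `read` are invented names, and the readers are placeholders that just echo their input instead of decoding anything.

```python
BUILTIN_TYPES = {
    # "simple" level
    "null", "bool", "int", "float", "string",
    "table", "namespace", "array", "memblock",
    # "intermediate" level
    "weakref", "class", "instance",
}
FULL_ONLY_TYPES = {"function", "funcdef", "thread", "upval"}


class Deserializer:
    """Hypothetical reader whose unsafe handlers exist only if the host opts in."""

    def __init__(self):
        # Only the first two levels are registered by default.
        self._readers = {name: self._placeholder_reader(name)
                         for name in BUILTIN_TYPES}

    @staticmethod
    def _placeholder_reader(type_name):
        # A real reader would decode the payload; this one just tags it.
        return lambda payload: (type_name, payload)

    def enable_full(self):
        """Called by the host to add the readers for the 'full' level."""
        for name in FULL_ONLY_TYPES:
            self._readers[name] = self._placeholder_reader(name)

    def can_read(self, type_name):
        return type_name in self._readers

    def read(self, type_name, payload):
        reader = self._readers.get(type_name)
        if reader is None:
            raise ValueError(f"no reader registered for {type_name!r}")
        return reader(payload)
```

The point of doing it this way is that the unsafe code paths are simply absent unless the host calls `enable_full()`, rather than being present and guarded by a flag.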

JarrettBillingsley commented 10 years ago

The C++ port of the serialization lib doesn't serialize threads, not because of anything wrong with C++, but because I changed the way EH frames are stored (they're per-VM instead of per-thread now). I'm not sure whether or not it's worth serializing threads anyway; you can never serialize any threads with native functions on their call stack, and there's a ton of room for tampering/invalid data.

Still, it'd be good to limit serialization of those other types, since there's room for unsafety there too.