dotnet / infer

Infer.NET is a framework for running Bayesian inference in graphical models
https://dotnet.github.io/infer/
MIT License
1.56k stars 230 forks source link

Remove BinaryFormatter for compliance reasons #443

Closed jonathantims closed 1 year ago

jonathantims commented 1 year ago

Remove BinaryFormatter for compliance reasons. Increment version because this is a significant change.

@m3nikma7i do you want me to change any of the other version numbers?

I've migrated everything that could be migrated, and added an IFormatter so that people can select their own serialization. For the Learners app and the IndexedSet and tests that require a formatter I add a JsonFormatter that works well with serializing mixed content to and from a stream.

tminka commented 1 year ago

If the goal is simply to replace BinaryFormatter with Json, why are there so many breaking changes to the public interface? I wouldn't expect any breaking changes at all.

tminka commented 1 year ago

CustomSerializationVersion needs to change on all classes whose serialization format has changed.

jonathantims commented 1 year ago

If the goal is simply to replace BinaryFormatter with Json, why are there so many breaking changes to the public interface? I wouldn't expect any breaking changes at all.

To be clear, the goal is to remove BinaryFormatter. Where there are core libraries that do serialization, I think allowing users to provide their own formatter (which they could potentially choose to be BinaryFormatter) is better than removing these APIs. I guess that's what you're referring to when you mention breaking changes to the public interface?

I think it's important to allow people to provide their own IFormatter because whereas for this repro I think choosing Newtonsoft.Json to implement the formatter makes sense, it is definitely not a general purpose replacement -- nothing is.

BinaryFormatter is very unique in a bunch of ways. For example, usually it does not call object constructors and just starts out with zeroed out initialized copies of objects which it then populates. For this reason, code that has been written against BinaryFormatter often has restrictive constructors that rely on not-being-called during serialization to work. I'm unaware of any other deserializer that does this, and I don't think we want to maintain a custom general purpose serializer that does this; instead much of the code uses "ForwardCompatible" serialization and that is I think what the primary serialization should be going forward and provide IFormatter arguments for backward compatibility.

jonathantims commented 1 year ago

CustomSerializationVersion needs to change on all classes whose serialization format has changed.

We haven't actually changed the serialization. If you provide the same formatter as before (the BinaryFormatter) you'll have the same serialization; and on the other hand if you use the ForwardCompatible/IReader methods, then these will also be unchanged in terms of what they actually serialize.

tminka commented 1 year ago

If we allow people to provide BinaryFormatter then we haven't actually addressed the security issue. We've just made it so that the tool that checks for the security issue doesn't fire on our code (but would still fire on their code). So I don't think that's the right solution.

jonathantims commented 1 year ago

If we allow people to provide BinaryFormatter then we haven't actually addressed the security issue. We've just made it so that the tool that checks for the security issue doesn't fire on our code (but would still fire on their code). So I don't think that's the right solution.

The security issue with BinaryFormatter is that if you haven't secured your data, then data can be altered to run arbitrary code when loading the model. This change gives people the option to use a different formatter if this is an issue for them, which wasn't previously an option. BinaryFormatter is only as inherently bad as, say, ".exe" files (by which I mean that if someone can mess with your exe files, they can also run arbitrary code when the exe is activated). The blanket ban on BinaryFormatter is a compliance mechanism.

tminka commented 1 year ago

Okay then, this hasn't really fixed the "compliance" issue (to use your terminology). It just gives the appearance of compliance.

jonathantims commented 1 year ago

Okay then, this hasn't really fixed the "compliance" issue (to use your terminology). It just gives the appearance of compliance.

I don't understand, sorry. The compliance ask is to remove BinaryFormatter? And the change removes BinaryFormatter? I'm not sure what change you're requesting/suggesting.

tminka commented 1 year ago

Okay then, this hasn't really fixed the "compliance" issue (to use your terminology). It just gives the appearance of compliance.

I don't understand, sorry. The compliance ask is to remove BinaryFormatter? And the change removes BinaryFormatter? I'm not sure what change you're requesting/suggesting.

The compliance ask is to not use BinaryFormatter. If we are facilitating (in some ways encouraging) the use of BinaryFormatter, then we're not following the ask.