Open oefe opened 3 years ago
The same example also fails in Jupyter, but for a different reason: #796
Thanks @oefe for reporting this. We will move away from BinaryFormatter
from upcoming releases.
cc @suhsteve @jonsequitur (FYI)
Not sure what the dotnet plan for replacing BinaryFormatter here is, but I was wanting to use net spark in F# notebooks (where UDFs have never been supported), plus I had a couple of other similar problems that all looked solvable if dotnet had something like pythons cloudpickle. So I wrote one, https://github.com/Ibasa/Pikala. Forked spark to use it and can now run F# UDFs from notebooks.
We will move away from BinaryFormatter from upcoming releases.
@imback82 Is there a solution to this, now is 2023?
@GeorgeS2019 @imback82
Interestingly if you target .net standard we don't see the compiler errors: netstandard2.0;netstandard2.1 https://learn.microsoft.com/en-us/dotnet/standard/net-standard?tabs=net-standard-2-1#select-net-standard-version
I think there would still be runtime errors in a project built for .net 8.
The bigger issues is that binaryformatter is bad practice from a security standpoint (and moreover it won't allow AOT compilation of the worker in .net 8!)
UPDATE 4/16:
As I mentioned, you can get past compiler errors simply by building a project in a way that targets .net standard. This will still be compatible with .net 8. See a discussion about compatibility here: https://github.com/dotnet/runtime/discussions/90829
And you can also get past any runtime errors on .Net 8 simply by using these settings in the project file:
<NoWarn>$(NoWarn);SYSLIB0011</NoWarn>
<EnableUnsafeBinaryFormatterSerialization>true</EnableUnsafeBinaryFormatterSerialization>
... they can be introduced in .net 8 project files - of our drivers, and of the Microsoft.Spark.Worker as well.
I understand that the binary deserialization is a security threat ... but it seems a bit overblown. It is mitigated when the spark cluster is hosted entirely on a private network and the deserialization is only happening between drivers and executors. In any case, these deserialization-related concerns must be weighed against the needs of spark. There are few other environments that do as much serialization/deserialization as spark does!
I see that the "kryo" serialization also has had well-known vulnerabilities that never stopped it from being used in the Spark ecosystem:
Links regarding kryo: https://stackoverflow.com/questions/40261987/when-to-use-kryo-serialization-in-spark https://www.contrastsecurity.com/security-influencers/serialization-must-die-act-1-kryo-serialization
... I think the ultimate answer is to respect the "vulnerabilities" of binary serialization, and to disable that type of serialization by default. In practice users should be allowed to re-enable the functionality for the sake of convenience and performance. It is a measured risk, that should be left to the application developers.
@dbeavon
From ChatGPT4 =>
The BinaryFormatter in .NET has indeed been marked as obsolete due to security concerns. Here are the key points:
Obsolete Methods:
BinaryFormatter.Serialize
BinaryFormatter.Deserialize
Formatter.Serialize (Stream, Object)
Formatter.Deserialize (Stream)
IFormatter.Serialize (Stream, Object)
IFormatter.Deserialize (Stream)
Recommended Action:
Project Types Affected:
ChatGPT seems pretty wrong on some points. The thing I wanted to point out is that BinaryFormatter seems to be allowed in .Net 8... if nothing else that is for the sake of compatibility with .NetStandard library dependencies. The approach to getting this working is very similar to .Net 7 (use EnableUnsafeBinaryFormatterSerialization, and disable SYSLIB0011 warnings).
The fact that we are dealing with a JVM-based (linux-based) platform means we can give ourselves more flexibility than a typical .Net solution. I would agree that we need to avoid BinaryFormatter by default, but it should probably allowed based on the discretion of users. In any case, I believe that the other serialization libraries which Spark/Jvm uses would also have the security concerns that apply to BinaryFormatter.
For .NET interactive withing polyglot notebooks, you can use
AppContext.SetSwitch("System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization", true);
BinarySerializer is not that dangerous in Dotnet.Spark context, as UDFs and broadcasts are actually designed to be executable code that can be serialized on driver, and executed on worker. Both python and scala UDFs are doing exactly the same, but in other languages.
Describe the bug UDFs are serialized using BinaryFormatter, which is deprecated for security reasons.
In environments where
BinaryFormatter
is already disabled, this causes UDFs to fail with aNotSupportedException
.To Reproduce
Steps to reproduce the behavior:
Expected behavior
BinaryFormatter
Screenshots NA
Desktop (please complete the following information):
Additional context NA