dotnet / fsharp

The F# compiler, F# core library, F# language service, and F# tooling integration for Visual Studio
https://dotnet.microsoft.com/languages/fsharp
MIT License
3.94k stars 789 forks source link

Make FSharp.Core safe for SQL CLR with IL initonly (somehow) on static field 'empty' in Map #3535

Open SomethingAwry opened 7 years ago

SomethingAwry commented 7 years ago

I want to use F# to create CLR functions for SQL Server but I cannot publish them automatically because FSharp.Core is not "safe" as far as SQL Server is concerned.

Specifically, I get the error message:

CREATE ASSEMBLY failed because type 'Microsoft.FSharp.Collections.FSharpMap`2' in safe assembly 'fsCLR' has a static field 'empty'. Attributes of static fields in safe assemblies must be marked readonly in Visual C#, ReadOnly in Visual Basic, or initonly in Visual C++ and intermediate language.

Of course, it may be that some other resolution to 'empty' is needed other than the one suggested above, but it's the first message I've seen stating why FSharp.Core isn't safe, but I don't have the chops to address it without the worry of mucking up something else.

Details

A Visual Studio (2017) solution with two projects: 1) an F# library, 2) a SQL Server Database Project.

A simple library could be:

module fsCLR
open System.Data.SqlTypes
[<Microsoft.SqlServer.Server.SqlFunction>]
let FsFunc (ss: SqlString) : SqlString =
    if ss.IsNull then ss else "from F#: " + ss.Value |> SqlString

The database project can be empty, just add a reference the library. Building/Publishing the DB project will do the work of creating the deploy SQL for an assembly binary and making the function(s) available in T-SQL.

Try to publish the DB project to a SQL Server; fails with the above message.

Contrariwise, the same DB project with a C# library auto-published works fine.

Note

Strictly speaking, the 'direct translation' of a working C# library would look like this:

namespace UserDefinedFunctions
open System.Data.SqlTypes
type fsCLR () =
    [<Microsoft.SqlServer.Server.SqlFunction>]
    static member FsFunc (ss: SqlString) : SqlString =
        if ss.IsNull then ss else "from F#: " + ss.Value |> SqlString

but doesn't make a difference.

dsyme commented 7 years ago

SQL Server long, long ago (~2003) decided to implement a (I believe crazy) set of additional checks on .NET IL above and beyond what is done by peverify.exe. I presume the people who decided to do this aren't even at Microsoft any more, and the decision has been lost in history. I've never seen these additional checks documented anywhere, nor any explanation of what soundness result is being established by the checks, and I don't know how to put it under automated test. This makes it really, really hard for other languages and code-generating tools to make sure that code can be run in SQL Server.

We did try pretty hard to generate SQLServer-friendly constructs in 2005-2007 back when ".NET code running in SQL Server" was the big new thing. It is still a useful scenario, and if we had PRs with some user validation that we were following the rules - and some documentation of the rules with automated checking in the F# test suite - then we might be able to accept it, if it doesn't change codegen too much.

zpodlovics commented 7 years ago

They have an SQLCLR that is - as far as I know - implemented on top of SQLOS, in fact the plan is to support SQLCLR on Linux (please see the comments): https://blogs.technet.microsoft.com/dataplatforminsider/2016/12/16/sql-server-on-linux-how-introduction/

We are currently planning to support existing SQLCLR functionality for SAFE assemblies only (which works in the current CTP). This has no relation to .NET Core.

Pleae note, the newer version will have even more restriction in place, eg.: the 2017 version will only allow SAFE assemblies.

My guess is that they implemented something like Joe-E which is a subset of the Java programming language intended to support programming according to object-capability discipline for SQLCLR:

https://en.wikipedia.org/wiki/Joe-E https://people.eecs.berkeley.edu/~daw/papers/joe-e-ndss10.pdf

    _Classes may not have mutable static fields_, because these create global state.
    Catching out-of-memory exceptions is prohibited, because doing so allows non-deterministic execution. For the same reason, finally clauses are not allowed.
    _Methods in the standard library may be blocked_ if they are deemed unsafe according to taming rules. For example, the constructor new File(filename) is blocked because it allows unrestricted access to the filesystem.

There are lot's of restriction what's available in SQLCLR:

https://docs.microsoft.com/en-us/sql/relational-databases/clr-integration/database-objects/clr-integration-programming-model-restrictions https://docs.microsoft.com/en-us/sql/relational-databases/clr-integration-security-host-protection-attributes/host-protection-attributes-and-clr-integration-programming

http://www.sqlservercentral.com/articles/Stairway+Series/104406/ http://www.sqlservercentral.com/articles/Stairway+Series/109905/

dsyme commented 7 years ago

@zpodlovics Thanks for the links and the context!

I still find it kind of extraordinary that there are these adhoc extra limitations for this one particular execution environment, and that there isn't a set of tools to help library writers and compiler implementers stick within those limitations. I mean, it's not like SQL is the only environment that wants to host CLR code, nor that these rules somehow create a sound sandbox environment. Anyway...

The main problem is that FSharp.Core uses some of these constructs, e.g. non-readonly static fields to store amortized values such as the empty list, or the empty map and so on. It might be that we can adjust the compilation of FSHarp.Core.

Anyway, if you or someone else can get the compilation of FSharp.Core to the state where its compiled form passes these tests, and we can keep it that way through automated testing, then I think we would accept PRs in this area.

srutzky commented 6 years ago

Hi there, @dsyme (and others). Just to clarify a few things here:

  1. I don't think that these restrictions are "ad hoc" or arbitrary. The goal is to ensure that stability of the SQL Server process as much as possible. To that end:

    • there are certain types of functionality that are entirely useless for background processes running without an interactive context (i.e. Environment.UserInteractive == false). So no need for Dialogs, Forms, graphics, fonts, etc.
    • also, the App Domain is shared across all Sessions (i.e. SPIDs) in SQL Server. App Domains are per Database, per Owner (i.e. the AUTHORIZATION database principal). Hence, ALL callers of a particular method share the single App Domain that the Assembly (and Class) have been loaded into. This makes it far too easy to have race conditions. For example: SQLCLR assembly throws error when multiple queries run simultaneously
  2. FSharp.Core uses some of these constructs, e.g. non-readonly static fields to store amortized values such as the empty list, or the empty map and so on.

    I am only barely familiar with F# so I apologize if this is a silly question, but if these objects are empty, do they need to be writable? Is it that something about them (structure, etc) changes? I believe readonly static variables can be set in initialization and via the class constructor. Is this something that seems viable, or am I way off here?

  3. If these non-readonly static fields a) need to be writable, and b) are static for performance reasons, is it possible to forgo the performance gain and make them non-static fields, or something along those lines? Meaning, is it possible to accept that performance, at least for some scenarios, in this particular CLR host is slightly worse than other CLR hosts? If so, can compiler directives and/or other mechanisms be used to create a separate build for SQL Server / SQLCLR?

  4. Even if either the static fields can be marked as readonly or a separate build can be achieved with a modified approach to these fields, it is still possible to not meet the goal of having a SAFE Assembly IF FSharp.Core references .NET Framework libraries that are not on the "supported" list.

  5. Contrary to what @zpodlovics noted 2 comments above, SQL Server 2017 is not restricted to SAFE Assemblies only, at least not across all platforms. SQL Server on Windows still fully supports EXTERNAL_ACCESS and UNSAFE Assemblies. It is only SQL Server 2017 (and newer, presumably) on Linux, as well as SQL Server via Amazon RDS, that supports SQLCLR only for SAFE Assemblies. (At this time, Azure SQL Database still does not support SQLCLR in any capacity).

    The security change with SQL Server 2017, even on Windows, is that now the requirements for loading an UNSAFE Assembly are enforced for both SAFE and EXTERNAL_ACCESS Assemblies. Meaning, you need to strong-name, or sign with a Certificate, each Assembly, and first load that strong name key (Asymmetric Key in SQL Server) or Certificate into master, create a Login from that Asymmetric Key / Certificate, and grant that Login the UNSAFE ASSEMBLY permission. This is required even on Linux where EXTERNAL_ACCESS and UNSAFE Assemblies aren't even supported. For more details of this mess, please see: SQLCLR vs. SQL Server 2017, Part 1: “CLR strict security” – The Problem

  6. As far as automated testing goes, I'm not sure if it's possible to use Reflection to walk through all referenced .NET Framework classes and methods to see if a Host Protection Attribute (HPA) has been defined, but that might be something to try. Of course, that won't catch usage of non-readonly static variables, but should catch stuff that PEVERIFY.EXE wouldn't. Of course, you could also install a copy of SQL Server Express LocalDB on the build server that can spin up automatically when an attempt is made to connect to it. The CI process could simply attempt to execute the CREATE ASSEMBLY statement and report back any errors if encountered. LocalDB will shutdown a few minutes after the connection is closed.

Hope this info helps :-). Take care, Solomon..

dsyme commented 6 years ago

@srutzky Thanks for the info, it's the first explanation I've seen of the SQL checks since 2003.

The restrictions feel adhoc since they were invented by SQL server in 2004-2005 without, AFAIK, documenting it until now. Good to have the docs.

IF FSharp.Core references .NET Framework libraries that are not on the "supported" list.

FSharp.Core doesn't reference anything that's not on that list. It basically references nothing but System.Core, System and mscorlib.

There is one true global mutable I think, accessed via Async.DefaultCancellationToken.

AFAIK FSharp.Core is about the safest component around, it's just a functional programming library. Perhaps SQL Server could add it to its safe list. It's Microsoft-signed so the strong name should be enough.

I believe readonly static variables can be set in initialization and via the class constructor. Is this something that seems viable, or am I way off here?

The way F# initializes values is slightly different to C#. Initializers for F# are run for an entire file, not class-by-class, so all static data declared in a file is initialized at once. A static constructor is used under the hood but it may initialize fields from multiple classes.

This is entirely sound and valid according to ECMA 335, but means we can't mark fields as "readonly" and still be verifiable. The fields are, however, private. From the perspective of F# the added checks seem pointless, since the library is safe.

FWIW as a compiler writer and language developer it's extremely hard to predict from ECMA 335 that some tool will impose the restriction that "static data must all be marked readonly, even private data". It's basically impossible to shape a language/library design and implementation if tools are free to impose extra adhoc rules like this...

srutzky commented 5 years ago

@dsyme Sorry for the delay in responding, just been so incredibly busy... I still think it would be awesome to get FSharp.Core working in SQL Server / SQLCLR as a SAFE assembly (if possible), so I will do what I can to assist in this effort..

@srutzky Thanks for the info, it's the first explanation I've seen of the SQL checks since 2003.

Yer quite welcome 😺

The restrictions feel adhoc since they were invented by SQL server in 2004-2005 without, AFAIK, documenting it until now. Good to have the docs.

I think the documentation was actually there, at least from the release of SQL Server 2005, but it was likely not well advertised and not easy to find unless you knew to look for it there.

FSharp.Core doesn't reference anything that's not on that list. It basically references nothing but System.Core, System and mscorlib.

This definitely helps.

There is one true global mutable I think, accessed via Async.DefaultCancellationToken.

I'm not sure what, if any, impact this has. I think we will find out if / when we get passed the static variable issue.

AFAIK FSharp.Core is about the safest component around, it's just a functional programming library. Perhaps SQL Server could add it to its safe list. It's Microsoft-signed so the strong name should be enough.

It's probably not as simple as that to get a library added to the "supported" list. The libraries on that list are guaranteed:

  1. to be tested across new versions of, and updates to, SQL Server.
  2. to provide backwards compatibility across new versions of, and updates to, the CLR and the .NET Framework. SQL Server's CLR host is linked to a single version of the CLR. SQL Server 2005, 2008, and 2008 R2 only work with CLR v2 (hence .NET Framework versions 2.0, 3.0, and 3.5), while SQL Server 2012 and newer (through 2019 currently) only work with CLR v4 (hence .NET Framework versions 4.x). Code compiled against CLR v2 but running in SQL Server 2012 or newer functions the same as it does when running in SQL Server 2005 - 2008 R2, yet is running in CLR v4 since there is no way to target a CLR v2-bound Framework version.
  3. to never become a mixed-mode assembly, because SQL Server's CLR host only allows pure-MSIL assemblies.

Initializers for F# are run for an entire file, not class-by-class, so all static data declared in a file is initialized at once. A static constructor is used under the hood but it may initialize fields from multiple classes.

I could be misunderstanding something here, but this does not sound like a conflict. Currently SQL Server appears to be initializing static class variables when each class is first referenced (I just tested this in SQL Server 2017 CU14). However, I don't see the practical harm in FSharp.Code initializing static class variables for all classes at the same time.

This is entirely sound and valid according to ECMA 335, but means we can't mark fields as "readonly" and still be verifiable. The fields are, however, private. From the perspective of F# the added checks seem pointless, since the library is safe.

I don't understand the "we can't mark fields as readonly and still be verifiable" part, but don't know that I need to either 😉 . However, the fields being private is irrelevant to the "readonly" requirement. This requirement exists because the App Domain is shared across all Sessions (i.e. SPIDs) in SQL Server. App Domains are per Database, per Owner (i.e. the AUTHORIZATION database principal). Hence, ALL callers of a particular method share the single App Domain that the Assembly (and Class) have been loaded into. This makes it far too easy to have race conditions. For example: SQLCLR assembly throws error when multiple queries run simultaneously.

That being said, if you simply cannot mark the static class variables as "readonly", then we need to find another way. Do the values of those static class variables ever change after being initialized? If not, then there is a solution that would probably work (though some might consider it sketchy): decorate the static class variables with the CompilerGeneratedAttribute (the "Remarks" section even mentions SQL Server).

dsyme commented 5 years ago

That being said, if you simply cannot mark the static class variables as "readonly", then we need to find another way. Do the values of those static class variables ever change after being initialized?

They are never changed after initialization, with the exception of Async.DefaultCancellationToken

If not, then there is a solution that would probably work (though some might consider it sketchy): decorate the static class variables with the CompilerGeneratedAttribute (the "Remarks" section even mentions SQL Server).

So it does! Yes, I think that may work

srutzky commented 5 years ago

Do the values of those static class variables ever change after being initialized?

They are never changed after initialization, with the exception of Async.DefaultCancellationToken

decorate the static class variables with the CompilerGeneratedAttribute (the "Remarks" section even mentions SQL Server).

So it does! Yes, I think that may work

Awesome! So we are at least a little closer 😄

Two questions:

  1. You mentioned that Async.DefaultCancellationToken does (or at least can) change after initialization. Does this create the potential for a race condition given the shared nature of the App Domain, hence shared nature of that static variable? Or, is the value stored in it valid for any session reading it?

  2. Looking ahead at deployment considerations (trying to make it as painless as possible), you mentioned that the library is Microsoft-signed. Does this include signing with a certificate? And if yes, is it the same certificate used to sign the .NET Framework 4.x libraries? (there will be follow-up questions if certificates are not being used).

dsyme commented 5 years ago

You mentioned that Async.DefaultCancellationToken does (or at least can) change after initialization. Does this create the potential for a race condition given the shared nature of the App Domain, hence shared nature of that static variable? Or, is the value stored in it valid for any session reading it?

Yes it does - see here, it is not protected via lock. We could protect it

Looking ahead at deployment considerations (trying to make it as painless as possible), you mentioned that the library is Microsoft-signed. Does this include signing with a certificate? And if yes, is it the same certificate used to sign the .NET Framework 4.x libraries? (there will be follow-up questions if certificates are not being used).

@KevinRansom @brettfo Can you confirm?

thanks!

brettfo commented 5 years ago

I don't know how the dotnet/arcade SDK handles signing; I only know that it uses the Microsoft key.

@tmat might know about signing/certs, though.

tmat commented 5 years ago

Yes, our assemblies are signed with SHA2 AuthentiCode certificate.

srutzky commented 5 years ago

@tmat Thanks for that info! It actually reminded me that I can verify this myself: since FSharp.Core is available via NuGet, I just need to add it to a project and check FSharp.Core.dll's properties. So that's what I did. It appears to be signed with the same Certificate(s) used for the .NET Framework v4.x libraries. And that's very good news as it makes the deployment script much, much easier.

In fact, I have already tested this by loading the MS certificate (the SHA1 digest cert with thumbprint = 9DC17888B5CFAD98B3CB35C1994E96227F061675 ), and then I was able to create the FSharp.Core assembly in SQL Server 2019, with CLR strict security enabled (ideal), and in a database with TRUSTWORTHY OFF (ideal). Of course, the assembly was marked as UNSAFE, but the result of what we are doing here will be that it loads as SAFE.

Assuming that:

  1. the CompilerGenerated attribute can be added to static variables that are not marked readonly,
  2. the concurrency issue with Async.DefaultCancellationToken is addressed, and
  3. this static class variable issue is the only issue and not merely the first issue with other issues waiting to be found after this one is fixed

then I can take care of the deployment aspect of this, and will assist on the automated testing aspect. At this point I am not sure how to proceed, but here is a general overview of what will be involved (as far as I can tell, of course):

BUILD

Ultimately we need a completely self-contained deployment script that can be added to a project as a pre-release step, or handed to a DBA to execute once wherever needed, etc. Providing only the DLL and instructions on how to deploy it will only cause stress (trying to reinvent this wheel), lead to bad security practices (trying to get this done the quickest way possible), or giving up on F# for SQLCLR.

  1. We need a script / mechanism to generate the deployment SQL file
  2. This only applies to building for the .NET Framework, not .NET Standard or .NET Core, etc.
  3. The script can be a post-build event (on successful build), or similar triggering if not using MSBuild
  4. There will be a base / template SQL file that only needs the contents of FSharp.Core.dll inserted into it
  5. I will need some guidance to determine your preferred mechanism for handling this. I have used T-4 templates to do this as well as simple CMD scripts. To assist in this process, I wrote a command-line tool, BinaryFormatter, that not only converts the binary DLL into its hex string representation (i.e. 0x0D4A....), but allows for setting a max line length, at which point it adds a \ for T-SQL line-continuation and proceeds on the next line. Setting a max line length makes the deployment script more readable and more manageable, at least in SSMS where it doesn't like super-long lines. We don't need to use BinaryFormatter if we are using something more programmatic like T-4 templates, but for a CMD script we will. I haven't tried PowerShell but I suppose that's an option as well.
  6. I will also need someone to added the standard copyright notice / disclaimer / etc to the SQL file.
  7. The SQL file will contain some instructions at the top regarding its intended use (when, and how, to use it).
  8. The final SQL file will:
    1. enable "CLR enabled" if not already enabled
    2. create the MS Certificate (public key only) in [master] if not already there (using FROM BINARY = 0x....)
    3. create the certificate-based login if not already there
    4. grant the certificate-based login the UNSAFE ASSEMBLY permission
    5. create the FSharp.Core assembly if not already there (using FROM 0x....), else load updated assembly using ALTER ASSEMBLY. This should prevent needing to removing dependent objects (other assemblies that reference this one, and T-SQL wrapper objects pointing to those assemblies)

I have used this pattern before, so this script is already partially complete.

TEST

I'm not sure if you want to include the final deployment script in the automated testing. If you do, here is what's needed:

  1. Include MSI (or at least a link to it) for SQL Server Express LocalDB. Currently, version 2017 is the most recent version that is available, but that might not be the best choice as it is more limited in terms of the OSs that it can be installed on. The oldest version of LocalDB that is available is SQL Server 2012. Picking 2014 or 2016 might be best.
  2. Running the LocalDB MSI is the only manual step needed, and it's only needed once.
  3. CMD script will use SQLCMD.EXE to:
    1. ensure that the test DB has been created
    2. might need to ensure environmental config, such as: 'CLR strict security' enabled (if SQL Server 2017 or newer), and TRUSTWORTHY OFF for the test DB. Without ensuring these two settings, the security aspect of the test (the part requiring the certificate and its login) will not truly be tested.
    3. execute the SQL file produced by the build process against the test DB

At this point, as long as the FSharp.Core assembly has been created, then success! At least to a degree. The goal is to get FSharp.Core loaded as a SAFE assembly as it will help increase adoption given that folks tend to shy away from UNSAFE assemblies, and platforms such as Azure SQL Database Managed Instances only allow SAFE assemblies. But, loading it as SAFE doesn't mean that it can execute as SAFE since the verification done at load-time (whether from disk / binary or upon loading into the App Domain) can't catch everything. So, to go one step further in testing, we could include a simple SQLCLR assembly with one or two simple functions and/or stored procedures, and have an additional test step for loading that assembly, creating the T-SQL wrapper objects, and executing them (all via SQLCMD.EXE).


Sorry for the lengthy post, but this stuff needed to be documented somewhere to get a sense of scope. If there is a more appropriate place to put the above info, just let me know.

At this point, the only major hurdle (that I can see) is resolving the concurrency issue related to Async.DefaultCancellationToken. Of course, something to consider is: if this static variable is only used when someone explicitly codes async functionality (as opposed to it being used internally even if the user assembly doesn't attempt anything async), then we have the option of simply stating that Phase 1 of getting this to work "seamlessly" in SQLCLR does not support async functionality. And then we solve that issue later. In this scenario, we only need the static variables decorated with CompilerGenerated. Thoughts?

allokera commented 1 year ago

@dsyme @vzarytovskii @srutzky

Is there something specific me and the boys, e.g., @august-alm, can help out with to get this mess sorted out? Since the issue has been moving in and out of F# Compiler and Tooling there might be more to the story than I’m made aware of from this thread, any pointers?

My company works on a boutique compiler for the CLR and we also have the main parts of our core banking system at current being dependent on both SQLCLR and F#, so I we have the incentives sort this out smoothly.