dotnet / fsharp

The F# compiler, F# core library, F# language service, and F# tooling integration for Visual Studio
https://dotnet.microsoft.com/languages/fsharp
MIT License

[Question] Determinism of the F# Compiler #1042

Closed 0x53A closed 7 years ago

0x53A commented 8 years ago

The Roslyn team has made an effort to make the C# / VB compilers deterministic: https://github.com/dotnet/roslyn/issues/372

This is interesting both for security (validating that an exe was built from the source someone claims it was built from) and for the build performance itself (only rebuilding things that changed).

According to this ominous tweet, it looks like Microsoft itself is building a deterministic & distributed build system: https://twitter.com/xjoeduffyx/status/686785616030240768 (Does anyone know anything about that? It is already April =) )

Are there any plans to make the F# compiler deterministic?

KevinRansom commented 8 years ago

Making the compiler deterministic is not currently in our plans, although it would be sort of nice. To reap the benefits of determinism, essentially all or most of the tools in the build system need to be deterministic. Deterministic builds for products the size of VS or Windows are clearly advantageous and improve performance. What do you think?

smoothdeveloper commented 8 years ago

@KevinRansom (or anyone with even a remote idea) could you share which aspects are known to be non-deterministic?

I gather there might be async usages whose unordered results or side effects might create non-deterministic behaviour; it would help to have a rough overview if you expect the community to take on some smaller parts of this work.

Thanks

KevinRansom commented 8 years ago

Well …

· there are paths buried in the dll for source files, to tell the debugger about source files to debug

· there is an mvid which is a random guid generated when a binary is created.

· Version numbers .. id’s

· I’m not too worried about async … the F# compiler isn’t anywhere near as reliant on thread-pools as Roslyn. In fact it is pretty much single threaded.

The list of items in Roslyn was surprisingly long … I remember that at standup, most weeks, one or another of the devs on the team would be working on some determinism bug they had found. The toughest problem was a lack of determinism somewhere else in the toolchain, because that required another team to fix their determinism issues.

A particularly cool one occurred if there was a GC while writing out methods for some particular item, the GC could discard the items in a weak reference collection, and somehow the order of the subsequent items emitted changed.

If the community could construct determinism test cases, we would certainly try to fix determinism issues, and take PRs to address those issues.

The community would also need to fix up the community type providers, which I suspect are pretty non-deterministic too.
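A determinism test case of the kind mentioned above could be as simple as building the same inputs several times and comparing the outputs byte for byte. The following is only a minimal sketch in Python: `is_deterministic`, `stable_build` and `unstable_build` are hypothetical names, and the two build functions are stand-ins for a real compiler invocation, not anything taken from the F# repository.

```python
import hashlib
import uuid
from typing import Callable


def output_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of one build output."""
    return hashlib.sha256(data).hexdigest()


def is_deterministic(build: Callable[[], bytes], runs: int = 3) -> bool:
    """Run the build several times and check every output is byte-identical."""
    digests = {output_digest(build()) for _ in range(runs)}
    return len(digests) == 1


# Hypothetical stand-ins for invoking a compiler and reading back the binary.
def stable_build() -> bytes:
    # Same bytes on every run.
    return b"HEADER" + b"\x00" * 16 + b"BODY"


def unstable_build() -> bytes:
    # Embeds a fresh random GUID on every run, much like a random MVID would.
    return b"HEADER" + uuid.uuid4().bytes + b"BODY"
```

In a real test the `build` callable would run the compiler on a fixed set of sources and return the bytes of the emitted assembly; a failure then points at some remaining source of non-determinism.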

Just as an FYI: the main driver for determinism in C# is that Windows and Visual Studio both have vast codebases (many millions of lines of code in each), and they are moving over to deterministic builds in order to gain the large build-time reductions that come from caching build artifacts.

There is a large portion of C# code in both Windows and VS. Because the amount of F# code in those two codebases is much lower, there has been no pressure for us to make the compiler deterministic.

What is it about determinism builds that you see as being particularly useful?

Kevin

0x53A commented 8 years ago

Thank you for that detailed update. I guess that while determinism would be nice, there are currently more pressing issues like core-clr support.

What is it about determinism builds that you see as being particularly useful?

We don't have such a big codebase, so it is not as important for us as for e.g. Microsoft, but I would be interested in deterministic builds to:

smoothdeveloper commented 8 years ago

Thanks for all those details, quite interesting!

What is it about determinism builds that you see as being particularly useful?

TBH, as an end user of F#, that doesn't impact me much. I use ReSharper's Build, which does very clever things with the public API footprint of assemblies and saves a lot of time on rebuilds: even if I change a project low in the dependency tree, it won't rebuild dependent things most of the time.

At Microsoft (and other places with a distributed build setup), the requirement for deterministic (or at least distributed) builds seems higher, and many solutions have probably been adopted for that. There still (naïvely) seem to be viable ways to work around the fact that the compiler might not be deterministic, such as checking the exact input to the compiler against the cache.
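To illustrate the idea of checking the compiler's exact input against a cache, here is a rough Python sketch. It keys the cache on a hash of the command-line arguments and source contents, so the compiler's output never needs to be reproducible for a cache hit. `input_key`, `BuildCache` and `compile_fn` are hypothetical names made up for this example, and a real build system would also have to hash referenced assemblies, the compiler version and anything else that influences the output.

```python
import hashlib
from typing import Callable, Dict, Sequence


def _add(h, chunk: bytes) -> None:
    # Length-prefix every chunk so that different inputs cannot collide
    # just by shifting bytes between adjacent fields.
    h.update(len(chunk).to_bytes(8, "big"))
    h.update(chunk)


def input_key(args: Sequence[str], sources: Dict[str, bytes]) -> str:
    """Hash everything that is passed to the compiler."""
    h = hashlib.sha256()
    h.update(len(args).to_bytes(8, "big"))
    for arg in args:
        _add(h, arg.encode("utf-8"))
    for name in sorted(sources):  # sorted, so dict order does not matter
        _add(h, name.encode("utf-8"))
        _add(h, sources[name])
    return h.hexdigest()


class BuildCache:
    """In-memory cache keyed on compiler inputs rather than outputs."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def get_or_build(
        self,
        args: Sequence[str],
        sources: Dict[str, bytes],
        compile_fn: Callable[[Sequence[str], Dict[str, bytes]], bytes],
    ) -> bytes:
        key = input_key(args, sources)
        if key not in self._store:
            # Cache miss: invoke the (possibly non-deterministic) compiler once.
            self._store[key] = compile_fn(args, sources)
        return self._store[key]
```

The trade-off is that an input-keyed cache cannot detect when two different inputs produce the same binary, which is where a deterministic compiler lets downstream steps be skipped as well.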

For end-users, with large F# code base, compilation time probably has a more direct impact.

dsyme commented 8 years ago

Strictly speaking there's relatively little non-determinism in the F# compiler. You can see that partly because we are able to use emitted IL as a baseline. While we have to "clean" that IL, that's normally because different arguments have been passed to the compiler, or compilation has happened in a different directory.

AFAIK the only items that are really non-deterministic come from the use of a timestamp:

If you want determinism to mean "it doesn't matter which directory you compile in", then this item is also relevant:

That said, the F# compiler is not particularly "stable under small code changes", which can be another meaning of non-determinism in incremental scenarios. Specifically, it generates stamps (see newStamp and its uses), and these can end up in emitted IL, as can line numbers. So adding a blank line to your source can change your output. That may or may not be a problem depending on your scenario.

It would be great to rid ourselves of these last few sources of non-determinism, by just doing whatever C# does in these cases.

davidglassborow commented 8 years ago

I'm keen to look at this issue, although bear with me, this is my first attempted contribution to F# 😎

Much like @0x53A, we have a complex project built mostly from C#, with increasing amounts of F# at the lower levels. We put all artifacts for release into Git, and having deterministic builds will greatly help the speed and efficiency of our CI and CD.

Changing the two items @dsyme has identified, the MVID and the PDB timestamp, will give us what we need.

I've been investigating what C# did in Roslyn in these two areas. A good reference for the changes made in Roslyn is this post by @jaredpar: http://blog.paranoidcoding.com/2016/04/05/deterministic-builds-in-roslyn.html

To quote:

To create the MVID and time stamp with repeatable unique values the compiler uses cryptographic hashes. It takes the content of the PE with the above entries set to 0 and runs it through a SHA1 hash. The resulting 20 bytes are then carved up into a GUID (16 bytes) and a time stamp entry (4 bytes, high bit always set). A similar operation is performed for the PDB ID. This means the above values will be repeatable and unique for a given set of inputs. The combination of the explicit ordering guarantee and the predictable values for MVID, PDB ID and timestamp allow us to produce fully deterministic PE outputs from the compiler. They will be identical byte for byte.
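As a rough illustration of the scheme described in that quote, the carving of a 20-byte SHA1 digest into a GUID and a timestamp might look like the Python sketch below. This is not Roslyn's code: `derive_ids` is a hypothetical name, and the byte order of the timestamp and the GUID byte layout are assumptions made only for the example.

```python
import hashlib
import uuid
from typing import Tuple


def derive_ids(pe_image_zeroed: bytes) -> Tuple[uuid.UUID, int]:
    """Derive a repeatable (MVID, timestamp) pair from PE content.

    `pe_image_zeroed` is assumed to be the PE image with the MVID and
    timestamp fields already set to 0, as the quote describes.
    """
    digest = hashlib.sha1(pe_image_zeroed).digest()  # always 20 bytes

    # First 16 bytes become the GUID (byte layout is an assumption here).
    mvid = uuid.UUID(bytes=digest[:16])

    # Last 4 bytes become the timestamp, with the high bit always set
    # (little-endian is an assumption for this sketch).
    timestamp = int.from_bytes(digest[16:20], "little") | 0x80000000

    return mvid, timestamp
```

Because both values depend only on the hashed content, compiling the same inputs twice yields the same MVID and timestamp, while any change to the emitted bytes changes both.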

I'll have a look at the Roslyn repo to find the exact code that C# uses.

First question: should this be a compiler option or a change of default behaviour? If an option, should it default to on or off? jaredpar mentions the default in C# being off because of AssemblyVersionAttribute = *. Is that something we need to support in F#?

jaredpar commented 8 years ago

First question: should this be a compiler option or a change of default behaviour? If an option, should it default to on or off?

The best way to drive that decision is to consider the AssemblyVersionAttribute case and how the conflicts are handled: essentially, what happens when the compiler is both trying to build deterministically and dealing with an AssemblyVersionAttribute that has a * value. These are naturally in conflict because:

One says change, the other never change. :frowning:

There are really a limited number of choices here for the compiler to resolve this:

  1. Honor deterministic and Ignore the * (use a default value like 0)
  2. Ignore deterministic and honor the *
  3. Honor both, issue no warning / error, and field the "determinism is broken" bug reports
  4. Issue an error because this is confusing

Once you settle on which behavior the F# compiler should have above then that will pretty much settle the decision about what the default behavior should be.

davidglassborow commented 8 years ago

Thanks, sounds like 4 is the sane choice, like C#, but maybe a warning rather than an error. (Are you still hopeful of going deterministic by default in C# 7?)

jaredpar commented 8 years ago

It's nearly impossible for us to have deterministic by default + issuing a warning about the use of *. It would break so many customers that it would be an adoption blocker for Visual Studio. The only realistic way we could make it the default is to:

  1. Not warn about the use of *
  2. Use warning waves to make the warning explicitly opt in.

KevinRansom commented 8 years ago

It depends how you feel about determinism in the compiler. In my mind the compiler should build the application described by the source files, and the use of a /determinism switch should cause the compiler to raise an error when it can't both be deterministic and honour the source code. However, no error is necessary in the absence of the switch.

My 2 cents, and probably worth all that we paid for them.

davidglassborow commented 8 years ago

All sounds sensible: so opt in to determinism, and error if using * together with /deterministic.
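The rule agreed on here (opt-in switch, error only when the switch is combined with a * version) can be sketched in a few lines of Python. This is just an illustration of the decision, not the compiler's actual logic; `check_version_for_deterministic` and `DeterminismError` are hypothetical names.

```python
class DeterminismError(Exception):
    """Raised when the requested options cannot yield a deterministic build."""


def check_version_for_deterministic(assembly_version: str, deterministic: bool) -> None:
    """Reject a wildcard assembly version only when determinism was requested."""
    if deterministic and "*" in assembly_version:
        raise DeterminismError(
            "a '*' in the assembly version changes on every build and "
            "cannot be combined with the deterministic switch"
        )
    # Without the switch, the wildcard is honoured and no error is raised.
```

For example, `check_version_for_deterministic("1.0.*", deterministic=False)` passes silently, while the same version with `deterministic=True` raises.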

davidglassborow commented 7 years ago

I've had an initial stab at this in PR #2954, and will work on tests next.

dsyme commented 7 years ago

Closing due to merge of #2954 - thanks for the feature @davidglassborow !