dotnet / source-build

A repository to track efforts to produce a source tarball of the .NET Core SDK and all its components

What is the expected usage of source-built assets? #187

Open eerhardt opened 7 years ago

eerhardt commented 7 years ago

We need to decide what we expect users are going to do with the assets built from source-build.

For example, do we expect people to publish the NuGet packages that were built to some feed? Do we expect consumers to restore these NuGet packages in their apps?

One usage I can see is for the RID-specific NuGet packages to get published. For example, if someone wants to build a self-contained application that runs on Tizen, they would need to reference the runtime.tizen.4.0.0-armel.Microsoft.NETCore.App package. But Microsoft doesn't build/publish that package. So I could imagine a case where someone uses source-build to build a RID-specific Microsoft.NETCore.App package, and publishes it so users can build self-contained applications on platforms that aren't officially built by Microsoft.
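
For illustration, here is a minimal sketch of what that consumer scenario could look like once such a RID-specific package has been published to some feed; the target framework and RID below are placeholders for whatever a community build actually produces:

```xml
<!-- Hypothetical project file for a self-contained app on a community-built RID.
     Assumes the source-built runtime.tizen.4.0.0-armel.Microsoft.NETCore.App package
     has been pushed to a feed that restore can reach. -->
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.0</TargetFramework>
    <!-- Publishing self-contained with this RID pulls the RID-specific Microsoft.NETCore.App package. -->
    <RuntimeIdentifier>tizen.4.0.0-armel</RuntimeIdentifier>
  </PropertyGroup>
</Project>
```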

@dotnet/source-build-contrib

omajid commented 7 years ago

As someone who intends to use the output of source-build for packaging .NET Core into some Linux distributions, I have some comments.

Our primary use case is that we want to replicate what Microsoft distributes as the downloadable ".NET Core SDK" using source-build and make that available to users of those Linux distributions.

These Linux distributions often target architectures that are not targeted by Microsoft. In addition to x86_64 (aka x64) and i686 (aka x86), we are interested in architectures like arm32hf, aarch64, or even ppc64le. So we want to be able to build/run .NET Core applications on these architectures too. I understand that many of these ports do not exist, so I am really talking about a hypothetical future.

As for packages and remote publishing, we would like the behaviour to match upstream. I don't think we will be uploading to NuGet, but the spirit of open source demands that the packages are built and that we have the ability to, in case we ever wanted to. In other words, if NuGet.org were to disappear from the face of the earth for a day, someone in the community could build .NET Core from source and upload it to a TemporaryNuGet.org.

For another example, it would be great if we could upload custom RID packages to a custom myget repository so everyone can start using .NET Core on these not-supported-by-Microsoft architectures or platforms too.

tmds commented 7 years ago

It would be nice if the nuget packages that would be uploaded to nuget.org were put together in a folder. So in theory, we could reference that folder from a NuGet.Config file and restore offline (i.e. using only artifacts built by source-build).
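
As a rough sketch (the folder path is a placeholder), this is the kind of NuGet.Config that such a folder would make possible:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical offline configuration: restore only from the folder of
     source-built packages, with all online feeds removed. -->
<configuration>
  <packageSources>
    <clear />
    <add key="source-build-local" value="/opt/source-build/nuget-packages" />
  </packageSources>
</configuration>
```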

Also, source-build should build the ASP.NET runtime store, since that is also part of the Microsoft .NET Core distribution. We want to have the same sdk and runtime API surface.

omajid commented 7 years ago

Also, source-build should build the ASP.NET runtime store, since that is also part of the Microsoft .NET Core distribution. We want to have the same sdk and runtime API surface.

This was implicit in my description, but thank you for making it explicit: .NET Core built by source-build should be identical (bit by bit, ideally), including the package cache, runtime store, and anything else that is included in Microsoft's distribution of .NET Core. This is immensely important. Without this, source-build will be considered a second-class citizen and not the real .NET Core.

omajid commented 7 years ago

cc @gegenschall, who maintains Arch Linux packages for .NET Core.

weshaggard commented 7 years ago

It would be nice if the nuget packages that would be uploaded to nuget.org were put together in a folder. So in theory, we could reference that folder from a NuGet.Config file and restore offline (i.e. using only artifacts built by source-build).

Part of the trouble with doing that is that we cannot do it fully offline. A lot of our library packages cross-compile for many different targets (i.e. older versions of netstandard, netcoreapp, other OSes), and we don't have the source for all of those versions, so we can't rebuild a full nuget package with all those targets without downloading other binaries (mostly older nuget packages). So if the goal is to produce exactly what the .NET team produces, then we will need to be online, and we won't be able to do it purely from the contents of a source tarball.

However if the goal is purely to build a new shared framework, CLI/SDK, ASP.NET store for a new RID then I think we can accomplish that offline.

So to add to the question how much of this do you guys expect to be able to build from source purely offline?

4creators commented 7 years ago

From my perspective as a user, I would like to be able to patch the source of any repo, build a complete patched .NET Core SDK, and use it as the main tool on the machine it was built on, including Windows. Implicitly, this requires an install step, which is usually part of any OSS source-package build infrastructure, with the default option being to install the artifacts as the machine's default package. It would be helpful to have support for an offline build, but from my perspective that can come in the more distant future.

As a minimum expectation, my scenario would require the .NET Core runtime and some basic tools, with the build capabilities described above.

This scenario skips NuGet publishing entirely.

tmds commented 7 years ago

Part of the trouble with doing that is that we cannot do it fully offline. A lot of our library packages cross-compile for many different targets (i.e. older versions of netstandard, netcoreapp, other OSes), and we don't have the source for all of those versions, so we can't rebuild a full nuget package with all those targets without downloading other binaries (mostly older nuget packages).

Does that mean you are adding extra targets by patching existing nuget packages?

So if the goal is to produce exactly what the .NET team produces then we will require being online and we won't be able to do it purely from the contents of a source tar-ball.

Anything that is online can be cached offline. The source tarball we have now is also packed with nuget packages.

We should build those things which are meaningful for the .NET Core version & RID. So, when building .NET Core 2.1 (as an example), I think the meaningful targets would be:

Would it be good/feasible to generate and archive a tar file for each release? That tar file should include the cached online dependencies plus source-build. We need to include the source-build commit id used to generate the tar file (if that isn't the case already).

I am talking a lot about these nuget packages. The first goal is to build the distributable (sdk/runtime) itself from source. I find the nuget packages important because they bring the full open-source story to the build of an app/library too.

Please comment on how much sense this makes to you and how feasible it is.

RheaAyase commented 7 years ago

Please don't take anything below personally, it's just a bunch of facts as they are without any decoration.

We need to decide what we expect users are going to do with the assets built from source-build.

The Primary goal is for The Packagers to be able to package .NET Core CORRECTLY for every operating system (and platform).

what the .NET team produces

...is not an Open Source product.

However if the goal is purely to build a new shared framework, CLI/SDK, ASP.NET store for a new RID then I think we can accomplish that offline.

Not for a new RID. For the "old" ones too. All our current packages are hacked together, held up only by the power of will.

A question you may ask: Why package something we provide for download at https://www.microsoft.com/net/core ?

Because it should be a simple install dotnet command which pulls in the correct dependencies and later takes care of all the updates, everything. Another reason is that we are able to provide fixes at a very fast pace, and we are able to build for not-yet-released versions of Linux distributions. Microsoft doesn't do that. In the Linux world it's often about being on the edge, having the latest version, but Microsoft packages arrive half a year late. One whole major release late in the case of Fedora.


@omajid and @tmds have made the technical points quite well, I'll pass on repeating those words.

eerhardt commented 7 years ago

Example of why we can't package netcore - it can't be built from source.

Is this still currently the case? I was under the assumption that the current source-build code met the needs of building netcore from source.

(sorry if the below question has already been discussed at length, I'm relatively new to this effort)

One piece that is still confusing me is the definition of the requirement of being built from source. Does this mean:

  1. Every binary that is contained in the package needs to be built from source code - or -
  2. Every binary that is contained in the package and all the tools used to build those binaries need to be built from source code.
aslicerh commented 7 years ago

I believe the definition we should be working with would be:

Using only the publicly available information and build tools (source, readmes, compiler, etc, but not arbitrary binaries), anyone should be able to reproduce the entirety of the built project. This means building all necessary binaries as well, and those produced binaries being identical to the official ones (outside of things like automated compiler optimizations).

I'm not sure if that actually clarified much though.

RheaAyase commented 7 years ago

@eerhardt

Every binary that is contained in the package and all the tools used to build those binaries need to be built from source code.

Long story short, this would be the answer, @omajid could talk about it in more detail...

@aslicerh

I'm not sure if that actually clarified much though.

I think that it summed it up nicely =)

omajid commented 7 years ago

@weshaggard wrote:

Part of the trouble with doing that is that we cannot do it fully offline. A lot of our library packages cross-compile for many different targets (i.e. older versions of netstandard, netcoreapp, other OSes), and we don't have the source for all of those versions, so we can't rebuild a full nuget package with all those targets without downloading other binaries (mostly older nuget packages). So if the goal is to produce exactly what the .NET team produces, then we will need to be online, and we won't be able to do it purely from the contents of a source tarball.

This is an issue that we should sort out, then.

Is there any particular reason we couldn't handle older versions by building those older versions separately too? For example, could we not build netstandard (from source, using the output of source-build) and then provide that to the next build, so we could build everything in source-build that needs netstandard to build?

I think the key here is that packages that were themselves built from source (using only .NET Core from source-build) may be available on disk; they shouldn't be fetched from nuget.org/myget.org.

I don't have a good solution for cross compilation (targeting other architectures or platforms).

However if the goal is purely to build a new shared framework, CLI/SDK, ASP.NET store for a new RID then I think we can accomplish that offline.

Please consider this a minimum goal. It will work, but it doesn't put .NET Core in a good light as an open source project if not every component can be built from source and patched as needed.

So to add to the question how much of this do you guys expect to be able to build from source purely offline?

Ideally, 100% of what is distributed by Microsoft from http://dot.net/core.

@4creators wrote:

From my perspective as a user, I would like to be able to patch the source of any repo, build a complete patched .NET Core SDK, and use it as the main tool on the machine it was built on, including Windows.

I would like to emphasize that building involves various binaries used for building - such as the buildtools used at build time. "Patch the source of any repo" should include patching these .NET Core-specific build tools.

@tmds wrote:

Anything that is online can be cached offline. The source tarball we have now is also packed with nuget packages.

That is true, but the tarball is not good enough for many Linux distributions. Those packages must be built from source too (recursively). It is a question of fixing bugs (can we fix issues in build tools? can we fix issues in libraries used at build time?) and security (if we find a CVE, can we fix it even if upstream doesn't want to?).

@eerhardt

Is this still currently the case? I was under the assumption that the current source-build code met the needs of building netcore from source.

Not really. Think of a package build process as a reproducible pipeline. For another language like Java, it looks like this:

OpenJDK sources + a prebuilt version of OpenJDK SDK + native compiler + native libraries -> new version of OpenJDK SDK

And the build systems used by Linux distributions track each component in the pipeline. There is a process for identifying every component used and for patching and rebuilding it locally if upstream disappears or the internet goes down worldwide. For example, if there is a bug in the native compiler, all we really need to do is fix that, and then rebuild OpenJDK.

For .NET Core and current source-build the process looks like:

.NET Core sources + prebuilt .NET Core SDK Version 1.1 + prebuilt .NET Core SDK Version 2.0 + nuget packages (that target various platforms) + nuget packages (that include various precompiled binaries) + nuget packages (for build tools) -> .NET Core 2.0 SDK

Unfortunately, there are a ton of components in this pipeline that are not currently built from source. We need to get to a state where, aside from one version of a .NET Core SDK that we can use for bootstrapping, every other component is built from source and assembled in a way that makes it clear what code was compiled using what tools to produce what binary.

One piece that is still confusing me is the definition of the requirement of being built from source. Does this mean:

Every binary that is contained in the package needs to be built from source code - or - Every binary that is contained in the package and all the tools used to build those binaries need to be built from source code.

Ideally, the second one. Our ideal goal is for .NET Core to get to a state where users can get the sources for, patch, and build any component and still build a .NET Core with that - and be sure only those versions were used in the .NET Core build.

If it helps, I don't think all of this work needs to be done in source-build. We should try to get all the dependency nuget packages to a state where they can be built using source-build and then used to build a subsequent version of source-build.

@aslicerh

Using only the publicly available information and build tools (source, readmes, compiler, etc, but not arbitrary binaries), anyone should be able to reproduce the entirety of the built project. This means building all necessary binaries as well, and those produced binaries being identical to the official ones (outside of things like automated compiler optimizations).

Pretty much. There is a one time exception for bootstrapping. So we could use a prebuilt .NET Core SDK (and a small set of nuget packages) to start off the build process, but eventually every component used to build a final .NET Core SDK must have been built from source.

tmds commented 7 years ago

Based on the discussion, I see 3 levels:

Level 1: Capable of building the dotnet distributable with the same .NET Core features as Microsoft. All binaries included in the distributable are built from source.

Level 2: Capable of building the assets needed to build a netstandard library and a netcoreapp. It is possible to build offline using source-build built artifacts only.

Level 3: Capable of using the source-build output to build all binaries consumed by source-build. All binaries needed to build source-build are also built from source.

tmds commented 7 years ago

The fact that level 3 is the goal is stated in the readme:

Many Linux distributions have specific rules for official packages. The rules can be summarized as two main rules: source for everything, and consistent reproducibility. A key goal of this repository was to satisfy the official packaging rules of commonly used Linux distributions, such as Fedora and Debian.

The split into 3 levels is to mark significant 'open source milestones'. Currently we are working towards level 1, for which we still have open issues like the 'runtime store' and the 'net46 omnisharp'.

@eerhardt is your question answered by the above replies?

eerhardt commented 7 years ago

@eerhardt is your question answered by the above replies?

I'm beginning to see the picture. The part that still confuses me is what the expectations for the NuGet packages would be. Do people expect to use the NuGet packages that were created by source-build in the same ways they use the NuGet packages that Microsoft produces?

tmds commented 7 years ago

Do people expect to use the NuGet packages that were created by source-build in the same ways they use the NuGet packages that Microsoft produces?

As a source-build user: (Correct me if I'm wrong) I think we need these packages (level 2) to be able to do level 3.

As a dotnet developer: I can develop a netcoreapp app/netstandard lib using these packages without access to NuGet.org. So "source for everything, consistent reproducibility" also applies to the application I develop using .NET Core.

Petermarcu commented 7 years ago

I think we need to take a hard look at how we use NuGet and whether there are alternatives. Speaking in the extreme, one question I ask is: "What would it mean to deliver the SDK such that NuGet packages aren't a part of it, but all the experiences still work?" If NuGet were only a way to get "newer" bits, or bits that are not part of the SDK, would that be a better world for build from source?

I'm nervous about the idea of having multiple "correct" builds of the same version of the same package floating around.

eerhardt commented 7 years ago

It's not really the NuGet packages that are the problem. It is the stuff that is contained inside the NuGet packages. In order to build for netstandard1.x or net4x, you need to get those reference assemblies from somewhere.

And also, there are assets that are being source-built that aren't in the SDK. For example, System.Data.SqlClient, System.DirectoryServices, System.Text.Encoding.CodePages, etc. We would need to provide an alternative acquisition story for those packages if we didn't want to use NuGet.

Petermarcu commented 7 years ago

I think we've said we want to make sure you can build the refs for any version of .NET Standard easily from source. Once that is possible, if we just provided a flat folder of reference assemblies as a fallback in the SDK, would that work?

Is this just for how we build the product or also that they need to be there for people using the SDK and targeting those platforms?

weshaggard commented 7 years ago

I think we've said we want to make sure you can build the refs for any version of .NET Standard easily from source. Once that is possible, if we just provided a flat folder of reference assemblies as a fallback in the SDK, would that work?

While we could build just a flat folder here and customize all our repos to use it in this mode, it would require extra work to fork builds, which isn't ideal. I think if we want to do this we should build the packages as they already exist, which would require fewer changes in our repos and would allow folks consuming an SDK offline to use them as well.
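
To make "this mode" concrete, here is a rough sketch (the folder path and assembly name are placeholders, not an agreed design) of what referencing an assembly out of a flat folder, rather than restoring it as a NuGet package, could look like in a project file:

```xml
<!-- Hypothetical: reference an assembly from a flat folder shipped alongside the SDK
     instead of restoring it from a NuGet feed. Path and assembly name are made up. -->
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <Reference Include="System.Data.SqlClient">
      <HintPath>/usr/lib/dotnet/ref-assemblies/netstandard2.0/System.Data.SqlClient.dll</HintPath>
    </Reference>
  </ItemGroup>
</Project>
```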

eerhardt commented 7 years ago

for any version of .NET Standard

What about net4x? See https://github.com/dotnet/source-build/issues/125

Is this just for how we build the product or also that they need to be there for people using the SDK and targeting those platforms?

This is not for how we build the product. This is for people using the product that was "built from source".

Petermarcu commented 7 years ago

Yup, I see that delivery of reference assemblies in general is the problem. We've started to use NuGet heavily for that but I'm wondering if there needs to be a simpler fallback that ships with the SDK that is just a flat folder of assemblies for a given platform. Maybe I'm suggesting things that really are "boiling the ocean" but as far as I can tell NuGet and build from source seem to be heavily at odds.

RheaAyase commented 7 years ago

This is not for how we build the product. This is for people using the product that was "built from source".

But this is how we (any Linux distribution, including Red Hat) should build the product (being the sdk, runtime and host).

(Just because we have packages in RHEL doesn't mean that they are correct by our standards, and we really need to push towards that.)

tmds commented 7 years ago

If we have (and can build) reference assemblies for all target frameworks, would we then be capable of building all .NET Core nuget packages? Or are we still missing something? This would mean we'd have a build-everywhere capability for all .NET platforms. Without a full understanding of the technical challenges/effort... I find this the nicest solution.

If not feasible, we need to look at solutions for a restricted set of tfms (netcoreapp, netstandard, ?net46, ??).

omajid commented 7 years ago

This is not for how we build the product. This is for people using the product that was "built from source".

Supposing there was a built-from-source version that worked as the community wants, would there be any reason for Microsoft not to use it too? Are there any issues with this approach which would mean that Microsoft cannot adopt it?

tmds commented 7 years ago

This is not for how we build the product. This is for people using the product that was "built from source".

I think this is for how we build the product (level 2 supporting level 3) and for people using it (level 2). I don't know if @eerhardt meant something else.

Supposing there was a built-from-source version that worked as the community wants, would there be any reason for Microsoft not to use it too? Are there any issues with this approach which would mean that Microsoft cannot adopt it?

Currently, Microsoft's .NET Core build has/relies on artifacts which can't be built from source (on Linux). When we get to the point where everything can be built from source, I can't think of good reasons why Microsoft would want to do it differently.

eerhardt commented 7 years ago

This is not for how we build the product. This is for people using the product that was "built from source".

I think this is for how we build the product (level 2 supporting level 3) and for people using it (level 2). I don't know if @eerhardt meant something else.

Yes, it is for both. But the word "This" in my sentences was referring to my concern about how users are going to use the product. How can they build netstandard1.x or net4x libraries? I'm more concerned about the end-user experience than with how we are building our product. We can implement any hacks/workarounds we need to in order to build the product. But these hacks/workarounds won't be acceptable to end-users.

Today we do build the higher-level libraries in the product the same way an end-user would. So yes, solving this correctly for the end-user will solve it for these higher-level libraries. But solving it through hacks just for how we build the product isn't going to solve it for the end-user.

Yes, I meant both. But I'm more concerned about the people using it.

weshaggard commented 7 years ago

If we have (and can build) reference assemblies for all target frameworks, would we then be capable of building all .NET Core nuget packages? Or are we still missing something?

That would get us a lot of the way there, but not all the way, as we still have to worry about back-compat with our nuget packages, and as such we end up reshipping existing binaries from older nuget packages for some TFMs. For those we don't currently have live builds in the tip of our repos, so we would either need to bring back all the live builds (and the #ifdef maintenance of those) or pull in more source from our older release branches to be able to build them. This is a cascading problem which might require us to go through this loop multiple times.

I tend to agree with @Petermarcu that our use of nuget packages to deliver the platform seems to be at odds with our build-from-source efforts here. Something will have to give: either we stop using nuget the way we are, which requires a large architectural change for .NET Core, or we compromise on being able to build exactly the same assets in build from source as in our MS official builds.

tmds commented 7 years ago

Something will have to give: either we stop using nuget the way we are, which requires a large architectural change for .NET Core, or we compromise on being able to build exactly the same assets in build from source as in our MS official builds.

I'm thinking about this a bit. I haven't come up with many different solutions. I guess one way of handling the build-everything-from-source requirement could be to split off the source-buildable parts of the NuGet packages. e.g. System.Data.SqlClient has a related oss.System.Data.SqlClient which contains the netstandard and netcoreapp parts. oss.System.Data.SqlClient is then the package that meets the rules. System.Data.SqlClient is the package maintained by Microsoft that doesn't need to meet the rules.

aslicerh commented 7 years ago

I guess one way of handling the build-everything-from-source requirement could be to split off the source-buildable parts of the NuGet packages. e.g. System.Data.SqlClient has a related oss.System.Data.SqlClient which contains the netstandard and netcoreapp parts. oss.System.Data.SqlClient is then the package that meets the rules. System.Data.SqlClient is the package maintained by Microsoft that doesn't need to meet the rules.

This could be handled similarly to the way some package managers handle repos: there could be a 'free' and a 'non-free' set of NuGet repos. Maybe even a flag in NuGet that, if (not) set, would ask whether you are sure you want to use non-free packages. The marking of free/non-free could be at the package level or the repo level.
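
As a rough sketch of how such a split might surface to users at the feed level (the feed names and URLs here are made up, and this leaves the NuGet flag idea aside):

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical free/non-free split expressed as separate package sources.
     Feed names and URLs are placeholders. -->
<configuration>
  <packageSources>
    <clear />
    <!-- Packages that can be rebuilt entirely from source -->
    <add key="free" value="https://nuget.example.org/free/index.json" />
    <!-- Opt in explicitly to packages that carry prebuilt, non-source-built bits -->
    <!-- <add key="non-free" value="https://nuget.example.org/non-free/index.json" /> -->
  </packageSources>
</configuration>
```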

ghost commented 7 years ago

On the offline bit, is it the case that .NET Core source builds still depend on some NuGet packages that Microsoft has no plans to open source? It's a shame that we cannot have 100% pure, transparent .NET Core source code after all the efforts by the .NET and ASP.NET teams.

So to add to the question how much of this do you guys expect to be able to build from source purely offline?

100%, everything, no hanky-panky. Then we can create nice reusable layered docker images and make our lives easier ourselves. If there is a third-party dependency, like ASP.NET-KestrelHttpServer depending on libuv, it is and should be open source. Then source-build will run the autoconf build script in libuv and place it in the correct output dir. If there are any closed-source dependencies, please list them transparently and give a brief explanation of why they can't be open sourced or when they will be open sourced.

This, imo, will give the community great confidence in Microsoft's open-source approach, without any doubts.

dseefeld commented 6 years ago

/cc @leecow Please create issues for actionable items from this discussion